Friday, August 13, 2010

How we Stop Spam

There is a scenario that is all too familiar to us in the tech office...
An account manager comes storming in, hands in hair, shouting and screaming because one of their clients was hit by a spambot. OK, I might be exaggerating a little bit... well, maybe a lot. But just last week it happened again.

One of our clients received spam in their inbox - submitted via their website form. Luckily, I could calmly respond that we can add our spam protection to the website forms. An hour later, and the job was done, or so I thought.
Along with the usual thank-you message from the account manager came a query from the client, who wanted to know how we actually do it. Normally clients are just happy to have the spam stop, but this inquisitive client is the reason I'm writing this post and explaining how we go about stopping spam messages.

Without going into too much technical detail about how our spam prevention works, these are the steps we follow:

1. A Honeypot Trap
According to Wikipedia, a honeypot is: "... a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems." In plain English, it means we set a trap for the spambot and hope it steps in it.
We add a field to the form that is invisible to the visitor but not to the spambot, and use server-side validation to check whether the field was filled in. Of all the prevention steps we take, this is the most effective, as spambots are normally not intelligent enough to pick up that the field is invisible and should not be filled in. Based on our logs, I would estimate that it stops around 80-90% of all attacks.
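To make the idea concrete, here is a minimal Python sketch of the server-side half of a honeypot check. It is an illustration only, not the actual implementation: the field name `website` and the function `honeypot_triggered` are made up for this example, and it assumes the submitted form arrives as a plain dictionary and that the field is hidden from visitors by some means such as CSS.

```python
# Hypothetical name for the hidden field. A plausible-looking name
# is assumed here so that a bot is tempted to fill it in.
HONEYPOT_FIELD = "website"


def honeypot_triggered(form_data: dict) -> bool:
    """Return True if the hidden honeypot field contains any text.

    A human never sees the field, so it should arrive empty or missing.
    """
    value = form_data.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


if __name__ == "__main__":
    human = {"name": "Ann", "message": "Hello", "website": ""}
    bot = {"name": "x", "message": "spam", "website": "http://spam.example"}
    print(honeypot_triggered(human))
    print(honeypot_triggered(bot))
```

A submission for which the function returns True would simply be discarded on the server, so the visitor never notices the check.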

2. Comparing a Randomly Generated Field Value
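This involves generating a random value and saving it in the session. The same value is also set on the form we are protecting, and when the form is submitted we compare the value from the form with the one in the session to see if they match. A bot that posts straight to the form handler without first loading the form will not have a matching value.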
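The comparison could be sketched along these lines in Python. Everything named here is an assumption for the sake of the example: the session is modelled as a plain dictionary, and `form_token`, `issue_token` and `token_matches` are hypothetical names. Making the value single-use is also a choice of this sketch, not something the steps above require.

```python
import hmac
import secrets

# Hypothetical key used both in the session and as the form field name.
TOKEN_KEY = "form_token"


def issue_token(session: dict) -> str:
    """Generate a random value, remember it in the session, and return it
    so it can be written into a hidden field on the form."""
    token = secrets.token_hex(16)
    session[TOKEN_KEY] = token
    return token


def token_matches(session: dict, form_data: dict) -> bool:
    """Compare the submitted value with the one stored in the session.

    The stored value is removed so that each token works only once
    (an assumption of this sketch).
    """
    expected = session.pop(TOKEN_KEY, None)
    submitted = form_data.get(TOKEN_KEY)
    if not expected or not submitted:
        return False
    # Constant-time comparison of the two values as bytes.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


if __name__ == "__main__":
    session = {}
    token = issue_token(session)
    print(token_matches(session, {TOKEN_KEY: token}))
```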

3. Timestamp Comparison
The idea behind this step is that a human would take at least 3 seconds to submit even the simplest of forms. We record the current time when the form is loaded and compare it to the time when the form is submitted. A submission that arrives sooner than that is treated as suspect.

4. Checking the Referrer Header
The referrer header is the URL of the page from which the request was made. If this URL is not from the same domain as the website, it could indicate that the request was made from a spambot.

5. Content Checking
After reading a few spam messages, it is easy to determine their modus operandi. They are normally full of links, full of all those unwanted four-letter words, or both. If the spambot managed to get past the other prevention checks, it will normally fail at this step.
We check content based on three criteria:
• Links - Messages that contain many links are a common trademark of spambots.
• Blocked Words - Words that are not allowed to appear in the message at all. These would normally include those unwanted 4 letter words spambots love to use.
• Bad Words - Words that are only allowed to appear a set number of times in the message. This would include words like 'free', which could appear in a normal human-submitted message but tend to be repeated several times in a spam message.
Not every website is targeted with the same messages or in the same manner, and therefore it is sometimes necessary for us to adjust the parameters and strictness of the prevention checks for each website.
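The three criteria above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name, the default limit of two links, the way a link is recognised (anything starting with `http://`, `https://` or `www.`), and the word lists in the usage example are all hypothetical. The parameters are passed in per call to reflect the point that they are tuned for each website.

```python
import re

# A link is assumed to be any run of non-space text that starts with
# http://, https:// or www.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z0-9']+")


def content_looks_like_spam(
    message: str,
    blocked_words: set[str],
    bad_word_limits: dict[str, int],
    max_links: int = 2,
) -> bool:
    """Apply the three content checks: links, blocked words, bad words.

    `blocked_words` and the keys of `bad_word_limits` are expected in
    lower case.
    """
    # 1. Links: too many links in one message.
    if len(LINK_PATTERN.findall(message)) > max_links:
        return True

    words = WORD_PATTERN.findall(message.lower())

    # 2. Blocked words: not allowed to appear at all.
    if any(word in blocked_words for word in words):
        return True

    # 3. Bad words: allowed, but only up to a set number of times.
    for word, limit in bad_word_limits.items():
        if words.count(word) > limit:
            return True

    return False


if __name__ == "__main__":
    blocked = {"viagra"}  # example list, not a real configuration
    limits = {"free": 2}
    print(content_looks_like_spam("Could I get a free quote?", blocked, limits))
    print(content_looks_like_spam("free free free offer", blocked, limits))
```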

"Just install a CAPTCHA" I hear you say. It could work, but here at Quirk we don't really like them.
Apart from people finding it difficult to read the letters and being discouraged from actually completing the form, especially a comment form on a blog post, why should they have to prove they are human?
