Saturday, December 13, 2008

Protecting blogs and forums from spam bots

Preamble

Forum and blog spam has been an absolute pest for the last few years. I see a lot of complaints from site owners, and I also see a lot of polluted content. Spammers do their dirty job in two ways: automatically with bots, and manually by writing to the forums by hand. The main threat comes from automated spam, and protection from it is what I’ll explain here.

There are a lot of standard techniques for protecting against spam. We all know them: CAPTCHA, email confirmation, interactive user challenges and so on. All of them demand, more or less, extra attention from the user. And because they are visible to spammers, they are not invulnerable. A CAPTCHA can be broken with cheap porn traffic (the visitors of a porn site are made to solve it for the spammer), email confirmation is easily automated with scripts and free mail services, and interactive challenges sometimes confuse users.

But there is another good method that does not disturb users and is not so obvious to bypass: behavior tracking. It is a good way to protect your blog or forum from spam, because the protection is hidden, and the spammer does not know where exactly you are checking him.

Bot behavior

The main goal of spammers is to spam as much as possible. As a result, their bots lack the behavior of a real user. Spammers are very sparing with traffic and skip many of the usual steps. For example, after loading a page, a user’s browser always loads at least a CSS file from the site, because it is required to display the page properly. A spammer, however, has no need to fetch the CSS. The same goes for the images on the page: the bot does not download them either (though sometimes real visitors do not download them, for example a user on GPRS who is trying to reduce traffic).
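One way to observe this difference is to look in the web server log for asset requests from the poster’s IP address. Below is a minimal sketch in Python, assuming the log is in the common or combined log format. The function name `fetched_asset`, the list of extensions, and the sample log lines are my own illustrations and not part of any particular setup.

```python
import re

# Matches the start of a common/combined log format line:
# client IP, identity, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)')

# Extensions treated as page assets (an assumption; adjust to the site).
ASSET_EXTENSIONS = (".css", ".png", ".gif", ".jpg")


def fetched_asset(log_lines, ip):
    """Return True if `ip` requested at least one asset in `log_lines`."""
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a GET/HEAD request or not a parsable line
        client, path = match.groups()
        # Ignore the query string when looking at the extension.
        bare_path = path.split("?", 1)[0].lower()
        if client == ip and bare_path.endswith(ASSET_EXTENSIONS):
            return True
    return False


# Made-up log lines for illustration only.
sample_log = [
    '203.0.113.5 - - [13/Dec/2008:10:00:00 +0000] "GET /post/1 HTTP/1.1" 200 5120',
    '203.0.113.5 - - [13/Dec/2008:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900',
    '198.51.100.7 - - [13/Dec/2008:10:05:00 +0000] "POST /comment HTTP/1.1" 200 64',
]
```

A poster whose IP never shows up with an asset request is suspicious, but keep in mind the GPRS user mentioned above: this signal is better combined with other checks than used alone.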

Most spammers scan a site only once and post their spam some time later (usually the next day). This “scan once” approach is also a speedup on their side. It means they do not download the page containing the form each time before posting a message. I also think they use two kinds of software: one for crawling the web in search of forums and blogs, and another for posting the messages. This would explain why the spam mostly arrives the day after their spider reaches a victim site.

Antispam protection

Here are several simple recommendations from me on how to track a spammer:

  1. Check whether the poster’s IP address has previously downloaded a CSS file or an image. Checking for images is best: a page always contains several images, and it is not easy for a spammer to work out which of them you check (or maybe you check all of them). You can serve the pictures through your own scripts; with something like mod_rewrite this will not be obvious to a spammer. Or you can check the access.log file of your site. If you’re hosted on a Unix-compatible machine, this can be done easily and efficiently with the bundled tail and grep utilities together.
  2. Add an extra hidden parameter to your HTML form. Its value should change every minute, and it should be easy for you (but not for a spammer) to tell which minute a value belongs to. Then you can drop all posts submitted with a stale parameter value.
  3. Spammers usually do not build a JavaScript engine into their tools. So you can hide the HTML form by composing it with JavaScript. But be careful: generate your form in several steps. Do not simply write it out with one function call. If the spammer uses regular expressions instead of an HTML parser, he may still catch the footprint of an HTML form, and your trick will not help.
  4. Always check the Referer field in the HTTP headers. Some spammers do not pass a valid Referer. A typical user’s browser always submits the message with a valid Referer header pointing to the page that contains the HTML form.
  5. Check the HTTP User-Agent. There is a small number of dumb (but very annoying) spammers who do not supply a plausible User-Agent. And there is another group of overly “intelligent” spammers who use browser User-Agents but change them on every request. I suppose they pick them randomly from an array in a script. So if the HTML form was sent from your site to one User-Agent but submitted with another, discard the message as well.
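
Recommendations 2 and 5 can be combined in one hidden form value. The sketch below is only one possible way to do it, in Python: the value carries the minute it was issued and an HMAC signature over that minute and the visitor’s User-Agent. The names `SECRET_KEY`, `MAX_AGE_MINUTES`, `make_token` and `check_token` are hypothetical, and the 30 minute limit is an assumption to tune for your site.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"change-me"
# Assumed limit: posts carrying an older token are dropped.
MAX_AGE_MINUTES = 30


def _sign(minute, user_agent):
    """Sign the issue minute together with the User-Agent that got the form."""
    message = f"{minute}|{user_agent}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(user_agent, now=None):
    """Build the hidden form value for the current minute."""
    minute = int((time.time() if now is None else now) // 60)
    return f"{minute}.{_sign(minute, user_agent)}"


def check_token(token, user_agent, now=None):
    """Accept only an authentic, recent token from the same User-Agent."""
    try:
        minute_text, signature = token.split(".", 1)
        minute = int(minute_text)
    except (AttributeError, ValueError):
        return False  # missing or malformed token
    expected = _sign(minute, user_agent)
    # Constant-time comparison of the submitted and expected signatures.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return False
    current = int((time.time() if now is None else now) // 60)
    return 0 <= current - minute <= MAX_AGE_MINUTES
```

Put the result of `make_token(...)` into the hidden field when rendering the form and call `check_token(...)` when the post arrives. A bot that saved the form yesterday, or that switched its User-Agent between fetching and posting, fails the check. Unlike the scheme in item 2, the minute is readable here; the spammer can see it but cannot forge a signature for a fresh one without the secret.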

Conclusion

I have used all these tricks on several projects, and they showed good results. Of course, if a spammer is set on spamming your resource in particular, writes specialized software, and has good experience, he can eventually get through all your barricades. But he will need several hours of banging his head on the keyboard, tearing hair from every part of his body, and shouting “how? why? where?”, and maybe he will leave you alone.

If you found this article useful, or if you use your own tips and tricks to protect your resources, feel free to leave a comment. And don’t hesitate to share a link to this article with your friends. Let’s resist together!



