A small update on comment spammer behavior

June 26, 2007

Back in CommentSpammerBehavior I wrote that checking the HTTP Referer header wasn't worthwhile because everyone got it right. That is no longer true; a significant number of comment spam attempts now come from some group that is using HTTP Referer headers of the (illegal) form 'URL1, URL2, ..., MyURL' (where MyURL is the URL of my 'write a comment' form); the number of URLs varies.

(A few times they have left out the spaces after the commas, making their Referer values technically legal.)
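As a rough illustration of how such a Referer could be spotted, here is a minimal Python sketch. It is not the check this site actually runs; the function name and the decision to treat only http and https URLs as list members are assumptions made for the example. It flags both the spaced form and the no-space variant, since a legitimate Referer is very unlikely to contain a comma immediately followed by another full URL.

```python
import re

# Each comma-separated piece must start like an absolute http(s) URL.
_URL_START = re.compile(r"^https?://", re.IGNORECASE)

def looks_like_url_list(referer):
    """Return True if the Referer value looks like 'URL1, URL2, ..., URLn'."""
    if not referer:
        return False
    pieces = [p.strip() for p in referer.split(",")]
    if len(pieces) < 2:
        # No comma at all: an ordinary single-URL Referer.
        return False
    # A lone URL that merely contains a comma fails here, because the
    # text after the comma does not itself start with a URL scheme.
    return all(_URL_START.match(p) for p in pieces)
```

A comment submission whose Referer passes this test could then be rejected before any other processing.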

Most of the URLs are of other blogs, guestbooks, or bulletin boards that are encrusted with spam, but every so often the spammers will throw in one that isn't, apparently picked at random.

All of the machines in the past 28 days or so use a User-Agent of:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; MyIE2; Maxthon)

Also over the last month, this group of spammers seems to be the only thing using this user-agent string. Some Google searching suggests that places like Project Honeypot are also seeing activity from this group, some of it from IPs that have been doing this for quite a while (see, eg, here, and I have to say that Project Honeypot uses really long URLs).

After some checking, less than 20% of the IP addresses from the last month are listed in xbl.spamhaus.org, although a couple of them are SBL listed; interestingly, one of the SBL listed IPs is in IP address space said to belong to the ROKSO-listed 'Hong Chen / YonHen Internet Marketing Center'.

(The other SBL listings, SBL52252 and SBL54789, are for known open and actively abused proxies.)
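For readers who have not done this kind of check by hand: a DNS blocklist is queried by reversing the octets of the IPv4 address, appending the list's zone, and looking up the resulting name. The sketch below shows that mechanism in Python. The function names are made up for the example, and treating every lookup failure as "not listed" is a simplifying assumption, not a statement about how any particular list behaves.

```python
import socket

def dnsbl_query_name(ipv4, zone="xbl.spamhaus.org"):
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    octets = ipv4.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ipv4,))
    # 192.0.2.10 becomes 10.2.0.192.<zone>
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ipv4, zone="xbl.spamhaus.org"):
    """Return True if the lookup succeeds (needs working DNS).

    Assumption: any resolution error is treated as 'not listed'.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ipv4, zone))
    except OSError:
        return False
    return True
```

Running is_listed() over the distinct source IPs from a month of logs would give the sort of percentage quoted above.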

Fortunately, blocking this group is embarrassingly easy. Also fortunately (or unfortunately) they're not very prolific, making maybe 20 attempts a day and hitting only two entries.

(I have a certain peculiar affection for prolific but easily blocked comment spammers; it warms the cockles of my black heart to see them fail over and over again.)

Comments on this page:

By nothings at 2007-06-27 04:36:05:

I thought I'd posted this before but I don't see it in the previous entry:

I had a problem with getting spam on the phpbb on my site, but once someone mentioned a way of fixing it, I fixed it perfectly. Since making the change, I have gotten literally no spam.

The trick involves two premises:

  • you are only receiving spam from robots
  • you are willing to use a captcha of some kind

Originally, I had phpbb's captcha turned on, and it wasn't helping. Eventually I realized this was because robots knew how to defeat it somehow, not because the spammers were human.

The second issue may be more significant for you in terms of your willingness to pursue it. You currently have a sort of inverse captcha with hidden form fields or extra submit buttons (or forced preview), but you have to go a little farther in adding a barrier to detect humans. This can be simple, and trivial, and you can probably create a cookie so people only have to do it once even if they're not logged in.

The trick in the phpbb case is that people are attacking phpbb, not my site. All you have to do is write a site-specific captcha that nobody else has, and the script authors never find you, never know about you, and never attack you. (This would not work, of course, if you had a popular site; I do not have any recommendation for the authors of phpbb that can solve their problem; only for individual users of phpbb.)

So, for example, the following captcha would do the trick, either for creating phpbb user accounts or for posting comments:

  To prove that you are a human, not an automated comment-spamming program,
  type the letter 'x' in the following box:
  :Post Comment:

Any human can trivially do this, and the odds of a bot even attempting to guess the right text are extremely low (at least until this technique becomes well-known, in which case we'll need to use the human element a little more, like saying 'what is 1+1?'). And any user who accepts non-session cookies need only do it once.
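The server side of such a check can be very small. Here is a minimal Python sketch of the idea, written against plain dictionaries rather than phpbb or any real framework. The field name human_check, the cookie name human_ok, and the function name are all hypothetical.

```python
EXPECTED_ANSWER = "x"        # the site-specific answer named in the prompt
PASSED_COOKIE = "human_ok"   # hypothetical name for the 'already passed' cookie

def check_comment(form, cookies):
    """Decide whether to accept a submission.

    form and cookies are plain dicts of strings.
    Returns (accepted, set_cookie); set_cookie tells the caller to send
    a long-lived cookie so the visitor is not asked again.
    """
    if cookies.get(PASSED_COOKIE) == "1":
        # The visitor answered correctly on an earlier visit.
        return True, False
    answer = form.get("human_check", "").strip().lower()
    if answer == EXPECTED_ANSWER:
        return True, True
    return False, False
```

A bare "1" cookie is trivially forgeable, but so is the captcha itself once someone looks at the site; the scheme only relies on nobody bothering. A signed cookie value would be sturdier if that ever changes.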

But you have to be willing to make humans do this, and you have to burden them with the knowledge that there are comment-spamming robots if you want them not to be confused about why they have to do this. (I actually have had people fail to create phpbb accounts on my site because they were confused by my captcha--because I didn't explain it clearly enough.)

By cks at 2007-07-03 16:53:30:

An interesting variant of the simple captcha approach that I've heard of is to have some JavaScript on the page that automatically fills in the captcha form field and then hides it from the user. That way people with JavaScript turned on (which is probably most real visitors) don't even notice it, while spam robots are still impeded since few of them try to run JavaScript.
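A sketch of what that might look like, as a Python function that emits the form fragment along with its inline script. The element ids, the field name human_check, and the function name are invented for the example; this is one way to arrange it, not a description of any existing implementation.

```python
def render_human_check(answer="x"):
    """Return an HTML fragment for the captcha field plus a script that
    fills it in and hides it when JavaScript is enabled.

    Assumption: answer is a short plain string such as a single letter;
    it is inserted into the HTML and script without any escaping.
    """
    return """<p id="human-check-row">
To prove that you are a human, type the letter '%(answer)s' in the following box:
<input type="text" name="human_check" id="human-check">
</p>
<script>
(function () {
  // Runs after the field exists, because the script follows it in the page.
  var field = document.getElementById("human-check");
  field.value = "%(answer)s";
  document.getElementById("human-check-row").style.display = "none";
})();
</script>
""" % {"answer": answer}
```

Visitors without JavaScript still see the prompt and can type the answer themselves, so the server-side check stays the same either way.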

Written on 26 June 2007.

Last modified: Tue Jun 26 23:42:02 2007