A small update on comment spammer behavior
Back in CommentSpammerBehavior I wrote that checking the HTTP Referer
header wasn't worthwhile because everyone got it right. That is no
longer true; a significant number of comment spam attempts come from
some group that is using HTTP Referer headers of the (illegal) form
'URL1, URL2, ..., MyURL' (where MyURL is the URL of my 'write a comment'
form); the number of URLs varies.
(A few times they have left out the spaces after the commas, making
Referer values technically legal.)
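To make the pattern concrete, here is a minimal Python sketch of recognizing it. It assumes the comment form's URL is known in advance (`FORM_URL` and `classify_referer` are hypothetical names, and the sketch assumes the form URL itself contains no commas). It splits the header on commas, checks whether several URLs end with the form's own URL, and treats any whitespace as the thing that makes the usual form illegal, since a URI cannot contain spaces.

```python
# Hypothetical URL of the 'write a comment' form; substitute your own.
FORM_URL = "https://example.org/blog/entry?writecomment"

def classify_referer(referer: str, form_url: str = FORM_URL) -> str:
    """Classify a Referer value against the 'URL1, URL2, ..., MyURL' pattern.

    Returns "normal" for anything else, "spam-illegal" for the usual
    comma-and-space form, and "spam-legal" for the variant with no
    spaces after the commas.
    """
    parts = [p.strip() for p in referer.split(",")]
    # Spammer pattern: more than one URL, with the form's own URL last.
    if len(parts) < 2 or parts[-1] != form_url:
        return "normal"
    # Whitespace cannot appear in a URI, so ", " makes the header illegal.
    if any(ch.isspace() for ch in referer):
        return "spam-illegal"
    return "spam-legal"
```

For example, `classify_referer("http://a.example/, http://b.example/, " + FORM_URL)` gives `"spam-illegal"`, while the same list joined with bare commas gives `"spam-legal"`.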
Most of the URLs are of other blogs, guestbooks, or bulletin boards that are encrusted with spam, but every so often the spammers will throw in one that isn't, apparently picked at random.
All of the machines seen in the past 28 days or so have used a User-Agent of:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; MyIE2; Maxthon)
Also, over the last month, this group of spammers seems to be the only thing using this user-agent string. Some Google searching suggests that places like Project Honeypot are also seeing activity from this group, some of it from IPs that have been doing this for quite a while (see, e.g., here, and I have to say that Project Honeypot uses really long URLs).
After some checking, fewer than 20% of the IP addresses from the last month turn out to be listed in xbl.spamhaus.org, although a couple of them are SBL-listed; interestingly, one of the SBL-listed IPs is in IP address space said to belong to the ROKSO-listed 'Hong Chen / YonHen Internet Marketing Center'.
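For anyone wanting to repeat this kind of check, DNS blocklists like xbl.spamhaus.org are conventionally queried by reversing an IPv4 address's octets, appending the zone name, and doing an ordinary A lookup; no answer means not listed. The sketch below assumes that convention and, as I understand Spamhaus's return codes, treats only 127.0.0.x answers as listings (their error responses live in 127.255.255.x). The function names are hypothetical, and `is_listed` does a live DNS query.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str = "xbl.spamhaus.org") -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(ip: str, zone: str = "xbl.spamhaus.org") -> bool:
    """Return True if the zone has a listing for this IP (live DNS lookup)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN or another resolution failure: treat as not listed.
        return False
    # Listings come back as 127.0.0.x; Spamhaus errors are 127.255.255.x.
    return answer.startswith("127.0.0.")
```

Checking the SBL instead is just a matter of passing `zone="sbl.spamhaus.org"`.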
Fortunately, blocking this group is embarrassingly easy. Also fortunately (or unfortunately) they're not very prolific, making maybe 20 attempts a day and hitting only two entries.
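As an illustration of why blocking is easy, here is one possible filter, sketched in Python; it is not necessarily how this blog does it. It assumes the comment code can see the raw User-Agent and Referer headers and knows its own form URL (`should_block`, `SPAM_UA`, and the parameters are hypothetical). Matching the exact User-Agent relies on the observation above that nothing else has been using that string; the Referer check catches the same group if they change it.

```python
# The User-Agent string this group has been using.
SPAM_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; MyIE2; Maxthon)"

def should_block(user_agent: str, referer: str, form_url: str) -> bool:
    """Hypothetical comment-POST filter: block on either signature."""
    if user_agent == SPAM_UA:
        return True
    # A Referer listing several URLs that ends with our own form's URL.
    parts = [p.strip() for p in referer.split(",")]
    return len(parts) > 1 and parts[-1] == form_url
```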
(I have a certain peculiar affection for prolific but easily blocked comment spammers; it warms the cockles of my black heart to see them fail over and over again.)