Making things simple for busy webmasters

March 9, 2006

It's always nice when people's software saves me from having to wonder if they're up to no good by handing out obvious signs of it. Take, for example, the spate of people whose web crawling software advertises itself with a User-Agent string of:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Evidently no one told them not to stutter. (There are a couple of variations in what they claim to be, but that one is the most common. Needless to say, no real User-Agent string, MSIE's included, has an extra 'User-Agent: ' on the front.)
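
For the curious, here is a rough sketch of how one might pick these requests out of a log. It assumes Apache's "combined" log format and does not handle escaped quotes inside fields; the regular expression, the function name stuttering_agents, and the sample lines (which use documentation IP addresses) are all made up for illustration and are not the scan I actually run.

```python
import re

# Assumed layout: Apache "combined" log format, i.e.
#   ip ident user [time] "request" status size "referer" "user-agent"
# This simple pattern does not cope with escaped quotes inside a field.
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def stuttering_agents(lines):
    """Yield (ip, agent) for requests whose User-Agent value
    itself begins with a literal 'User-Agent:'."""
    for line in lines:
        m = COMBINED_RE.match(line)
        if m and m.group("agent").lower().startswith("user-agent:"):
            yield m.group("ip"), m.group("agent")

# Hypothetical sample lines, not real log data.
sample = [
    '192.0.2.10 - - [09/Mar/2006:10:00:00 -0500] "GET / HTTP/1.1" 200 512 '
    '"-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.11 - - [09/Mar/2006:10:00:01 -0500] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

hits = list(stuttering_agents(sample))
for ip, agent in hits:
    print(ip, agent)
```

Only the first sample line is reported; the second carries the same browser claim without the stutter and passes through untouched.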

The IP addresses that sourced these are scattered all over; a couple of them are (still) on the XBL, and a couple are in SPEWS.

(And I give bonus points to the person with the User-Agent string "W3C standards are important. Stop fucking obsessing over user-agent already.", which I stumbled over while scanning our logs today. I can certainly agree with the sentiment.)

Another good one is the stealth spider that sends a completely blank Referer: header instead of omitting it; it stands out like a sore thumb in my log scans. These requests come from all over: 157 different IP addresses over the past 28 days or so, 50 of them currently listed in the XBL.
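
A similar sketch for the blank Referer case, again assuming Apache's "combined" log format. It also leans on the assumption that the server logs a missing header as "-" and a header that is present but empty as "", which is what makes the two distinguishable in the log at all; check how your own server behaves before trusting it. The function name blank_referer_ips and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format. A missing Referer is
# assumed to be logged as "-", an empty-but-present one as "".
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def blank_referer_ips(lines):
    """Count requests per source IP that sent an empty Referer header."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m and m.group("referer") == "":
            counts[m.group("ip")] += 1
    return counts

# Hypothetical sample lines, not real log data.
sample = [
    '192.0.2.20 - - [09/Mar/2006:11:00:00 -0500] "GET /a HTTP/1.0" 200 100 "" "Mozilla/4.0"',
    '192.0.2.20 - - [09/Mar/2006:11:00:05 -0500] "GET /b HTTP/1.0" 200 100 "" "Mozilla/4.0"',
    '192.0.2.21 - - [09/Mar/2006:11:01:00 -0500] "GET /a HTTP/1.0" 200 100 "" "Mozilla/4.0"',
    '192.0.2.22 - - [09/Mar/2006:11:02:00 -0500] "GET /a HTTP/1.0" 200 100 "-" "Mozilla/4.0"',
]

counts = blank_referer_ips(sample)
print(len(counts), "distinct IP addresses sent a blank Referer")
```

The number of distinct source addresses is just len(counts); the request from 192.0.2.22 is left out because it omitted the header entirely.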


Last modified: Thu Mar 9 16:42:15 2006