Making things simple for busy webmasters

March 9, 2006

It's always nice when people's software hands out obvious signs that they're up to no good, which saves me from having to wonder. Take, for example, the spate of people whose web crawling software advertises itself with this User-Agent string:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Evidently no one told them not to stutter. (There are a couple of variations in what they claim to be, but that one is the most common. Needless to say, no real User-Agent string (MSIE's included) has an extra 'User-Agent: ' on the front.)
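
A check for this stutter is easy to sketch. What follows is only an illustration, not the log scanning actually used here: the function name is made up, and it assumes you already have the User-Agent value as a string, however your logs store it.

```python
import re

# Matches a header value that itself starts with the header name,
# e.g. "User-Agent: Mozilla/4.0 (...)". Case and spacing are allowed to vary.
STUTTER = re.compile(r"^\s*user-agent\s*:", re.IGNORECASE)


def has_stuttered_user_agent(value):
    """Return True if a User-Agent value begins with a literal 'User-Agent:'.

    Hypothetical helper for illustration; 'value' is the header value only,
    without the real header name in front of it.
    """
    return bool(STUTTER.match(value))
```

Fed the string quoted above, this returns True; fed the same string without the extra 'User-Agent: ' on the front, it returns False.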

The IP addresses these requests came from are scattered all over; a couple of them are (still) on the XBL, and a couple are in SPEWS.

(And I give bonus points to the person with the User-Agent string "W3C standards are important. Stop fucking obsessing over user-agent already.", which I stumbled over while scanning our logs today. I can certainly agree with the sentiment.)

Another good one is the stealth spider that sends a completely blank Referer: header, instead of omitting it; it stands out like a sore thumb in my log scans. This comes from all over, with 157 different IP addresses over the past 28 days or so, 50 of them currently listed in the XBL.
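
The distinction that matters is between a Referer header that is missing and one that is present but empty. Here is a rough sketch of telling the two apart and counting the distinct source IPs. It assumes each request is available as an (IP address, header dictionary) pair with header names already normalized to the spelling "Referer"; both that layout and the function names are inventions for the example.

```python
def has_blank_referer(headers):
    """True if a Referer header was sent but is empty or only whitespace.

    A request that omits the header entirely returns False.
    """
    value = headers.get("Referer")
    return value is not None and value.strip() == ""


def blank_referer_sources(requests):
    """Return the set of distinct source IPs that sent a blank Referer.

    'requests' is an iterable of (ip, headers) pairs; this layout is an
    assumption made for the example, not a real log format.
    """
    return {ip for ip, headers in requests if has_blank_referer(headers)}
```

The length of the returned set is the number of different IP addresses involved. If you are working from server logs instead, whether a blank header can be told apart from a missing one depends on how your server writes them out.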


Comments on this page:

From 209.157.133.146 at 2006-03-12 01:47:39:

I often run wget with the user-agent set to "Mozilla/4.0 (I am an ordinary browser in a large mechanical suit)"

From 150.101.214.82 at 2006-04-17 18:14:32:

The user-agent string you agreed with is part of a movement to send a message to clue-challenged browser-counters:

http://twiki.iwethey.org/Main/UserAgentString

You may want to link to that page if you agree. Heck, you might want to do what it says and change your own user-agent string as described.

By cks at 2006-04-18 01:29:23:

While I sympathize with the goal I feel driven to note some ironies involving the wiki page, given that it is advocating following W3C standards:

The XHTML Content-Type issue is apparently a big gotcha in practice. For discussions of it see Mark Pilgrim or Ian Hickson, among others. The wiki page would suffer from several of the issues Ian Hickson writes about if it were actually interpreted as XHTML; for example, its CSS has tag names written in upper case. (Not that that would matter, because it also has the <style> comment problem.)

(See also the W3 XHTML compatibility guidelines.)

Broadly speaking, as far as I can tell, anyone trying to really do XHTML (of any variety) right now is a masochist. This is one reason DWiki sticks to HTML 4.01 Transitional.

Written on 09 March 2006.


Last modified: Thu Mar 9 16:42:15 2006