A really stupid web spider

May 8, 2006

Today WanderingThoughts had a visit from the worst stealth spider that I've ever seen. Given the previous contestants, this is a fairly tall order, but I'm confident I have a winner. The spider:

  • made two requests for directories without the trailing slash, earning it redirections to the proper URLs.
  • followed the redirections, making two valid requests.
  • promptly made 95 bad requests by failing to treat <a href="..."> URLs with a leading slash properly.

I've seen spiders that didn't handle absolute path URLs before, but this is a new and spectacular level of failure. They failed to crawl a single page past their two start pages; all things considered I'm surprised that they even handled the initial redirections properly.
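For contrast, here is a minimal sketch of how a link with a leading slash is supposed to be resolved against the page it appears on, using Python's standard `urllib.parse.urljoin`. The host and paths are made up for illustration, and `naive_join` is only a guess at one way a spider could get this wrong; the logs don't say what this one actually did.

```python
from urllib.parse import urljoin

# Hypothetical page URL and link target (not taken from the real logs).
base = "http://example.org/blog/python/"
href = "/blog/sysadmin/SomePage"

# Correct: a leading slash means "relative to the site root", so the
# href replaces the entire path of the base URL.
correct = urljoin(base, href)

def naive_join(base_url, link):
    """One plausible bug (an assumption): glue the href onto the page URL."""
    return base_url.rstrip("/") + link

broken = naive_join(base, href)

print(correct)  # http://example.org/blog/sysadmin/SomePage
print(broken)   # http://example.org/blog/python/blog/sysadmin/SomePage
```

A spider making the second kind of request for every root-relative link on a page would rack up a long run of bad requests very quickly.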

(They're a stealth spider because they claimed to be a variety of harmless Windows-based browsers. This is utterly false; first, the browsers would have gotten the requests right, and second, very few browsers make 99 requests in 14 seconds from 42 different IP addresses in the same subnet.)

The details

All 99 requests were made in the spam of 14 seconds, from 42 different IP addresses between 66.90.95.207 and 66.90.95.254. WHOIS says that this is part of a /18 owned by fdcservers.net. Unfortunately, fdcservers.net does not have a working whois server and these IP addresses have no reverse DNS; the IPs answer on port 25, but only with a very generic identification.
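Numbers like these (request count, distinct source IPs, time span) are easy to pull out of a log once the hits are grouped by network. The following is a rough sketch rather than the tooling actually used here: `summarize_by_subnet` is a hypothetical helper, the /24 grouping is an arbitrary choice, and the sample hits are invented, using only a few addresses from the range above.

```python
from collections import defaultdict
from datetime import datetime
import ipaddress

def summarize_by_subnet(hits, prefix=24):
    """Group (timestamp, ip) pairs by network and summarize each group."""
    groups = defaultdict(list)
    for when, ip in hits:
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[net].append((when, ip))

    summary = {}
    for net, entries in groups.items():
        times = [when for when, _ in entries]
        summary[net] = {
            "requests": len(entries),
            "distinct_ips": len({ip for _, ip in entries}),
            "span_seconds": (max(times) - min(times)).total_seconds(),
        }
    return summary

# Invented sample data; the timestamps are not from the real logs.
sample = [
    (datetime(2006, 5, 8, 1, 0, 0), "66.90.95.207"),
    (datetime(2006, 5, 8, 1, 0, 5), "66.90.95.230"),
    (datetime(2006, 5, 8, 1, 0, 14), "66.90.95.254"),
    (datetime(2006, 5, 8, 1, 0, 9), "192.0.2.10"),
]

result = summarize_by_subnet(sample)
burst = result[ipaddress.ip_network("66.90.95.0/24")]
print(burst)
```

Many distinct addresses from one network inside a span of a few seconds is a pattern that ordinary browsers essentially never produce.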

There's some evidence from Google searches that this is a botnet for some sort of spam, e.g. here. The 66.90.110.* IP range that this person reports also came by our server, on May 4th. The requests show some traces of a similarly incompetent spider, but they had the luck to hit an area of the site with mostly relative links (and Apache generously fixed up some of their mistakes, like the requests with '/../' in them).

(Nothing from 66.90.95. has hit here before today, at least for the past 28 days of logs that we have, and 66.90.110. only hit us the once on May 4th.)
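As a small worked check, Python's standard `ipaddress` module can compute which /18 an address falls in. Since CIDR blocks are aligned, the /18 containing 66.90.95.207 is fixed by arithmetic alone. The address `66.90.110.1` below is just a stand-in for "some host in 66.90.110.*", not one seen in the logs.

```python
import ipaddress

# The /18 that contains one of today's source addresses.
net = ipaddress.ip_network("66.90.95.207/18", strict=False)
print(net)  # 66.90.64.0/18

# Stand-in address for the 66.90.110.* range (hypothetical host number).
other = ipaddress.ip_address("66.90.110.1")
same_block = other in net
print(same_block)  # True
```

So both ranges sit inside the same /18, which fits the idea that the two visits share an origin, though it doesn't prove it.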


Comments on this page:

From 128.100.8.31 at 2006-05-08 10:14:19:

> All 99 requests were made in the spam of 14 seconds [...]

Finger-macro, Freudian slip, or subtle pun? :-)

--Dan Astoorian

By cks at 2006-05-08 12:40:15:

> Finger-macro, Freudian slip, or subtle pun? :-)

I'm going to claim 'all of the above', and try to do it with a straight face.

Written on 08 May 2006.

Last modified: Mon May 8 02:26:46 2006