A really stupid web spider
Today WanderingThoughts had a visit from the worst stealth spider that I've ever seen. Given the previous contestants, this is a fairly tall order, but I'm confident I have a winner. The spider:
- made two requests for directories without the trailing slash, earning it redirections to the proper URLs.
- followed the redirections, making two valid requests.
- promptly made 95 bad requests by failing to treat <a href="..."> URLs with a leading slash properly.
I've seen spiders that didn't handle absolute path URLs before, but this is a new and spectacular level of failure. They failed to crawl a single page past their two start pages; all things considered I'm surprised that they even handled the initial redirections properly.
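Resolving links properly is not hard, since most languages ship an RFC 3986 resolver. Here is a minimal Python sketch using urllib.parse.urljoin, with a made-up page URL (example.org stands in for the real site). The naive_resolve function is only a hypothetical guess at the kind of bug that produces bad requests like these. I don't know what the spider actually did internally.

```python
from urllib.parse import urljoin

# Hypothetical page URL; the spider's real start pages aren't given here.
BASE = "https://example.org/blog/some/dir/"


def resolve(href: str, base: str = BASE) -> str:
    """Resolve an <a href="..."> value against the page it appeared on (RFC 3986)."""
    return urljoin(base, href)


def naive_resolve(href: str, base: str = BASE) -> str:
    """Hypothetical buggy resolver: treats every href as relative to the
    current directory, even ones with a leading slash."""
    return base + href.lstrip("/")


# A leading slash means "start from the root of the site".
print(resolve("/blog/other/page"))        # https://example.org/blog/other/page
# Without one, the href is relative to the current directory.
print(resolve("page"))                    # https://example.org/blog/some/dir/page
# The buggy version glues the absolute path onto the current directory.
print(naive_resolve("/blog/other/page"))  # https://example.org/blog/some/dir/blog/other/page
# Why the trailing-slash redirects matter: without the slash, "dir" is
# treated as a file name, so relative links resolve one level up.
print(urljoin("https://example.org/blog/some/dir", "page"))  # https://example.org/blog/some/page
```

The last line is also why the initial redirections matter: a spider that skipped them would resolve relative links against the wrong directory.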
(They're a stealth spider because they claimed to be a variety of harmless Windows-based browsers. This is utterly false; first, the browsers would have gotten the requests right, and second, very few browsers make 99 requests in 14 seconds from 42 different IP addresses in the same subnet.)
The details
All 99 requests were made in the span of 14 seconds, from 42 different IP addresses between 66.90.95.207 and 66.90.95.254. WHOIS says that this is part of a /18 owned by fdcservers.net. Unfortunately, fdcservers.net does not have a working whois server and these IP addresses have no reverse DNS; the IPs answer on port 25, but only with a very generic identification.
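If you want to check this sort of subnet arithmetic yourself, Python's ipaddress module will do it. A /18 always starts on a multiple of 64 in the third octet, so the /18 containing these addresses works out to 66.90.64.0/18. That prefix is derived here, not quoted from the WHOIS output. The count_by_prefix helper and the sample addresses are hypothetical illustrations, not the actual log data.

```python
import ipaddress
from collections import Counter

# The /18 containing the observed addresses; strict=False masks off the host bits.
block = ipaddress.ip_network("66.90.95.207/18", strict=False)
print(block)  # 66.90.64.0/18

first = ipaddress.ip_address("66.90.95.207")
last = ipaddress.ip_address("66.90.95.254")
print(int(last) - int(first) + 1)  # 48 possible addresses; 42 of them were used

# The 66.90.110.* range reported elsewhere falls inside the same /18.
print(ipaddress.ip_address("66.90.110.1") in block)  # True


def count_by_prefix(ips, prefixlen=24):
    """Hypothetical log helper: count requests per /prefixlen network, so a
    crawl spread over many addresses in one subnet shows up as one source."""
    counts = Counter()
    for ip in ips:
        counts[ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False)] += 1
    return counts


# Made-up sample of client addresses, not the real log.
sample = ["66.90.95.207", "66.90.95.230", "66.90.95.254", "192.0.2.10"]
print(count_by_prefix(sample).most_common(1))  # [(IPv4Network('66.90.95.0/24'), 3)]
```

Grouping by /24 is an arbitrary choice; a wider prefix such as /18 would also lump the May 4th visitors in with today's.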
There's some evidence from Google searches that this is a botnet for some sort of spam, e.g. here. The 66.90.110.* IP range that this person reports also visited our server on May 4th. The requests show some traces of a similarly incompetent spider, but they had the luck to hit an area of the site with mostly relative links (and Apache generously fixed up some of their mistakes, like the requests with '/../' in them).
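Fixing up '/../' amounts to removing dot segments from the request path before looking it up. As a sketch, here is the remove_dot_segments algorithm from RFC 3986 section 5.2.4 in Python. This is the standard normalization, not a claim about how Apache's own code implements it.

```python
def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4: strip '.' and '..' segments from a URL path."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):          # A: leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):         # A: leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):        # B: "/./x" -> "/x"
            inp = inp[2:]
        elif inp == "/.":                  # B: trailing "/."
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            # C: replace with "/" and drop the last output segment (if any).
            inp = "/" + inp[4:] if inp.startswith("/../") else "/"
            out = out[:out.rfind("/")] if "/" in out else ""
        elif inp in (".", ".."):           # D: bare "." or ".."
            inp = ""
        else:
            # E: move the first segment (with its leading "/", if any) to output.
            nxt = inp.find("/", 1 if inp.startswith("/") else 0)
            if nxt == -1:
                nxt = len(inp)
            out += inp[:nxt]
            inp = inp[nxt:]
    return out


print(remove_dot_segments("/blog/some/../other/page"))  # /blog/other/page
print(remove_dot_segments("/../blog/page"))             # /blog/page
```

Note that a '..' which tries to climb above the root is simply discarded, which is why such requests still land somewhere sensible.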
(Nothing from 66.90.95.* has hit here before today, at least in the 28 days of logs that we have, and 66.90.110.* only hit us once, on May 4th.)