== A really stupid web spider

Today WanderingThoughts had a visit from the worst stealth spider that I've ever seen. Given the [[previous StupidSpiderTricks]] [[contestants StupidSpiderMistakes]] this is a fairly tall order, but I'm confident I have a winner.

The spider:

* made two requests for directories without the trailing slash, earning it redirections to the proper URLs.
* followed the redirections, making two valid requests.
* promptly made ~~95 bad requests~~ by failing to treat URLs with a leading slash properly.

I've seen spiders that didn't handle absolute path URLs [[before StupidSpiderMistakes]], but this is a new and spectacular level of failure. *They failed to crawl a single page past their two start pages*; all things considered I'm surprised that they even handled the initial redirections properly.

(They're a stealth spider because they claimed to be a variety of harmless Windows-based browsers. This is utterly false; first, real browsers would have gotten the requests right, and second, very few browsers make 99 requests in 14 seconds from 42 different IP addresses in the same subnet.)

=== The details

All 99 requests were made in the span of 14 seconds, from 42 different IP addresses between 66.90.95.207 and 66.90.95.254. WHOIS says that this is part of a /18 owned by fdcservers.net. Unfortunately, fdcservers.net does not have a working whois server and these IP addresses have no reverse DNS; the IPs answer on port 25, but only with a very generic identification.

There's some evidence from Google searches that this is a botnet for some sort of spam, eg [[here http://www.dl6kac.de/blog/2006/spambots-with-changing-ips/]]. The 66.90.110.* IP range that this person reports also came by our server, on May 4th. The requests show some traces of a similarly incompetent spider, but they had the luck to hit an area of the site with mostly relative links (and Apache generously fixed up some of their mistakes, like the requests with '_/../_' in them).

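For illustration, here is a minimal Python sketch of how a link with a leading slash (or with '..' in it) ought to be resolved against the page it was found on, next to one guess at the sort of naive handling that produces bad requests. The `example.org` URLs and the `naive_resolve` helper are hypothetical; what this spider's code actually does is unknown.

```python
from urllib.parse import urljoin


def resolve(base_url, href):
    """Resolve href against the page it appeared on, as a browser would."""
    return urljoin(base_url, href)


def naive_resolve(base_url, href):
    """Hypothetical broken approach: glue the link onto the page URL."""
    return base_url + href


# Made-up page and links, purely for illustration.
page = "http://example.org/blog/programming/"

# A leading slash means "start again from the root of the site".
print(resolve(page, "/blog/python/Foo"))
# The naive version produces a path that does not exist on the server.
print(naive_resolve(page, "/blog/python/Foo"))
# '..' segments are folded away by the client before the request is sent.
print(resolve(page, "../python/Foo"))
```

A client that resolves links this way never sends a '/../' path to the server at all, so it does not need the server to clean up after it.
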
(Nothing from 66.90.95. has hit here before today, at least for the past 28 days of logs that we have, and 66.90.110. only hit us the once on May 4th.)
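
As a cross-check on the two address ranges, Python's `ipaddress` module can work out the aligned /18 that contains one of the addresses seen today and test whether the other range falls inside it. This is only a sketch: it assumes the /18 that WHOIS reports is the ordinary aligned block, and `66.90.110.1` is a made-up stand-in for the 66.90.110.* range.

```python
import ipaddress


def enclosing_network(addr, prefix_len):
    """Return the aligned network of the given prefix length holding addr."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)


block = enclosing_network("66.90.95.207", 18)
print(block)  # 66.90.64.0/18

# The last address seen today, and a stand-in for the May 4th range.
print(ipaddress.ip_address("66.90.95.254") in block)
print(ipaddress.ip_address("66.90.110.1") in block)
```

Under that assumption both ranges sit in the same /18, which fits with them being the same operation.
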