== Diffbot's bad _Referer_ header

Today a web spider called 'Diffbot' (run by diffbot.com) made a whole bunch of requests here, all of which failed. They failed because, just as it has repeatedly done in the past, it made them all with a _Referer_ header of '_!http://news.google.com/_'. This behavior long ago led me to ban it entirely from [[here /~cks/space/]].

There are a number of things wrong with this header. The first is that, to borrow from the old Trix commercials, 'silly robot, the _Referer_ header is for humans'. I've written about this before at [[some length NoRefererForRobots]], and doing it here is generally [[a good way to get your spider banned HowToGetYourSpiderBannedIV]]. (I have a philosophical ramble about why this is the correct view, but it's going in another entry.)

The second is that, of course, this _Referer_ value is a flaming lie in two different ways. First, Diffbot in no way, shape, or form traveled from news.google.com to the whole collection of URLs here that it attempted to crawl with that _Referer_ header. Second, news.google.com does not link to here at all. Diffbot made up the header out of whole cloth.

I react very badly to web spiders that lie to me at the best of times (even if they aren't spraying junk over my referer logs). Diffbot and its operators may or may not be legitimate, or at least honest about what they're doing; I have no particular opinions on that. But they are unquestionably operating a web spider that routinely lies. I have no idea why and really, I don't care; [[I was doing them a favour by letting them crawl me OnBanningSearchEngines]] and I can and will withdraw that favour if they irritate me. (See also [[my technical requirements for web spiders SpiderTechnicalRequirements]] and [[my standards for responsible spider behavior ResponsibleSpiderBehavior]].)

(No, I haven't mailed Diffbot's operators about this behavior. Are you kidding? I'm neither crazy nor stupid. On today's Internet, mailing people about issues is for people that you actually trust.)