== An obvious thing about dealing with web spider misbehavior

Here's an important hint for operators of web spiders:

> If I have to touch robots.txt or my web server configuration to deal
> with your spider's behavior, the easiest and safest change is to block
> you entirely.

It is therefore very much in the interests of web spider operators to keep me from ever having to touch my robots.txt file in the first place, because [[you are not Google OnBanningSearchEngines]]. You should consider things like _crawl-delay_ to be desperate last-ditch workarounds, not things that you routinely recommend to people you are crawling.

(Yes, this means that [[web spiders should notice issues like this for themselves ResponsibleSpiderBehavior]]. The best use for _crawl-delay_ that I've seen suggested is as a way to tell spiders to speed up, not slow down, their usual crawling rate.)

A corollary to this is that your spider information page should do its best to concisely tell people what they get out of letting you crawl their web pages, because if people have to change their robots.txt to deal with your spider, you want them to have as much reason not to block you as possible.

(You had better have a spider information page and mention it in your spider's _User-agent_. Ideally it will also rank high in searches for your spider's official name and for its _User-agent_ string.)
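(For concreteness, here is what the two robots.txt changes discussed above look like; _ExampleBot_ is a stand-in for a real spider's _User-agent_ token, not any actual crawler. The first stanza is the easy and safe total block; the second is the _crawl-delay_ workaround:

    # Block the spider entirely; the simplest fix for a misbehaving crawler.
    User-agent: ExampleBot
    Disallow: /

    # Or, as a last-ditch workaround, ask it to wait between requests.
    # Crawl-delay is a nonstandard extension, usually read as seconds;
    # plenty of crawlers ignore it, which is part of why relying on it
    # is a bad position to be in.
    User-agent: ExampleBot
    Crawl-delay: 10

Note that the total block is also the only one of the two that every robots.txt-honoring spider is guaranteed to understand.)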