== My expectations for responsible spider behavior

My [[minimum technical requirements SpiderTechnicalRequirements]] for real web spiders are deliberately quite black and white. But there are also a number of fuzzier things that I expect from responsible web spiders. Bear in mind that these aren't hard and fast rules, and I can't give precise numbers for them.

(As before, this only applies to what I'm calling 'real' or [['legitimate' web spiders WebSpiderMeaning]]; I can't expect any particular behavior from malicious web spiders.)

Disclaimers in place, here's what I expect of responsible web spiders:

* check _robots.txt_ frequently and adjust your behavior rapidly, say within no more than two days. (I do not care what infrastructure you require to do this; the fact that _robots.txt_ updates have to propagate around six layers of your internal topology before reaching the crawler logic is your problem, not mine.)
* don't make requests more frequently than one every few seconds or so.
* more importantly, ~~notice when the website is slowing down and slow down yourself~~. If the website is responding more slowly than usual, this is a very big clue that your spider should space out its requests more.
* don't rapidly re-crawl things that haven't changed. It's reasonable to check a few times just to make sure that what looks like unchanging content really is, but after that spiders should slow down. If you spend months revisiting a page three times a week when it hasn't changed in years, I get peeved.
* ~~URLs that get errors count as unchanged pages~~. Crawl them a few times to make sure that they stay errors, but after that you should immediately demote them to the bottom of your crawl rates.
* this goes *triple* if the error you are getting is a 403 error, because you are being told explicitly that this is content you are not allowed to see.

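
As a rough illustration of how a spider might act on the points above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function and class names are hypothetical, and the specific numbers (a three second floor, 'slow' meaning twice the usual response time, three confirming checks, a 180 day ceiling, a one day _robots.txt_ age) are placeholders, since these expectations deliberately don't come with precise figures. Demoting a URL on the very first 403 is likewise just one reading of 'this goes triple'.

```python
from dataclasses import dataclass

DAY = 86400.0
MIN_DELAY = 3.0            # assumed floor: "one every few seconds or so"
MAX_DELAY = 300.0          # assumed ceiling on the gap between requests
SLOW_FACTOR = 2.0          # assumed: "slow" means twice the usual response time
ROBOTS_MAX_AGE = DAY       # assumed; the text asks for "no more than two days"
BASE_INTERVAL = DAY        # assumed starting re-crawl interval
MAX_INTERVAL = 180 * DAY   # assumed stand-in for "bottom of your crawl rates"
CONFIRM_CHECKS = 3         # assumed meaning of "a few times"


def robots_is_stale(fetched_at, now):
    """True once the cached robots.txt is old enough to need re-fetching."""
    return now - fetched_at >= ROBOTS_MAX_AGE


def next_request_delay(delay, response_time, usual_time):
    """Double the gap when the site is slow; otherwise ease back gently."""
    if response_time > SLOW_FACTOR * usual_time:
        return min(delay * 2, MAX_DELAY)
    return max(delay * 0.9, MIN_DELAY)


@dataclass
class PageState:
    interval: float = BASE_INTERVAL   # seconds until the next re-crawl
    unchanged_checks: int = 0         # consecutive fetches with nothing new


def record_fetch(state, status, changed):
    """Update a URL's re-crawl interval after one fetch and return it."""
    is_error = status >= 400
    if changed and not is_error:
        # Real new content: go back to the normal schedule.
        state.unchanged_checks = 0
        state.interval = BASE_INTERVAL
        return state.interval
    # Errors count as unchanged pages, whatever the error body looks like.
    state.unchanged_checks += 1
    if status == 403:
        # Explicitly forbidden: demote immediately (an assumed reading).
        state.interval = MAX_INTERVAL
    elif state.unchanged_checks >= CONFIRM_CHECKS:
        if is_error:
            # The error has persisted through the confirming checks.
            state.interval = MAX_INTERVAL
        else:
            # Unchanged content: back off exponentially up to the ceiling.
            state.interval = min(state.interval * 2, MAX_INTERVAL)
    return state.interval
```

A crawler loop using this sketch would call _next_request_delay_ after every response to pick how long to sleep before the next request to the same site, and _record_fetch_ once per URL to decide when that URL is next due. The asymmetry in _next_request_delay_ (doubling on a slow response, easing back by only 10% on a normal one) is a design choice that favors the website over the spider.
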
Disclaimer: as before, I reserve the right to amend this list as more things occur to me, probably as the result of seeing yet more badly behaved web spiders. (Suggestions of more are welcome; I'm probably missing some obvious ones.)