An obvious thing about dealing with web spider misbehavior

Here's an important hint for operators of web spiders:
It is therefore very much in the interests of web spider operators
to keep me from ever having to touch my robots.txt file in the
first place, because you are not Google.
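To make "touching robots.txt" concrete: the simplest edit a site operator can make is usually a blanket `Disallow: /` aimed at one spider's user-agent token, which shuts that spider out while leaving every other crawler, Googlebot included, alone. The Python sketch below uses the standard library's `urllib.robotparser` to show that effect. The token `ExampleBot`, the URL, and the helper `allowed()` are hypothetical placeholders for illustration, not anything from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt after an annoyed operator edits it: one
# spider ("ExampleBot", a placeholder name) is blocked from the whole
# site, while the catch-all entry leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.org/some/page.html"
    print("ExampleBot:", allowed("ExampleBot/1.0", url))
    print("Googlebot:", allowed("Googlebot", url))
```

Run as a script, this prints `ExampleBot: False` and `Googlebot: True`: two lines of robots.txt cost the operator nothing, and cost the blocked spider the entire site.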
You should consider things like

(Yes, this means that web spiders should notice issues like this for themselves.

The best use for

A corollary to this is that your spider information page should do its best to concisely tell people what they get out of letting you crawl their web pages, because if people have to change their robots.txt to deal with your spider, you want them to have as much reason not to block you as possible.

(You had better have a spider information page and mention it in your spider's
This is part of CSpace, and is written by ChrisSiebenmann.