== PlanetLab hammers on _robots.txt_

The [[Planet Lab http://www.planet-lab.org/]] consortium is, to quote its banner, 'an open platform for developing, deploying, and accessing planetary-scale services'. Courtesy of a friend [[noticing http://blog.centresource.com/2006/05/14/odd-things/]], today's planetary-scale service appears to be repeatedly requesting _robots.txt_ from people's webservers.

Here, they've made 523 requests (so far) from 323 different IP addresses. (PlanetLab nodes are hosted around the Internet, mostly at universities; they usually have 'planetlab' or 'pl' or the like in their hostnames.) The first request arrived at 03:56:11 (Eastern) on May 14th, and they're still rolling in. So far, they haven't requested anything besides _robots.txt_.

All of the requests have had the _User-Agent_ string:

> _Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc3 Firefox/1.0.6_

This User-Agent string is a complete lie, which is one of the things that angers me about this situation. The *minimum standard* for acceptable web spider behavior is to clearly label yourself; pretending to be an ordinary browser is an automatic sign of evil. If PlanetLab had a single netblock, it would already be in our port 80 IP filters.

Apparently the PlanetLab project responsible for this abuse is called [[umd_sidecar http://www.cs.umd.edu/~capveg/sidecar]], and it has already been reported to the PlanetLab administration by people who have had better luck navigating their search interfaces than I have. (It looks like the magic is to ask for advanced search and then specify that you want TCP as the protocol.)
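
(As an aside, tallying this sort of thing from your logs is straightforward. Here is a hedged sketch in Python; the field positions assume Apache common/combined log format, and the sample lines and addresses are entirely made up for illustration, not taken from the actual incident.)

```python
from collections import Counter

def tally_robots(lines):
    """Count robots.txt requests per source IP in access log lines.

    Assumes common/combined log format, where whitespace-split
    field 0 is the client IP and field 6 is the request path.
    """
    ips = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > 6 and fields[6] == "/robots.txt":
            ips[fields[0]] += 1
    return ips

# Hypothetical sample log lines (invented IPs, not real data):
sample = [
    '10.0.0.1 - - [14/May/2006:03:56:11 -0400] "GET /robots.txt HTTP/1.0" 200 24',
    '10.0.0.2 - - [14/May/2006:04:00:00 -0400] "GET /robots.txt HTTP/1.0" 200 24',
    '10.0.0.1 - - [14/May/2006:04:10:00 -0400] "GET /robots.txt HTTP/1.0" 200 24',
]
ips = tally_robots(sample)
print(sum(ips.values()), "requests from", len(ips), "addresses")
# -> 3 requests from 2 addresses
```

In real use you would feed it the file itself, e.g. ((tally_robots(open("access.log")))), and perhaps also break the counts down by User-Agent.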