PlanetLab hammers on robots.txt
The PlanetLab consortium is, to
quote its banner, 'an open platform for developing, deploying, and
accessing planetary-scale services'. Courtesy of a friend noticing, today's
planetary-scale service appears to be repeatedly requesting robots.txt
from people's webservers.
Here, they've made 523 requests (so far) from 323 different IP
addresses (PlanetLab nodes are hosted around the Internet, mostly at
universities; they usually have 'planetlab' or 'pl' or the like in their
hostnames). The first request arrived at 03:56:11 (Eastern) on May 14th,
and they're still rolling in. So far, they haven't requested anything
besides robots.txt.
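(These numbers come straight out of the web server access logs. As a minimal sketch of that sort of tally in Python, assuming the usual Apache 'common' log format and a hypothetical log path; the real logs and tooling here may well be different:

    # Count robots.txt requests and distinct client IPs in an access log.
    # The path and the common-log-format assumption are illustrative only.
    import re

    ips = set()
    requests = 0
    with open("/var/log/apache2/access.log") as logf:
        for line in logf:
            # Common log format: client IP first, the request line quoted.
            m = re.match(r'(\S+) \S+ \S+ \[[^]]+\] "([^"]*)"', line)
            if m and "robots.txt" in m.group(2):
                requests += 1
                ips.add(m.group(1))

    print(requests, "requests from", len(ips), "distinct IP addresses")

)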
All of the requests have had the User-Agent
string:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc3 Firefox/1.0.6
This User-Agent string is a complete lie, which is one of the things that angers me about this situation. The minimum standard for acceptable web spider behavior is to clearly label yourself; pretending that you are an ordinary browser is an automatic sign of evil. If PlanetLab had a single netblock, it would now be in our port 80 IP filters.
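(To sketch what such a per-netblock port 80 filter amounts to; the actual filtering mechanism here isn't shown, and the netblock below is a purely hypothetical RFC 5737 example:

    # Sketch of a 'block this netblock on port 80' check; the netblock
    # and the decision function are illustrative, not our real rules.
    import ipaddress

    BLOCKED = ipaddress.ip_network("192.0.2.0/24")

    def should_block(client_ip, port):
        # Drop port 80 traffic from anything inside the blocked netblock.
        return port == 80 and ipaddress.ip_address(client_ip) in BLOCKED

    print(should_block("192.0.2.17", 80))    # True
    print(should_block("198.51.100.4", 80))  # False

Because PlanetLab nodes are scattered across hundreds of sites, there is no single netblock to plug in, which is exactly the problem.)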
Apparently the PlanetLab project responsible for this abuse is called umd_sidecar, and has already been reported to the PlanetLab administration by people who have had better luck navigating their search interfaces than I have. (It looks like the magic is to ask for advanced search and then specify that you want TCP as the protocol.)