How to have your web spider irritate me intensely (part 2)
In the spirit of previous cleverness, here's a simple new technique:
Have your web spider make up random Referer headers.
This wasn't Referer spamming, since the websites in the Referer headers were completely random URLs, apparently drawn from legitimate sites around the Internet (often repeated). Nor were the websites ones that actually linked to us, or had any relationship to the URLs that were being crawled.
Even in low volume this is a sure-fire ticket to our kernel-level IP filters. It ensures that we're mostly unable to get anything useful from our Referer logs without a lot of additional work, and is therefore deeply irritating.
Today's offender is the IP address 18.104.22.168, which is an unnamed
iol.it IP address; it is using a User-Agent value of 'Mozilla/5.0
(arianna.libero.it,firstname.lastname@example.org)'. It does seem to fetch
robots.txt, but of course the User-Agent string gives
no clue as to what User-agent setting in there would turn it off.
Ironically, it does appear to respect nofollow, unlike
many other better-behaved web spiders.
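To make the robots.txt problem concrete, here is a small sketch that asks one real robots.txt matcher, Python's urllib.robotparser, which User-agent lines would block this spider. This is an assumption-laden illustration: we have no idea how the spider itself matches its name against robots.txt, and the candidate tokens below are just guesses. Python's parser only looks at the part of the User-Agent before the first '/', which here is the generic 'Mozilla'.

```python
from urllib.robotparser import RobotFileParser

# The spider's User-Agent as it appears in the post.
SPIDER_UA = "Mozilla/5.0 (arianna.libero.it,firstname.lastname@example.org)"


def blocked_by(token: str, ua: str = SPIDER_UA, path: str = "/") -> bool:
    """Return True if a robots.txt disallowing everything for `token`
    would block `ua`, as judged by Python's urllib.robotparser.
    This says nothing about how the real spider interprets robots.txt."""
    rp = RobotFileParser()
    # A hypothetical one-record robots.txt: block `token` from the whole site.
    rp.parse([f"User-agent: {token}", "Disallow: /"])
    return not rp.can_fetch(ua, path)


if __name__ == "__main__":
    # Guessed tokens; only the last two actually match under this parser.
    for token in ("arianna", "libero", "Mozilla", "*"):
        print(f"User-agent: {token:8} -> blocked: {blocked_by(token)}")
```

Under this parser, 'arianna' and 'libero' do not block the spider, while 'Mozilla' and '*' do; both of those would also block far more than just this one spider.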