The return of how to get your web spider banned
Today's entrant is 'Uptilt Inc', uptilt.com, aka
64.71.164.96/27. Let's see how they did on my scale:
Important update: it turns out I was fooled by out-of-date
WHOIS information, and Uptilt Inc isn't involved; for the full
story, see UptiltUpdate. Keep that in mind when you read the
historical references to them in the rest of this entry.
- 1,159 requests in one day.
- 25+ requests for several URLs that are permanent redirections.
  The pages they redirect to haven't changed recently, either.
- They used the generic user-agent string "NutchCVS/0.8-dev
  (Nutch; [...])". At least it included a URL for the Nutch
  page. (Of course, it did not include a URL for any of their
  own pages.)
- They did fetch robots.txt frequently: 28 times in
  one day, in fact.
- None of the 10 different IP addresses in 64.71.164.96/27 that hit
us have reverse DNS. (In fact, nothing in the subnet has reverse
DNS.)
- The subnet has no useful contact information, apart from the
  fact that Hurricane Electric says it belongs to an 'Uptilt Inc'.
  There is an uptilt.com, but, just to make you wonder, it lives in a
  different subnet and its WHOIS data lists a different physical address.
  However, the uptilt.com website says that Uptilt Inc's headquarters is
  at the same address that HE has for the owners of 64.71.164.96/27.
  In short: even more searching than last time.
- Of course, www.uptilt.com has no information on any spidering
activity they may be doing. Instead, it has lots of information
on them being a "leading provider of Marketing Automation software
solutions", and their subsidiary emaillabs.com being a "leading
provider of advanced email marketing solutions".
- They lose points for having prominent links to a website called
  'crm.uptilt.com', which doesn't exist. Some of the links to their
  privacy policy and so on don't work, either.
- Since around here 'email marketing' tends to be spelled S-P-A-M, I
  wasn't exactly encouraged to send them any email about their spider.
  These days, if you're involved in 'email marketing', I feel that you
  had better bend over backwards to reassure people that you're not
  a spammer and that you understand all the rules and so on.
Overall score: BANNED. Since they use a generic user-agent
string (even though they do check robots.txt), their subnet
now resides in our permanent kernel-level IP blocks alongside
our first contestant.
(We actually banned them a bit under two weeks ago, but I've only
gotten around to writing this up now. The kernel IP block counters
show that they've tried to drop by a few times since their ban.)
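The actual bans are kernel-level IP blocks, and this entry doesn't say which packet filter enforces them. Purely to illustrate how much address space a /27 like 64.71.164.96/27 covers, here is a minimal Python sketch that checks whether a source address (say, from a web log) falls inside a banned subnet. `BANNED_SUBNETS` and `is_banned()` are hypothetical names for illustration, not part of any real blocking setup.

```python
import ipaddress

# Hypothetical ban list; the only entry is the subnet discussed above.
BANNED_SUBNETS = [ipaddress.ip_network("64.71.164.96/27")]


def is_banned(addr: str) -> bool:
    """Return True if addr falls inside any subnet in BANNED_SUBNETS."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BANNED_SUBNETS)


if __name__ == "__main__":
    net = BANNED_SUBNETS[0]
    # A /27 leaves 5 host bits, so it spans 2**5 = 32 addresses.
    print(net.num_addresses)                           # 32
    print(net.network_address, net.broadcast_address)  # 64.71.164.96 64.71.164.127
    print(is_banned("64.71.164.100"))                  # True
    print(is_banned("64.71.164.128"))                  # False
```

This only shows the subnet arithmetic; the real block lives in the kernel's packet filter, not in application code.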