2006-06-15
How to have your web spider irritate me intensely
It's very simple: put what should be in your User-Agent header into the Referer header instead. The next time I read my Referer logs, you're sure to provoke me into spasms of teeth-grinding irritation. I can only conclude that people pulling this stunt are attempting advertising through other people's public Referer logs.
(For bonus points, fetch my syndication feeds without any attempt at conditional GET.)
Today's offender is the 'Strategic Board Bot', run by strategicboard.com from the IP address 212.143.103.125 (a netvision.net.il IP address, but also where 'www.strategicboard.com' et al. point). Since they aren't fetching our robots.txt either, they've earned an immediate listing in our kernel IP filters.
Strategic Board itself has no useful information in its WHOIS record and appears to be in the business of indexing and searching blogs (which makes their non-use of conditional GET all the more serious; anyone specifically pulling syndication feeds should be using it). Of course, they have no 'how to contact us about our robot' information that I can see in what poking at their web page I'm willing to do.
Strategic Board also wins extended bonus points because they didn't use to do this; they apparently just started yesterday. So they deliberately decided to 'advertise' by hijacking Referer and putting a mere 'SB' into their User-Agent string. (A couple of early requests had 'HTTP Remote File Test' as the User-Agent instead.)
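For contrast, here is a minimal sketch of what a better-behaved feed fetcher might do, using curl. The bot name, the example.org URLs, and the fetch_feed function are all hypothetical placeholders; the point is only that the robot's identity and contact URL go in User-Agent, Referer is left alone, and an existing cached copy turns the request into a conditional GET (here via If-Modified-Since, which is one of the two usual validators).

```shell
#!/bin/sh
# Hypothetical bot identity; a real robot would point at its own info page.
BOT_UA='ExampleBot/1.0 (+https://example.org/bot.html)'
# Allow the curl binary to be overridden (an assumption made for testing).
CURL=${CURL:-curl}

# fetch_feed URL CACHEFILE
# Fetch URL into CACHEFILE, making the request conditional if we already
# have a copy from an earlier fetch.
fetch_feed() {
    url=$1
    cache=$2
    if [ -f "$cache" ]; then
        # --time-cond FILE sends If-Modified-Since based on FILE's mtime.
        set -- --time-cond "$cache"
    else
        set --
    fi
    # --remote-time stamps the saved file with the server's Last-Modified
    # time, so the next conditional request uses the server's own clock.
    # No Referer header is set at all.
    "$CURL" --silent --show-error --remote-time \
        --user-agent "$BOT_UA" "$@" \
        --output "$cache" "$url"
}
```

Something like `fetch_feed https://example.org/blog/atom.xml /var/tmp/examplebot/feed.xml` (both the URL and the path are made up) run from cron would then cost the server only a 304 response whenever the feed has not changed.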
Dispelling a nightmare (a sysadmin tale)
One quiet sysadmin nightmare is discovering that you don't know how to reproduce the running setup on a system that you've taken over. You know, things like: there's nothing that seems to start important programs; programs that the configuration seems to say should be running aren't (or aren't running successfully); programs are running as UIDs that aren't even in /etc/passwd.
I spent part of today mostly dispelling a nightmare like that, and therein lies a tale.
I discovered the problem when I was looking into adding more DNS blocklists to the system's QMail configuration. This required finding the configuration, and rapidly led to the discovery that the running SMTP listener didn't seem to be the configured SMTP listener. I did some poking at the time and wound up thinking that it had been started by hand (or by something that had been removed since then, which is almost as much fun).
The last thing you want to do with a working house of cards like this is touch anything (touching stuff is how you get to have exciting days), so at the time I just carefully saved the output of 'ps augxwww' and tiptoed away.
(This is often all that you can really do: save as much about the
current system state as possible, in the hopes that you'll have enough
information to reconstruct and restart things if something dies.)
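As a rough sketch of that kind of snapshot, the following shell functions save a few views of the running system into a directory. The function names, the file names, and the choice of commands beyond 'ps augxwww' are assumptions for illustration, and the /proc listings assume a Linux system.

```shell
#!/bin/sh
# capture NAME CMD...: run CMD, saving stdout and stderr to $SNAP_DIR/NAME.
# A failing or missing command is noted in the file rather than being fatal.
capture() {
    name=$1
    shift
    "$@" > "$SNAP_DIR/$name" 2>&1 ||
        echo "(exit status $?)" >> "$SNAP_DIR/$name"
}

# snapshot_state DIR: record the current system state under DIR.
snapshot_state() {
    SNAP_DIR=$1
    mkdir -p "$SNAP_DIR" || return 1
    # Full command lines of everything that is running.
    capture ps.txt ps augxwww
    # What is mounted where.
    capture mounts.txt mount
    # Which binary each process is running and where it was started from
    # (Linux-specific; run as root to see every process).
    capture exe-links.txt sh -c 'ls -l /proc/[0-9]*/exe'
    capture cwd-links.txt sh -c 'ls -l /proc/[0-9]*/cwd'
    return 0
}
```

A call such as `snapshot_state "/root/state-$(date +%Y%m%d-%H%M%S)"` (the path is only an example) leaves a dated directory to compare against if the machine ever comes back up differently.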
But this Tuesday morning we had a large scale power failure. When I looked at the mail machine to see how broken it was, to my surprise everything was actually running the way it had been before the power failure. Clearly I'd been wrong, and somewhere there was a startup script; I just had to find it.
Brute force to the rescue. The running QMail command line had a relatively distinctive string in it, so:
# find / /var -xdev -type f -print0 | xargs -0 fgrep -l <string>
This turned up the script that was starting the actual daemons, and some poking around found an invocation of the script tucked away innocently at the end of /etc/rc.d/rc.local.
Now I at least have a starting point for rationalizing things a bit, and I can see how it's working. (Well, mostly how it's working. There are still several mysterious bits.)