Wandering Thoughts archives


Automation promotes action

I've written before about the advantages of scripts, but my own recent experiences have reminded me of another one, which I can snappily phrase as automation promotes action. Or in other words: when you've scripted something so that it's easy to do, you do it more often.

I build Firefox from CVS myself, for various reasons, but I run it on a different machine than I build on. Bringing a new build up on the right machine involves about three or four steps; nothing particularly intricate, but just long enough to be vaguely annoying. (Among other things, I save the old build in case I want or need to revert to it; since I'm on the bleeding edge, this happens every so often.)

For a long time I didn't script this, because it was just small enough to seem silly and just dangerous enough to make me nervous. Recently I bit the bullet and spent the time to write a newfox script that I was happy with. Predictably, I now feel much happier about doing build updates and they happen somewhat more often; the difference between typing newfox and typing N somewhat long commands turns out to actually matter after all.
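The shape of such a script is simple enough to sketch. The following is a minimal illustration, not my actual newfox script; all of the paths, and the throwaway demo directories at the end, are hypothetical:

```shell
#!/bin/sh
# Sketch of a newfox-style build update: keep the previous build
# around so that reverting is just a rename away.
set -e

# newfox NEWBUILD INSTALL BACKUP
#   NEWBUILD: directory holding the freshly built Firefox
#   INSTALL:  directory the running Firefox lives in
#   BACKUP:   where the previous build is saved for rollback
newfox() {
    new="$1"; install="$2"; backup="$3"
    rm -rf "$backup"              # keep only one old build
    if [ -d "$install" ]; then
        mv "$install" "$backup"   # save the current build
    fi
    cp -a "$new" "$install"       # install the new build
}

# Demo with throwaway directories standing in for real builds.
tmp=$(mktemp -d)
mkdir -p "$tmp/newbuild" "$tmp/current"
echo v1 > "$tmp/current/VERSION"
echo v2 > "$tmp/newbuild/VERSION"
newfox "$tmp/newbuild" "$tmp/current" "$tmp/previous"
```

The point of keeping the backup step inside the script is exactly the lesson of this entry: once reverting is free, updating stops feeling dangerous.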

Call it a lesson learned. Apparently every so often I need to experience these things, not just know them intellectually.

sysadmin/AutomationPromotesAction written at 18:02:28

Stupid web spider tricks

In the spirit of earlier entrants, but not as bad, here are some stupid web spider tricks.

The first stupid trick: crawling 'Add Comment' pages. Not only are the 'Add Comment' links marked nofollow (so good little spiders shouldn't be going there), but it's also a great way to make me wonder if you're a would-be comment spammer and pay close attention to every CSpace page you hit. CSpace gets sufficiently few page views at the moment that I can read all of the server logs, so I will notice.

(All sorts of web spiders seem to find the 'Add Comment' links especially tasty for some reason; it's quite striking. I'm pretty sure they're the most common nofollow links for web spiders to crawl.)

The second stupid trick: including a URL explaining your spider, but having that URL be a '403 permission denied' error page. Fortunately for my irritation level, I could find a copy in Google's cache (pick the cached version of the obvious web page), and it more or less explained what the web spider was doing.

Thus, today's entrant is the 'findlinks' web spider, from various 139.18.2.* and 139.18.13.* IP addresses (which belong to uni-leipzig.de) plus a few hits from (which doesn't seem to). The spider seems to be a distributed one, where any client machine that uses the software can crawl you. (I'm not sure I like distributed crawlers.)

On a side note, I derive a certain amount of amusement from seeing English Apache error messages on a foreign language website.

(Other information on the findlinks spider: in this huge database of spiders or here.)

web/StupidSpiderTricks written at 02:35:29

