Wandering Thoughts archives

2006-02-18

Automation promotes action

I've written before about the advantages of scripts, but my own recent experiences have reminded me of another one, which I can snappily phrase as automation promotes action. Or in other words: when you've scripted something so that it's easy to do, you do it more often.

I build Firefox from CVS myself, for various reasons, but I run it on a different machine than I build on. Bringing a new build up on the right machine involves about three or four steps; nothing particularly intricate, but just long enough to be vaguely annoying. (Among other things, I save the old build in case I want or need to revert to it; since I'm on the bleeding edge, this happens every so often.)

For a long time I didn't script this, because it was just small enough to seem silly and just dangerous enough to make me nervous. Recently I bit the bullet and spent the time to write a newfox script that I was happy with. Predictably, I now feel much happier about doing build updates and they happen somewhat more often; the difference between typing newfox and typing N somewhat long commands turns out to actually matter after all.
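
(I haven't included the script here, but for illustration, here is a minimal sketch of roughly what such an update script can look like, written in Python rather than shell for clarity. All of the paths, and the detail of keeping exactly one previous build around, are assumptions made for the sketch, not details of the actual newfox script; the real thing also has to move the build from the build machine to the machine I run it on, which the sketch skips.)

    #!/usr/bin/env python3
    # Sketch of a 'newfox'-style build update: save the current build
    # so it can be reverted to, then put the new build in its place.
    # All paths here are assumptions, not the real script's layout.
    import shutil
    import sys
    from pathlib import Path

    BUILD_SRC = Path("/build/firefox/dist/firefox")   # hypothetical fresh build
    INSTALL_DIR = Path("/opt/firefox")                 # hypothetical install location
    PREVIOUS_DIR = Path("/opt/firefox.prev")           # old build kept for reverts

    def main():
        if not BUILD_SRC.is_dir():
            sys.exit(f"no new build found at {BUILD_SRC}")
        # Keep exactly one previous build so a bad bleeding-edge build
        # can be reverted without rebuilding.
        if PREVIOUS_DIR.exists():
            shutil.rmtree(PREVIOUS_DIR)
        if INSTALL_DIR.exists():
            INSTALL_DIR.rename(PREVIOUS_DIR)
        # Copy the freshly built tree into place.
        shutil.copytree(BUILD_SRC, INSTALL_DIR)
        print(f"installed new build; previous build saved in {PREVIOUS_DIR}")

    if __name__ == "__main__":
        main()

Even a sketch this small replaces three or four commands that each have to be typed carefully, which is exactly the point.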

Call it a lesson learned. Apparently every so often I need to experience these things, not just know them intellectually.

sysadmin/AutomationPromotesAction written at 18:02:28; Add Comment

Stupid web spider tricks

In the spirit of earlier entrants, but not as bad, here are some stupid web spider tricks.

The first stupid trick: crawling 'Add Comment' pages. Not only are the 'Add Comment' links marked nofollow (so good little spiders shouldn't be going there), but crawling them is also a great way to make me wonder if you're a would-be comment spammer and pay close attention to every CSpace page you hit. CSpace gets sufficiently few page views at the moment that I can read all of the server logs, so I will notice.

(All sorts of web spiders seem to find the 'Add Comment' links especially tasty for some reason; it's quite striking. I'm pretty sure they're the most common nofollow links for web spiders to crawl.)
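
(Spotting this in the logs is itself easy to script. Here is a minimal sketch that counts clients requesting comment pages, assuming Apache combined log format and assuming that 'Add Comment' pages can be recognized by a 'writecomment' substring in the URL; the real URL scheme here may well differ.)

    #!/usr/bin/env python3
    # Sketch: report clients that request 'Add Comment' pages.
    # Assumes Apache combined log format on standard input and that such
    # pages can be spotted by 'writecomment' in the URL (an assumption).
    import re
    import sys
    from collections import Counter

    LOGLINE = re.compile(
        r'^(\S+) \S+ \S+ \[[^\]]+\] '
        r'"(?:GET|POST|HEAD) (\S+)[^"]*" \d+ \S+'
        r'(?: "[^"]*" "([^"]*)")?'
    )

    hits = Counter()
    for line in sys.stdin:
        m = LOGLINE.match(line)
        if not m:
            continue
        ip, url, agent = m.group(1), m.group(2), m.group(3) or "-"
        if "writecomment" in url:
            hits[(ip, agent)] += 1

    for (ip, agent), count in hits.most_common():
        print(f"{count:5d}  {ip}  {agent}")

Feed it an access log on standard input and it prints a per-client count, which is more than enough to spot a spider that finds 'Add Comment' links especially tasty.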

The second stupid trick: including a URL explaining your spider, but having that URL be a '403 permission denied' error page. Fortunately for my irritation level, I could find a copy in Google's cache (pick the cached version of the obvious web page) and it more or less explained what the web spider was doing.

Thus, today's entrant is the 'findlinks' web spider, from various 139.18.2.* and 139.18.13.* IP addresses (which belong to uni-leipzig.de) plus a few hits from 80.237.144.96 (which doesn't seem to). The spider seems to be a distributed one, where any client machine that uses the software can crawl you. (I'm not sure I like distributed crawlers.)
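
(Checking whether a given hit comes from those address ranges is also simple to script. This is only a sketch, and treating '139.18.2.*' and '139.18.13.*' as /24 networks is my assumption about how those ranges are laid out.)

    #!/usr/bin/env python3
    # Sketch: does an address fall inside the findlinks crawler's ranges?
    # Treating the two '*.x' ranges as /24 networks is an assumption.
    import ipaddress
    import sys

    FINDLINKS_NETS = [
        ipaddress.ip_network("139.18.2.0/24"),
        ipaddress.ip_network("139.18.13.0/24"),
    ]

    def is_findlinks(addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in FINDLINKS_NETS)

    if __name__ == "__main__":
        # Usage: pass one or more IP addresses on the command line.
        for addr in sys.argv[1:]:
            print(addr, is_findlinks(addr))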

On a side note, I derive a certain amount of amusement from seeing English Apache error messages on a foreign language website.

(Other information on the findlinks spider: in this huge database of spiders or here.)

web/StupidSpiderTricks written at 02:35:29; Add Comment

