What I currently do to stop comment spam on WanderingThoughts

March 13, 2007

WanderingThoughts has been pretty free of successful comment spam attempts for a while, so I think it's about time to write up all of the various things I'm currently doing to stop comment spammers.

(I'm not worried about comment spammers reading this and working past my precautions, because I'm confident that comment spammers don't bother reading the blogs they spam.)

First off, I get a big leg up by being neither popular nor using common software. This basically reduces the comment spammers down to people automatically filling in any form that moves and people spamming completely by hand. Since I can never stop the latter sort of spammer, I only worry about the former sort.

My current precautions:

  • I refuse comments entirely from web browsers that either don't send a User-Agent: header or send one whose value includes the string 'User-Agent:'. Technically I consider them robots, which are blocked from retrieving a variety of URLs, including the 'Add Comment' pages.

  • the initial Add Comment page doesn't have a 'Post Comment' option in the comment form; you have to preview your comment before that option shows up. I think that this is the right behavior to encourage in general, especially since I use nonstandard comment formatting.

    (I got this idea from Sam Ruby, although my implementation is simpler than his.)

  • to prevent spammers from fetching my comment form from one IP address and submitting from another, the comment form stamps itself with the IP address that you previewed from (more or less) and you can only post the comment from an IP address in the same /24.

    (I could have required 'from the same IP address', but I decided that that was too dangerous in the face of proxies and the like.)

  • to deal with spam software that fills in every text field it can find, there's an invisible honeypot field that is supposed to always be blank; if there's any value in the field, the comment won't post. For people with lynx and other browsers that don't deal with CSS, there's text next to the field that tells you to leave it blank.

    (I got this idea from Ned Batchelder.)

  • I refuse comment posts from IP addresses that are in the SBL or the XBL. At the moment I don't bother checking the SBL and the XBL for comment previews, mostly because I want to delay as few things as possible with DNS lookups.

  • comments with control characters are refused; this is part anti-spam precaution and part something required in general. (DWiki doesn't allow control characters in actual pages either.)
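
To make the stateless checks above concrete, here is a rough Python sketch of how they could fit together. This is not DWiki's actual code: every function name, the 'leave_blank' and 'comment' field names, and the decision to allow tab, newline and carriage return are assumptions made for illustration, and it only handles IPv4. The DNS blocklist part just builds the name to look up (reversed octets plus the zone); a real check would resolve that name and treat any answer as 'listed'.

```python
import ipaddress


def is_robot_user_agent(user_agent):
    """Missing or empty User-Agent, or one containing the literal 'User-Agent:'."""
    return not user_agent or "User-Agent:" in user_agent


def same_slash24(preview_ip, post_ip):
    """True if post_ip is in the same /24 as the IP the form was previewed from."""
    net = ipaddress.ip_network(preview_ip + "/24", strict=False)
    return ipaddress.ip_address(post_ip) in net


def honeypot_tripped(form, field="leave_blank"):
    """The hidden field must stay blank; any value means a form-filling robot."""
    return bool(form.get(field, ""))


def has_control_chars(text):
    """Assumption: tab, newline and carriage return are fine, the rest are not."""
    return any((ord(c) < 32 and c not in "\t\n\r") or ord(c) == 127 for c in text)


def dnsbl_query_name(ip, zone):
    """Name to look up in a DNS blocklist zone: reversed IPv4 octets + zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def refusal_reason(user_agent, preview_ip, post_ip, form):
    """Return a short reason to refuse the post, or None if it passes."""
    if is_robot_user_agent(user_agent):
        return "robot user-agent"
    if not same_slash24(preview_ip, post_ip):
        return "posted from a different /24 than the preview"
    if honeypot_tripped(form):
        return "honeypot field filled in"
    if has_control_chars(form.get("comment", "")):
        return "control characters in comment"
    return None
```

For example, refusal_reason("Mozilla/5.0", "192.0.2.10", "192.0.2.200", {"comment": "hi"}) passes all four checks, while the same call with a post IP of 192.0.3.10 is refused for coming from a different /24. Note that none of this needs any stored state; the preview IP travels with the form itself.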

Technically I also have a content blacklist, but it is quite out of date and thus pointless. I keep it around mostly to have the hooks in the rest of the code.

DWiki is deliberately written so that it has no general way to write files or otherwise record data locally. This means that I can't take various sorts of precautions that require storing local state, like rate-limiting IP addresses or blocking IP addresses that exhibit characteristic bad behaviors.

(Technically I could write code that assumes that caching is turned on and hijack it for various evil purposes, but I'm not going to go there. Plus, there are concurrency issues that the simple caching layer currently gets to ignore.)

Written on 13 March 2007.

Last modified: Tue Mar 13 00:00:39 2007