The risks of spam filtering (part 2)

November 10, 2007

Rejecting supposed spam at SMTP time is somewhat controversial (with the risk being that you will reject legitimate email). What is less appreciated is that not rejecting spam at SMTP time is also dangerous; it just has different and less obvious risks.

The problem is that if you do not reject spam you will wind up filtering spam, and the consequences of mis-characterizing legitimate email as spam are much worse if you filter than if you reject. If you reject outright, the sender of such email knows that it didn't get through and can take steps to deal with the problem; if you filter, the only thing the sender knows is that you're not responding to the email.
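To make the difference concrete, here is a minimal sketch in Python. The `handle_message` function, the `Policy` names, and the `looks_like_spam` flag are all hypothetical, not any real mail server's API; only the SMTP reply codes (250 for accepted, 550 for a permanent rejection) are standard. The point is simply what the sender ends up seeing in each case.

```python
from enum import Enum


class Policy(Enum):
    REJECT = "reject"   # refuse suspected spam during the SMTP conversation
    FILTER = "filter"   # accept everything, file suspected spam in a folder


def handle_message(looks_like_spam: bool, policy: Policy) -> tuple[str, str]:
    """Return (SMTP reply seen by the sender, where the message ends up)."""
    if not looks_like_spam:
        return ("250 OK", "inbox")
    if policy is Policy.REJECT:
        # The sending server gets a permanent failure and normally
        # generates a bounce, so a legitimate sender learns of the problem.
        return ("550 rejected as spam", "nowhere")
    # The sender sees success; the message quietly goes to the spam folder.
    return ("250 OK", "spam folder")


# A legitimate message that has been misclassified as spam:
for policy in Policy:
    reply, destination = handle_message(True, policy)
    print(f"{policy.value}: sender sees {reply!r}, message goes to {destination}")
```

In the filtering case the misclassified sender gets exactly the same reply as for a delivered message, which is why they have nothing to act on.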

You can't avoid the problem by not using software to filter, because that just makes people do it by hand. In fact that makes the problem worse, because people are terrible at doing boring, repetitive tasks like deciding whether a message is spam or non-spam. If the user gets much spam at all, they are going to start chucking non-spam into the spam bucket just by reflex.

(And what matters isn't just the number of spam messages they get; it's both the number and the percentage of their email that is spam. People are creatures of habit, and if your habit is 'spam, throw it away', well.)

The usual answer to this is that people will filter supposed spam automatically and then periodically go through their filtered spam by hand to see if anything was mis-classified. This doesn't work, and the problem should be obvious: going through filtered spam is just like filtering your email by hand but worse, because there is far less payoff (far fewer real email messages that you actually want). The only time that this is even vaguely practical is when the user isn't getting much spam to start with, so skimming is easy and fast. But to get that, you generally have to reject a fair amount of spam outright, so that the volume that reaches the filter is cut down to a dull roar.
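The payoff problem can be put in rough numbers. The sketch below is only an illustration: the function name `skimmed_per_real_message` is hypothetical and the weekly volumes are made up, not measurements. It computes how many messages someone has to look at in their spam folder, on average, for each real message they rescue.

```python
def skimmed_per_real_message(filtered_spam: int, misfiled_real: int) -> float:
    """Messages a person must look at, on average, per real message recovered."""
    if misfiled_real == 0:
        return float("inf")  # nothing to find, so all the skimming is wasted
    return (filtered_spam + misfiled_real) / misfiled_real


# Illustrative weekly volumes only (made-up numbers, not measurements).
no_rejection = skimmed_per_real_message(filtered_spam=2000, misfiled_real=2)
with_rejection = skimmed_per_real_message(filtered_spam=50, misfiled_real=2)
print(no_rejection, with_rejection)
```

With these assumed figures, the person whose spam all lands in the folder reads about a thousand messages per real one found, while the person whose site rejects most spam up front reads 26. The first is the kind of ratio at which people stop looking.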

Note that this holds for any mailbox, which is part of why I am so against having unscreened and commonly known email addresses for appealing spam rejections.

Last modified: Sat Nov 10 22:54:31 2007