2007-11-17
Fighting spam always costs
One of the important things about fighting spam, one that may not be immediately obvious, is that there is no free lunch. Fighting spam can cost in the risk of false positives, or it can cost in people's time (sometimes you can spend money instead), but it always costs; you just decide where you will pay the price.
(There are many ways that it can cost in people's time; for example, putting together and managing a spam filtering system. As illustrated here, you may get to choose whose time pays the price.)
The corollary of this is that choices and features in fighting spam all have their own cost. For example, to pick one of my hot buttons, leaving your postmaster address unscreened and unblocked so that people can write to you about false positive accidents is not free; you pay in the time that your staff will spend dealing with email to postmaster (and possibly in the resulting staff burnout from dealing with a pile of spam). This cost may still be worth paying, but it is a cost and you should be aware of it.
I think that one of the reasons that this is not immediately obvious is that many of the costs are indirect costs, and indirect costs are often overlooked. It's easy to see the cost of a commercial anti-spam solution and pretty easy to see the costs of false positives (although those costs vary a lot by environment), but things like the costs of making people deal with spam themselves are far harder to quantify and measure.
2007-11-10
The risks of spam filtering (part 2)
Rejecting supposed spam at SMTP time is somewhat controversial (with the risk being that you will reject legitimate email). What is less appreciated is that not rejecting spam at SMTP time is also dangerous; it just has different and less obvious risks.
The problem is that if you do not reject spam you will wind up filtering spam, and the consequences of mis-characterizing legitimate email as spam are much worse if you filter than if you reject. If you reject outright, the sender of such email knows that it didn't get through and can take steps to deal with the problem; if you filter, the only thing the sender knows is that you're not responding to the email.
You can't avoid the problem by not using software to filter, because that just makes people do it by hand. In fact that makes the problem worse, because people are terrible at doing boring, repetitive tasks like deciding whether a message is spam or non-spam. If the user gets much spam at all, they are going to start chucking non-spam into the spam bucket just by reflex.
(And what matters isn't just the number of spam messages they get, it's both the number and the percentage of their email that is spam. People are creatures of habit, and if your habit is 'spam, throw it away', well.)
The usual answer to this is that people will filter supposed spam automatically and then periodically go through their filtered spam by hand to see if anything was mis-classified. This doesn't work, and the problem should be obvious: going through filtered spam is just like filtering your email by hand but worse, because there is far less payoff (real email messages that you actually want). The only time that this is even vaguely practical is when the user isn't getting much spam to start with, so skimming is easy and fast. But to get that, you generally have to be rejecting some amount of spam outright, so that the volume that gets through to filtering is down to a dull roar.
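To put rough numbers on the payoff argument, here is a small back-of-the-envelope sketch. The function name and every figure in it (daily volumes, the filter's false positive rate, the fraction rejected at SMTP time) are hypothetical assumptions picked for illustration, not measurements from any real system.

```python
def messages_skimmed_per_rescue(spam_per_day, legit_per_day, false_positive_rate):
    """Average number of spam-folder messages a user must look at
    for each legitimate message they rescue.

    Assumes (for simplicity) that all spam reaching the filter is
    filed as spam, and that a fixed fraction of legitimate email is
    misfiled along with it.
    """
    misfiled_legit = legit_per_day * false_positive_rate
    if misfiled_legit == 0:
        # Nothing to rescue, so skimming never pays off.
        return float("inf")
    folder_size = spam_per_day + misfiled_legit
    return folder_size / misfiled_legit


# Hypothetical user: 200 spam and 50 real messages a day, with a
# filter that misfiles 1% of real email.
filter_only = messages_skimmed_per_rescue(200, 50, 0.01)

# Same user, but assume 90% of the spam is rejected outright at SMTP
# time, so only 20 spam messages a day reach the filter.
reject_then_filter = messages_skimmed_per_rescue(20, 50, 0.01)

print(round(filter_only), round(reject_then_filter))
```

Under these made-up numbers, the filter-only user skims on the order of 400 messages for every real one they recover, while the user who rejects most spam up front skims on the order of 40. The exact figures don't matter; the point is how quickly the payoff of skimming collapses as the filtered volume grows.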
Note that this holds for any mailbox, which is part of why I am so against having unscreened and commonly known email addresses for appealing spam rejections.