Some comments on spam scoring and anti-spam tools in general

May 29, 2010

Here's something important if you're designing or considering a new anti-spam system (as we may be at some point). It may sound obvious, but I think it's not:

If you run an email system, part of your job is filtering spam for your users.

It used to be that you could provide your users a collection of anti-spam options and tools and settings and so on, and consider your work done. Those days are over and gone. Much as with computer security, users have neither the expertise to make sensible decisions about this stuff nor any interest in acquiring it. Dumping a bunch of tools in their laps and running away is more or less the equivalent of doing no spam filtering whatsoever, and is about as unacceptable in practice to most people.

(So why did we get away with doing just that for quite a while? I think there are a number of reasons; for a long time spam wasn't a big problem, and for a fair while after that no one demonstrated to users that you could actually make the spam problem go away. Nowadays, lots of people have experience with places like GMail that have spam mostly fixed, so they know it can be done. And if GMail can do it, why should they settle for less elsewhere?)

There are some immediate corollaries to this. One of them is what I noted in passing recently: spam scoring is effectively spam filtering. Users will directly take your spam score and filter on it (and then judge you on how well it works), especially if you explicitly mark things that have hit some threshold score (for example, by changing the email's Subject:). The same is true if you provide a standard 'here is how to do filtering' configuration that users can adopt and customize, because most users won't customize it; whatever this configuration is, it is effectively your spam filtering.
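To make the 'scoring is filtering' point concrete, here is a minimal sketch in Python of threshold marking and of the filter users tend to build on top of it. Everything specific in it is an assumption for illustration: the threshold of 5.0, the '[SPAM]' tag, the X-Spam-Score header name, and the function names are all hypothetical and don't describe any particular system.

```python
from email.message import EmailMessage

# Hypothetical values, chosen only for illustration.
SPAM_THRESHOLD = 5.0
SPAM_TAG = "[SPAM]"


def mark_if_spam(msg, score, threshold=SPAM_THRESHOLD):
    """Record the score in a header and tag the Subject: at or above threshold."""
    msg["X-Spam-Score"] = f"{score:.1f}"
    if score >= threshold:
        subject = str(msg.get("Subject", ""))
        if not subject.startswith(SPAM_TAG):
            # EmailMessage allows only one Subject:, so drop the old one first.
            del msg["Subject"]
            msg["Subject"] = f"{SPAM_TAG} {subject}".rstrip()
    return msg


def user_filter(msg):
    """What most users will do with the mark: file on it, with no tuning."""
    subject = str(msg.get("Subject", ""))
    return "Junk" if subject.startswith(SPAM_TAG) else "Inbox"


if __name__ == "__main__":
    m = EmailMessage()
    m["Subject"] = "Cheap pills"
    mark_if_spam(m, 7.2)
    print(m["Subject"], "->", user_filter(m))
```

Note that the second function has no threshold of its own. Once you mark messages, the threshold you picked is the filtering decision for everyone who filters on the mark.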

(And if it doesn't work or malfunctions, yes, you will be blamed.)

The usual answer to this is that you won't work out a score (with its possibly charged politics and potential constant demands for tuning); instead, whatever software you're using will just tag messages with various characteristics and leave it to users to decide which ones are bad enough to filter on. This doesn't work, because at best you're back to dumping a bunch of tools in people's laps and running away. (At worst they will seize on some obvious bit of the tagging, decide to filter on it, and then blame you when things explode.)

PS: regardless of who is really at 'fault', users feeling that they are getting a terrible, unusable email system is never a good thing. You want to avoid it if at all possible, and remember that the hard problems are the social ones. This applies at multiple levels here.
