Wandering Thoughts archives

2010-05-29

Some comments on spam scoring and anti-spam tools in general

Here's something important if you're designing or considering a new anti-spam system (as we may be at some point). It may sound obvious, but I think it's not:

If you run an email system, part of your job is filtering spam for your users.

It used to be that you could provide your users a collection of anti-spam options and tools and settings and so on, and consider your work done. Those days are over and gone. Much as with computer security, users have neither the expertise to make sensible decisions about this stuff nor any interest in acquiring it. Dumping a bunch of tools in their laps and running away is more or less the equivalent of doing no spam filtering whatsoever, and is about as unacceptable in practice to most people.

(So why did we get away with doing just that for quite a while? I think it's a number of reasons; for a long time it wasn't a big problem, and for a fair while after that no one demonstrated to users that you could actually make the spam problem go away. Nowadays, lots of people have experience with places like GMail that have spam mostly fixed, so they know it can be done. And if GMail can do it, why should they settle for less elsewhere?)

There are some immediate corollaries to this. One of them is what I noted in passing recently: spam scoring is effectively spam filtering. Users will directly take your spam score and filter on it (and then judge you on how well it works), especially if you explicitly mark things that have hit some threshold score (for example, by changing the email's Subject:). The same is true if you provide a standard 'here is how to do filtering' configuration that users can adopt and customize, because most users won't customize it; whatever this configuration is, it is effectively your spam filtering.

(And if it doesn't work or malfunctions, yes, you will be blamed.)
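To make 'scoring is effectively filtering' concrete, here is a minimal sketch of the kind of threshold marking I mean. Everything specific in it is an assumption made for illustration: the X-Spam-Score header name, the 5.0 threshold, the '[SPAM] ' tag and the tag_message function are all hypothetical, not what our software actually does.

```python
from email import message_from_string
from email.message import Message

# Hypothetical values chosen for illustration only.
SPAM_THRESHOLD = 5.0
SUBJECT_TAG = "[SPAM] "


def tag_message(msg: Message, score: float,
                threshold: float = SPAM_THRESHOLD) -> Message:
    """Record the score in a header and mark the Subject: if it is high enough."""
    # Replace any existing score header; deleting a missing header is a no-op.
    del msg["X-Spam-Score"]
    msg["X-Spam-Score"] = f"{score:.1f}"

    if score >= threshold:
        subject = msg.get("Subject", "")
        # Avoid stacking tags if the message is processed twice.
        if not subject.startswith(SUBJECT_TAG):
            del msg["Subject"]
            msg["Subject"] = SUBJECT_TAG + subject
    return msg


if __name__ == "__main__":
    demo = message_from_string("Subject: Cheap watches\n\nBuy now.\n")
    print(tag_message(demo, 7.2)["Subject"])
```

Once the Subject: is changed like this, the threshold is the filter as far as most users are concerned; they will sort on the tag, not on the score.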

The usual answer to this is that you won't work out a score (with its possibly charged politics and potential constant demands for tuning); instead, whatever software you're using will just tag messages with various characteristics and leave it to users to decide which ones are bad enough to filter on. This doesn't work, because at best you're back to dumping a bunch of tools in people's laps and running away. (At worst they will seize on some obvious bit of the tagging, decide to filter on it, and then blame you when things explode.)

PS: regardless of who is really at 'fault', users feeling that they are getting a terrible, unusable email system is never a good thing. You want to avoid it if at all possible, and remember that the hard problems are the social ones. This applies at multiple levels here.

SpamScoringAndTools written at 00:49:13

2010-05-27

One benefit of relying on third-party (anti-)spam filtering

There are some local developments around the university that have had me thinking about potential alternatives to our current system of spam filtering; right now we rely on some commercial software that the university has a site license for, but it's possible that the license won't be renewed at some point. In the process of this, I've realized something.

One non-obvious advantage of outsourced spam filtering is that it means that you are not in control of what gets filtered. That may sound like a disadvantage, but from a social (or political) perspective it is not necessarily so. Put simply, when you have no control, people can't come to you with complaints about how the system works or demands to change it. In turn this means that you're not on the hook to mediate between conflicting demands, where person A demands that some things be recognized as spam, yet making the changes leaves person B unhappy because other things are now classified as spam.

When the spam filtering is a black box provided by the vendor (or any outside party), it's clear to everyone that it's really outside your control; the only choice is 'use the vendor' or 'not use the vendor', and unless the vendor is clearly not providing good results there will be little support for the second option. When you operate and tune spam filtering, well, it's within your power to change how it works and people are going to expect you to do that.

(Technically we only do spam scoring and it's up to individual people to do any filtering desired. In practice we do spam filtering, because most people filter based on whether or not the system scores a message as spammy enough.)

The local environment has lots of fairly polarized opinions about spam filtering or not-filtering. If we had to run our own spam filtering, my pessimistic side suspects that either it would have to be pretty conservative (and thus not too useful) or it would wind up being fairly politicized and troublesome. Neither is very appealing.

(Thus, I really hope that the university keeps the campus site license for our current commercial software.)

OutsideFilteringAdvantage written at 04:01:26

