Today's question: are anti-spam statistics useful for us?

July 2, 2013

In the postscript of my recent DNS blocklist stats I basically raised a question in passing: are the anti-spam stats I can generate here actually useful, or are they just vaguely interesting? In the jargon, are they actionable information?

When I put it this way, the answer is pretty much no. As I see it, there are two possible reasons anti-spam stats could be actionable here: they could point out some problem in our anti-spam filtering, or they could help us allocate limited system resources to the anti-spam things with the highest payoff (so we could, for example, eliminate an expensive anti-spam step if it wasn't doing us any good). But neither of these actually applies to us, because our anti-spam stuff is basically a black box that we don't tune and the machines involved show no signs of being anywhere close to running out of resources.

(Arguably we should monitor our use of DNS blocklists to see if they're doing us any good. But it seems very unlikely that either the CBL or zen.spamhaus.org will stop being effective any time soon and if they do temporarily get quiet, it's not like it does any harm to have them present.)
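If we did want to monitor blocklist use, a minimal sketch might look like the following. The log line format here is purely hypothetical (our real mail logs look different), and `count_dnsbl_hits` is an illustrative name; the idea is only to tally rejections per DNS blocklist zone.

```python
import re
from collections import Counter

# Hypothetical log format: a rejection line names the DNS blocklist zone
# that matched. Real mail logs will need a different pattern.
DNSBL_RE = re.compile(r"rejected .* listed in (?P<zone>[\w.-]+)")

def count_dnsbl_hits(lines):
    """Count rejections per DNS blocklist zone in an iterable of log lines."""
    hits = Counter()
    for line in lines:
        m = DNSBL_RE.search(line)
        if m:
            hits[m.group("zone")] += 1
    return hits

# Made-up sample lines, using documentation-only IP addresses.
sample = [
    "2013-07-01 10:00:01 rejected 192.0.2.10: listed in zen.spamhaus.org",
    "2013-07-01 10:00:05 accepted 198.51.100.7",
    "2013-07-01 10:01:12 rejected 203.0.113.4: listed in cbl.abuseat.org",
    "2013-07-01 10:02:30 rejected 192.0.2.99: listed in zen.spamhaus.org",
]

for zone, n in count_dnsbl_hits(sample).most_common():
    print(zone, n)
```

A blocklist whose count stays at zero over a long stretch of real logs would be the signal to look at, although as noted, leaving a quiet blocklist in place does no harm.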

There are somewhat actionable statistics, but they aren't really accessible. What really matters is the amount of mis-classification that's going on, i.e. spam that's missed and non-spam that's incorrectly tagged as spam. However, we have no way of telling this; only the users can (if they bother to check), and we don't currently have any way to collect information on it.

(We assume that we would hear about it if there was a significant amount of either going on. This may be optimistic, and given that the core of our anti-spam system is a vendor black box there isn't necessarily anything we could do about it anyways.)
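To make the two mis-classification numbers concrete, here is a small sketch of how they could be computed if users ever did report back. It assumes a hypothetical feed of (filter verdict, user verdict) pairs, which is exactly the data we don't have; `misclassification_rates` and the sample values are made up for illustration.

```python
def misclassification_rates(reports):
    """Compute mis-classification rates from user feedback.

    reports: iterable of (filter_said_spam, user_said_spam) boolean pairs.
    Returns the fraction of real spam the filter missed and the fraction
    of real non-spam it tagged as spam (None if there is no such mail).
    """
    missed = false_alarm = spam = ham = 0
    for filter_said_spam, user_said_spam in reports:
        if user_said_spam:
            spam += 1
            if not filter_said_spam:
                missed += 1          # spam that got through
        else:
            ham += 1
            if filter_said_spam:
                false_alarm += 1     # legitimate mail tagged as spam
    return {
        "missed_spam_rate": missed / spam if spam else None,
        "false_positive_rate": false_alarm / ham if ham else None,
    }

# Made-up feedback: two spam messages (one missed), four legitimate
# messages (one wrongly tagged).
reports = [
    (True, True), (False, True),
    (False, False), (True, False), (False, False), (False, False),
]
print(misclassification_rates(reports))
```

The two rates are kept separate on purpose, since a legitimate message tagged as spam is usually a much worse outcome than a spam message that slips through.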

I'm a bit sad about this because I find these sorts of statistics to be interesting and so I'd like it if they were also useful. It also means that it doesn't really make sense to spend much time doing things like improving the mail system's logging to help out statistics gathering.
