If spam false positives are inevitable, should we handle them better?

September 5, 2017

Right now, our mail system basically punts on handling false positives (non-spam detected as spam). The only thing that users can do about false positives is turn off SMTP time rejection (if they have it turned on) and then fish the mis-classified message out of their filters or our archive of all recent email they've gotten. If the message has already been rejected, the only thing that can be done is to get the sender to re-send it. And there's no way for users to see what messages have been rejected, so they can tell if some important email message has fallen victim to a false positive; instead we periodically get requests to check our logs.

My impression is that our mail system's behavior is not atypically bad, and instead that plenty of other mail systems behave in much the same way. It's pretty straightforward to see why, too; it would take significantly more work to engineer anything more than this, especially if you reject at SMTP time (and I think you want to, because that way at least people find out that their email hasn't gone through because of a false positive). But probably we should do better here, if only because this is a pain point for our users (it's one of the things that gets them to talk to us about our spam filtering).

(This is also probably required if we accept the idea that 'spam levels' may be a copout.)

In general, a mail system do things with potential false positives from two sides. It can give the local receiving user some tools to investigate situations, answering the question of 'did X try to send me email and what happened to it', and perhaps also to retrieve such mis-classified email. Retrieving email rejected at SMTP time requires your mailer to save a copy of such email (at least for a while), which means you need to defer rejections to DATA time. This opens up a complicated tangle of worms for messages sent to multiple recipients (although they go away again if you mandate that you only have one 'spam level' and everyone gets it).

Your mailer can also give senders some tools they can use to cause false positive messages to get accepted anyway. You probably don't want to offer these tools to all senders; sure, most spammers aren't paying attention to you, but some spam (such as advance fee fraud attempts) does come from real human beings doing things to compromised mail systems by hand, and they might well take advantage of your generosity. However, if someone's regular correspondent has some email classified as spam, it's probably safe (and worthwhile) to offer them these tools. The odds are probably good that it's an accident as opposed to a compromised account with a human being at the other end to take advantage of you.

(There are a wide variety of options for how to let people psuh messages through. You could do 'mail it to this special one-time address', or 'include this magic header or Subject marker', or just 'visit this URL of ours to request that your email get delivered anyway'. And I'm sure there's more. I have no idea which option would work best, and SMTP-time rejection makes things complicated because it's hard to give people much information.)

None of these are particularly easy to put together with off the shelf components, though, which for many places is probably going to make the entire issue moot. And maybe things should be left as they are for the straightforward reason that a low level of false positives just doesn't justify much sysadmin effort to improve the situation, especially if it requires a complicated bespoke custom solution.

(This is one of the entries where it turns out that I don't have any firm conclusions, just some rambling thoughts I want to write down.)

Comments on this page:

By Arnaud Gomes at 2017-09-06 07:49:11:

Back when I was working in a research lab, we did write a (very simple) reporting tool for our virus/malware filter; basically we just emailed the user whenever we caught incoming malware. As far as I remember it was never used in the 6 or 7 years I saw (ie nobody ever asked us to recover a quarantined file).

If I had to design a similar system for spam filtering nowadays, I would probably just build a small web app to expose the rejection logs and see what feedback I could get from the users. Depending on the number of false positives you get, it would probably make sense to reject spam at DATA time and keep a copy of what you reject, in case the recipient wants to look at it anyway.

Not perfect, easier than having to manually dig through your logs. Better? Maybe, maybe not, I would guess this is heavily dependent on the kind and number of spam and false positives you get.

Written on 05 September 2017.
« The idea of 'spam levels' may be a copout
Systemd on Ubuntu 16.04 can't (or won't) reliably reboot your server »

Page tools: View Source, View Normal, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Tue Sep 5 00:17:44 2017
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.