How CSLab currently does email anti-spam stuff

February 25, 2007

The Computer Science department is strongly against rejecting email just because it might be spam (at least by default); enough people would rather sort through spam than risk rejecting legitimate email. People are willing to have known viruses removed from their email (although not executables in general).

(For clarity: the weekly spam summaries I do are not for CSLab's mail system.)

I once summarized CSLab's general rule as 'thou shalt not reject email just because it smells bad'. We can reject email that has narrow technical failings, such as a nonexistent domain in the origin address, and we can do things that cause no problems for legitimate mailers but do get spammers to give up. We can't reject on anything that isn't a clear technical failing, and we can't do anything that causes problems for legitimate mailers.

All external email goes through a frontend machine running Exim 4. This machine does the following spam-related things:

  1. it waits a few seconds before spitting out the initial greeting banner and the response to EHLO/HELO; this is an attempt to persuade spam clients that they are being tarpitted so that they give up. Connections from IP addresses listed in are delayed longer.

    (This is not as good as the real OpenBSD spamd, which trickles out replies one character at a time; Exim just sits on the whole line for N seconds and then blasts it out. I got the general idea from Bob Beck's spamd presentation.)

  2. the MAIL FROM domain has to exist (if it's one of our domains, the full address has to be valid).
  3. the RCPT TO address has to be to us and valid. The frontend machine has a list of valid local usernames (including aliases and mailing lists and so on), so it can immediately reject email to nonexistent local users.
  4. at RCPT TO time, addresses that have opted into it immediately reject email from senders in, and greylist most everyone else (using greylistd, which is a general daemon for doing this). At the moment we have no convenient way for users to opt into this, so it is mostly protecting system aliases.

  5. if the sender is in, we add a message header about it.
  6. the message is run through Sophos PureMessage, which removes known viruses and, if the message has a high enough spam score, adds a note about it to the start of the Subject: header.
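Step 1's two styles of delay differ only in how the reply reaches the client. The following is a minimal asyncio sketch of that difference, assuming hypothetical delay values (the post doesn't give CSLab's real numbers). `send_line_held` behaves the way Exim is described above: it sits on the whole line and then sends it. `send_line_trickled` spreads the same total delay across the line one character at a time, in the spirit of OpenBSD spamd.

```python
import asyncio

# Hypothetical delays; the post does not say what values CSLab actually uses.
NORMAL_DELAY = 5.0   # seconds to sit on the banner / EHLO reply for most clients
LISTED_DELAY = 20.0  # longer delay for clients found on a blocklist


def greeting_delay(client_listed: bool) -> float:
    """Pick how long to hold the greeting banner and the EHLO/HELO reply."""
    return LISTED_DELAY if client_listed else NORMAL_DELAY


async def send_line_held(writer, line: str, delay: float) -> None:
    """Exim-style: wait `delay` seconds, then send the whole line at once."""
    await asyncio.sleep(delay)
    writer.write(line.encode("ascii"))
    await writer.drain()


async def send_line_trickled(writer, line: str, delay: float) -> None:
    """spamd-style: spread the same delay over the line, one character at a time."""
    per_char = delay / max(len(line), 1)
    for ch in line:
        writer.write(ch.encode("ascii"))
        await writer.drain()
        await asyncio.sleep(per_char)
```

In a real server, `writer` would be an `asyncio.StreamWriter` from `asyncio.start_server`, and the delay would come from `greeting_delay(is_listed(ip))`, where `is_listed` is a hypothetical stand-in for the list lookup that step 1 mentions.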
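Steps 2 and 3 in the list above are narrow technical checks on the SMTP envelope. Here is a rough Python sketch of their logic, not CSLab's actual Exim configuration. The domain name, the user list, and the `domain_exists` callback (standing in for a DNS lookup) are all hypothetical. The null sender `<>` is accepted because bounces use it.

```python
from typing import Callable, Set, Tuple

# Hypothetical local configuration; the post says the frontend has a list of
# valid local usernames (aliases, mailing lists, etc.) but doesn't show it.
LOCAL_DOMAINS: Set[str] = {"cs.example.edu"}
VALID_LOCAL_USERS: Set[str] = {"postmaster", "root", "alice", "grad-students"}

Reply = Tuple[int, str]  # (SMTP reply code, text)


def split_address(address: str) -> Tuple[str, str]:
    """Split local@domain, lowercasing both parts (a simplification, since
    local parts are technically case-sensitive)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"malformed address: {address!r}")
    return local.lower(), domain.lower()


def check_mail_from(address: str, domain_exists: Callable[[str], bool]) -> Reply:
    """Step 2: the sender's domain must exist; local senders must be valid."""
    if address == "":
        # The null sender <> is used for bounces and must be accepted.
        return 250, "OK"
    try:
        local, domain = split_address(address)
    except ValueError:
        return 501, "malformed sender address"
    if domain in LOCAL_DOMAINS:
        if local in VALID_LOCAL_USERS:
            return 250, "OK"
        return 550, "no such local sender"
    # domain_exists is a hypothetical DNS check; a real one should give a
    # temporary 4xx failure on DNS timeouts rather than reject outright.
    if domain_exists(domain):
        return 250, "OK"
    return 550, "sender domain does not exist"


def check_rcpt_to(address: str) -> Reply:
    """Step 3: the recipient must be one of ours and must exist."""
    try:
        local, domain = split_address(address)
    except ValueError:
        return 501, "malformed recipient address"
    if domain not in LOCAL_DOMAINS:
        return 550, "relaying not permitted"
    if local not in VALID_LOCAL_USERS:
        return 550, "no such user here"
    return 250, "OK"
```

Rejecting bad recipients at RCPT TO time means the sending server, not CSLab, is left holding any bounce.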
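Step 4 combines an opt-in reject with greylisting. The sketch below is an in-memory greylist keyed on the usual (client IP, sender, recipient) triplet, plus the per-recipient decision. It only illustrates the technique; it is not how greylistd itself is implemented. The delay values, the `opted_in` set, and the `sender_listed` flag (a stand-in for the list lookup step 4 describes) are all assumptions, and the "most" in "most everyone else" implies exemptions that this sketch doesn't model.

```python
import time
from typing import Callable, Dict, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    """Minimal in-memory greylist; the default delays are hypothetical."""

    def __init__(self, min_delay: float = 300.0, max_wait: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.min_delay = min_delay  # a new triplet must wait this long before a retry passes
        self.max_wait = max_wait    # first attempts older than this are treated as new
        self.clock = clock
        self.first_seen: Dict[Triplet, float] = {}
        # Never expired here; a real daemon would age these entries out.
        self.passed: Set[Triplet] = set()

    def allow(self, ip: str, sender: str, rcpt: str) -> bool:
        key = (ip, sender.lower(), rcpt.lower())
        now = self.clock()
        if key in self.passed:
            return True
        first = self.first_seen.get(key)
        if first is None or now - first > self.max_wait:
            self.first_seen[key] = now  # first (or stale) attempt: start the clock
            return False
        if now - first >= self.min_delay:
            del self.first_seen[key]
            self.passed.add(key)        # retried after the delay: accept from now on
            return True
        return False                    # retried too soon


def rcpt_decision(rcpt: str, ip: str, sender: str, opted_in: Set[str],
                  sender_listed: bool, greylist: Greylist) -> Tuple[int, str]:
    """Only opted-in addresses get the reject-or-greylist treatment."""
    if rcpt.lower() not in opted_in:
        return 250, "OK"
    if sender_listed:
        return 550, "sender is listed"
    if greylist.allow(ip, sender, rcpt):
        return 250, "OK"
    return 451, "greylisted, please try again later"
```

The 451 is a temporary failure, so a well-behaved mailer retries and gets through; this is also why some legitimate mailers with broken retry behaviour have trouble with greylisting.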
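Steps 5 and 6 only annotate the message; nothing is rejected or dropped. Here is a sketch of that annotation using Python's standard `email` package. The score threshold, the Subject: tag text, and the header name are hypothetical, since the post doesn't give PureMessage's scoring or CSLab's exact header.

```python
from email import message_from_string
from email.message import Message

# Hypothetical values; not PureMessage's real threshold or tag.
SPAM_THRESHOLD = 50
SUBJECT_TAG = "[SPAM?]"
LISTED_HEADER = "X-Sender-Listed"


def tag_message(msg: Message, spam_score: float, sender_listed: bool) -> Message:
    """Add an informational header and/or a Subject: tag; never reject."""
    if sender_listed:
        del msg[LISTED_HEADER]           # avoid duplicates if run twice
        msg[LISTED_HEADER] = "yes"       # step 5: note it in a header
    if spam_score >= SPAM_THRESHOLD:     # step 6: tag the Subject:
        subject = msg.get("Subject", "")
        if not subject.startswith(SUBJECT_TAG):  # don't tag twice
            tagged = f"{SUBJECT_TAG} {subject}".rstrip()
            if "Subject" in msg:
                msg.replace_header("Subject", tagged)
            else:
                msg["Subject"] = tagged
    return msg


if __name__ == "__main__":
    m = message_from_string("Subject: cheap pills\n\nbuy now\n")
    tag_message(m, spam_score=80, sender_listed=False)
    print(m["Subject"])  # [SPAM?] cheap pills
```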

After all this the email message is delivered to our central email machine for actual processing and delivery and so on. We don't do anything special with messages tagged as spam; each person gets to decide for themselves how they want to handle such emails, whether that is to filter them on the server with procmail or leave it up to their IMAP client's filtering or do nothing at all.

For an organization that doesn't want to reject email outright, I think that this sort of tagging is a big win; it makes things visible and it makes it easy for all sorts of clients to filter things. You need a reliable spam filter that doesn't need training, though.
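Because tagged mail is still delivered, filtering becomes a per-person choice. The following sketches that idea in Python as an ordered list of (predicate, folder) rules where the first match wins, roughly the way a sequence of delivering procmail recipes behaves. It is not a procmail configuration. The tag string and header name are assumptions and would have to match whatever the server actually adds.

```python
from email import message_from_string
from email.message import Message
from typing import Callable, List, Tuple

Rule = Tuple[Callable[[Message], bool], str]  # (predicate, destination folder)


def subject_starts_with(tag: str) -> Callable[[Message], bool]:
    return lambda msg: msg.get("Subject", "").startswith(tag)


def has_header(name: str) -> Callable[[Message], bool]:
    return lambda msg: name in msg


def file_message(msg: Message, rules: List[Rule], default: str = "INBOX") -> str:
    """Return the folder for msg: the first matching rule wins."""
    for matches, folder in rules:
        if matches(msg):
            return folder
    return default


# Hypothetical per-user choices.
cautious_user: List[Rule] = [(subject_starts_with("[SPAM?]"), "spam")]
strict_user: List[Rule] = [
    (subject_starts_with("[SPAM?]"), "spam"),
    (has_header("X-Sender-Listed"), "suspicious"),
]
hands_off_user: List[Rule] = []  # does nothing at all; everything lands in INBOX

if __name__ == "__main__":
    m = message_from_string("Subject: [SPAM?] win big\n\nclick here\n")
    print(file_message(m, cautious_user), file_message(m, hands_off_user))
```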

We use Sophos PureMessage because the university has a site-wide license for it, so it doesn't cost us anything, and the central campus email system uses it and likes it. In my experience it does a good but not perfect job at recognizing spam, and I've only gotten a few reports of false positives. (And Sophos maintains the spam and virus filtering rules instead of us.)

Things we don't do (that sometimes surprise people):

  • reject HELOs that claim to be from us. This is merely a bad smell, not a narrow technical defect.
  • general greylisting, because there are legitimate mailers that are known to have problems with it.

Exim does reject some badly formed HELOs by default, and we have left that on; I consider that to be a narrow technical defect issue. We also reject email to IP address domain literals, which I believe is another Exim default. We are not currently doing nolisting, but we may in the future; there are defensible technical reasons for having an MX entry with a lower preference value than our frontend's (and thus one that senders should try first) pointing to our internal central email machine, and its SMTP port isn't reachable from the outside world any more.
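The HELO and domain-literal rejections are purely syntactic checks. Here is a rough sketch of that kind of check. The hostname pattern is a simplified approximation, explicitly not Exim's actual HELO syntax rules, and the address-literal pattern only checks the brackets.

```python
import re

# Rough approximation of a DNS hostname; NOT Exim's real HELO check.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOSTNAME_RE = re.compile(_LABEL + r"(?:\." + _LABEL + r")*")
_ADDRESS_LITERAL_RE = re.compile(r"\[[^\[\]\s]+\]")


def helo_is_well_formed(name: str) -> bool:
    """Accept a plausible hostname or an address literal such as [192.0.2.1]."""
    return bool(_HOSTNAME_RE.fullmatch(name) or _ADDRESS_LITERAL_RE.fullmatch(name))


def is_domain_literal(address: str) -> bool:
    """True for recipient addresses like user@[192.0.2.1]."""
    _local, sep, domain = address.rpartition("@")
    return bool(sep) and domain.startswith("[") and domain.endswith("]")
```

Note that a HELO of `mail.example.org` passes this check whether or not it claims to be one of our own hosts; that is the "bad smell" case the post deliberately does not reject.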
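Nolisting relies on how senders walk the MX list. A standards-following mailer tries MX hosts from the lowest preference value upward and falls back when one won't accept a connection, while (the theory goes) some spam software only tries the first. The sketch below contrasts the two behaviours; the hostnames and the `can_connect` callback are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

MXRecord = Tuple[int, str]  # (preference value, hostname); lower value is tried first


def rfc_style_delivery(mx_records: Iterable[MXRecord],
                       can_connect: Callable[[str], bool]) -> Optional[str]:
    """Try each MX in order of increasing preference value, falling back when a
    host won't accept a connection. (RFC 5321 also asks senders to randomize
    among equal preferences; this sketch simply sorts.)"""
    for _pref, host in sorted(mx_records):
        if can_connect(host):
            return host
    return None


def one_shot_delivery(mx_records: Iterable[MXRecord],
                      can_connect: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical spam software that tries only the most preferred MX."""
    _pref, host = min(mx_records)
    return host if can_connect(host) else None
```

With the unreachable internal machine at the lowest preference value, a legitimate sender pays one refused connection and then reaches the frontend, while the one-shot sender gives up.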

Written on 25 February 2007.

Last modified: Sun Feb 25 16:28:51 2007