What I wrote about the Computer Science department's spam filtering
back in 2007 is still broadly correct, but as you might expect, the
passage of several years has changed some of the details and added
some things. I'm not going to repeat material from the original here;
I'll just supplement it with some additional notes that are current
as of May 2012.
Most of the server side anti-spam stuff we do happens on our external
MX gateway. These days it does a number
of anti-spam related things:
- We still wait a bit before giving initial greetings and responses to
HELO. I'm not convinced that this does any good these
days (if it ever did), but the code is there so it's staying.
Inertia is a powerful force sometimes.
- As everyone should, we insist on valid addresses in MAIL FROM and
RCPT TO to the extent that we can verify them simply (we don't do any
sort of callback verification, partly because it's evil and very hard
to do right). We can fully verify our own addresses (for both senders
and recipients) and we verify that outside domains actually exist.
- At RCPT TO time, addresses that have opted into server side spam
handling immediately reject email from IP addresses listed in
zen.spamhaus.org and apply greylisting if they have enabled it. These
days we have a self-serve system where users can set email addresses
under their control to either moderate or strong spam filtering, with
greylisting as an option for either.
(The self-serve system isn't well publicized but a certain number
of people have taken advantage of it.)
- At DATA time, provided that all of the destination addresses have
opted in to server side spam filtering, we call out to a milter
interface on our Sophos PureMessage install in order to get a spam and
virus indication for the message. If it scores high enough, we
immediately reject it. Otherwise we accept it and continue processing.
- If the sending IP address is in zen.spamhaus.org we add a message header about it.
This is at least theoretically useful for people's filtering and
also gets used later in our processing for some things.
- The message is run through Sophos PureMessage again using a non-milter
interface that allows message modification. This trip actually
strips known viruses and, if the message has a high enough spam
score, adds a note about it to the start of the
Note that this means that some number of messages actually get run
through Sophos PureMessage twice, once at
DATA time to perform the
milter check (the results of which are effectively thrown away) and
then a second time to do the real filtering.
(At the mechanical level this step uses SMTP,
which is why it can modify the message when our hacked-together Exim
milter setup can't. Our Sophos configuration does the same thing for
the SMTP filtering and the milter interface; the only difference is
the communication process.)
- If the message was tagged as spam (or a virus) and is to someone who has
opted for strong spam filtering, it's discarded. Well, technically
that recipient is dropped from the message's recipient list; unlike
SMTP-time filtering, this can be done selectively for only some recipients.
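
The per-recipient decisions in the steps above can be sketched in Python. This is only an illustration of the logic as described, not our actual Exim configuration. The `Prefs` class, the function names, and the simplified `seen_before` flag for greylisting are all hypothetical (real greylisting tracks sender, recipient, and IP together with timing). The one externally fixed detail is the usual IPv4 DNSBL convention of reversing the octets and appending the zone name.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Prefs:
    """Hypothetical per-address settings from the self-serve system."""
    level: Optional[str] = None  # None (not opted in), "moderate" or "strong"
    greylist: bool = False


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def rcpt_time_action(prefs: Prefs, in_zen: bool, seen_before: bool) -> str:
    """Decide what happens to one recipient at RCPT TO time."""
    if prefs.level is None:
        return "accept"  # not opted in: no server side handling
    if in_zen:
        return "reject"  # opted-in addresses reject listed IPs outright
    if prefs.greylist and not seen_before:
        return "defer"   # simplified stand-in for greylisting
    return "accept"


def may_reject_at_data(recipients: List[Prefs]) -> bool:
    """A DATA-time rejection hits the whole message, so all must have opted in."""
    return all(p.level is not None for p in recipients)


def surviving_recipients(recipients: Dict[str, Prefs], tagged: bool) -> List[str]:
    """After acceptance, drop only strong-filtering recipients of tagged mail."""
    return [addr for addr, p in recipients.items()
            if not (tagged and p.level == "strong")]
```

The asymmetry is the point: rejecting at DATA time is all-or-nothing for the message, while dropping recipients after acceptance can be done one address at a time.
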
After this the message is delivered to our central email
machine for actual
processing and delivery and so on. If it was tagged as spam,
this may lead to it getting automatically discarded by the mail
system due to things like our special
.forward system to
easily discard spam or a similar
system for automatically diverting spam to local mailing lists. Otherwise, what happens to
spam-tagged messages is still up to our users; each person gets to
decide for themselves how they want to handle such emails and people
have adopted a wide variety of measures.
We deliberately expose only very generic and high-level server side spam
filtering options to our users; for each of their addresses they can opt
for 'moderate' or 'strong' filtering, with or without greylisting. Being
generic means that we preserve our freedom to evolve just what each
level of filtering does over time in a way that we wouldn't have if
users had, for example, specifically opted in to or out of 'reject email
if the sending IP is in zen.spamhaus.org'. We make only relatively
generic promises about what each level does; the most important one is
that moderate spam filtering always rejects at SMTP time so that if it
misfires the sender knows about it.
(We do document what each level of filtering currently does, but we also
specifically document that this can change and if you opt in to one of
them you understand that the details may change over time.)
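
One way to picture this separation is a table that maps each generic level to the concrete checks it currently implies. The sketch below is hypothetical: the check names, `POLICY_2012`, `LATER_POLICY`, and `some_new_check` are invented for illustration and are not how our mailer configuration is actually written. It only shows that a user's stored choice (level plus greylisting) can stay fixed while the table behind it changes.

```python
from typing import Dict

# Hypothetical snapshot of what each generic level means as described here.
POLICY_2012: Dict[str, Dict[str, bool]] = {
    "moderate": {"dnsbl_reject": True, "data_time_reject": True,
                 "discard_tagged": False},
    "strong": {"dnsbl_reject": True, "data_time_reject": True,
               "discard_tagged": True},
}

# A purely illustrative later revision: one invented check is added to both
# levels without users having to change anything they selected.
LATER_POLICY: Dict[str, Dict[str, bool]] = {
    level: {**checks, "some_new_check": True}
    for level, checks in POLICY_2012.items()
}


def effective_checks(level: str, greylist: bool,
                     policy: Dict[str, Dict[str, bool]] = POLICY_2012) -> Dict[str, bool]:
    """Expand a user's generic choice into the checks currently behind it."""
    checks = dict(policy[level])
    checks["greylist"] = greylist
    return checks


def rejects_only_at_smtp_time(checks: Dict[str, bool]) -> bool:
    """The promise made for 'moderate': nothing is silently discarded."""
    return not checks["discard_tagged"]
```

Whatever the table becomes, the invariant worth testing is the promise itself: `rejects_only_at_smtp_time(effective_checks("moderate", ...))` should remain true under every revision.
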
At a mailer level things are much more broken down, which means that we
can hand-manipulate what filtering happens for specific addresses at a
relatively specific level of detail. We use this power to apply special
filtering (for example, strong milter filtering only) to some addresses.
It's possible that we should expose some of this power to users, but
doing so would present users with more and more choice complexity and
also add constraints on our ability to evolve things in the future.
(At the same time we do want to offer users options that match the
choices they want to make. One big question is what those choices are;
we don't really know how our users think about spam filtering and so