How we do milter-based spam rejection with Exim

May 24, 2012

Suppose that you use Exim as your mailer, and you want to do SMTP-time rejection of incoming spam using some outside program that speaks the sendmail milter protocol. Exim has no native support for the milter protocol, but it is possible to hijack some existing Exim interfaces to more or less achieve this, provided that you don't want to change the message in flight at this point (you can only accept it as is or reject it).

Exim has a content-scanning interface; one of the things it can do is run an external program as an anti-virus scanner (the av_scanner scanner type of cmdline). If you invoke the scanner from the acl_smtp_data ACL, the program you run here can signal Exim to reject the message and provide a relatively arbitrary message that Exim can put in the SMTP rejection. Since Exim documents this as an interface for detecting viruses, all of the examples talk about things like malware names, but you can use it for anything you want.

A simplified version of our setting for this looks like:

av_scanner = cmdline:/milter/eximdriver.py %s:^REJECT :^REJECT (.+)

Then our DATA ACL contains a stanza with:

deny
  malware = *
  message = Rejected: $malware_name

If eximdriver.py outputs a string that looks like 'REJECT some-reason', Exim will declare that the message contains malware and set $malware_name to the some-reason portion, which we use directly in the SMTP rejection message.
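
To make the matching concrete, here is a minimal Python sketch of how the two regular expressions from the av_scanner setting act on the driver's output. It only imitates the matching with Python's re module (Exim does its own regular expression matching), and the function name simulate_exim_match is made up for illustration.

```python
import re

# The two patterns from the av_scanner setting above.
TRIGGER = re.compile(r"^REJECT ")    # does this output line signal "malware"?
NAME = re.compile(r"^REJECT (.+)")   # the part that becomes $malware_name


def simulate_exim_match(output):
    """Return the would-be $malware_name, or None if nothing triggers.

    Hypothetical helper: it imitates the matching, it is not Exim's code.
    """
    for line in output.splitlines():
        if not TRIGGER.search(line):
            continue
        m = NAME.search(line)
        # If the trigger matches but no name can be extracted, Exim would
        # still flag the message; this sketch just reports an empty name.
        return m.group(1) if m else ""
    return None
```

Anything that does not start with the REJECT prefix, including an empty output or a Python traceback, leaves the message accepted.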

Eximdriver.py has three important pieces. The first is a client side library for the milter protocol, so that it can actually talk to the milter server and get results back. The second is code to load a message from the .eml spool format that Exim writes messages in for the AV scanner program; this is basically a standard 'mailbox' format mail message augmented with some special Exim headers. The complication is that you need to recover the SMTP envelope information from various message headers, including the first Received line.

(You might think that the envelope information could be passed on the command line. Unfortunately this can't be done securely. Also, note that the %s in the argument here is not the .eml file itself but the directory it's in. Presumably Exim sets things up this way so that real AV scanners have a per-message directory where they can write whatever temporary files they need.)
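
A rough sketch of the loading step might look like the following. It assumes the scan directory holds exactly one .eml file and that the special Exim headers are named X-Envelope-From and X-Envelope-To; both are assumptions to check against your own spool files, and the function names are invented for this example.

```python
import email
import glob
import os
from email.utils import getaddresses, parseaddr


def load_message(scan_dir):
    """Parse the single .eml file in the directory Exim passes as %s."""
    candidates = glob.glob(os.path.join(scan_dir, "*.eml"))
    if len(candidates) != 1:
        raise RuntimeError("expected exactly one .eml file in %s" % scan_dir)
    with open(candidates[0], "rb") as fp:
        # The parser copes with the leading mbox-style 'From ' line.
        return email.message_from_binary_file(fp)


def envelope_from_headers(msg):
    """Recover (sender, recipients, first Received header) from headers.

    The X-Envelope-* header names are an assumption, not taken from the
    original article.
    """
    sender = parseaddr(msg.get("X-Envelope-From", ""))[1]
    rcpts = [addr for _, addr in
             getaddresses(msg.get_all("X-Envelope-To", [])) if addr]
    received = msg.get_all("Received", [])
    first_received = received[0] if received else None
    return sender, rcpts, first_received
```

The first Received header is the one added by the receiving Exim, so it is where details such as the connecting host would have to be dug out from.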

The third chunk of eximdriver.py is site-dependent; it interprets the result of the remote milter in order to figure out whether Exim should be told to reject the message and if so, with what reasons. For reasons beyond the scope of this entry, our milter server doesn't give us direct answers on this; instead, it tells us about changes that should be made to the message. Our eximdriver.py reverse engineers these changes back to whether or not the message has a virus and how high a spam score it got and outputs appropriate messages.
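
Since this part is site-dependent, the following is only an illustration of the general shape of such reverse engineering. It assumes the milter reports its findings by asking for headers to be added; the header names X-Virus-Found and X-Spam-Score, the threshold, and both function names are entirely hypothetical and not what our milter server actually uses.

```python
def verdict_from_changes(changes):
    """Turn requested header additions into (has_virus, spam_score).

    'changes' is a list of (header_name, value) pairs. The header names
    checked here are hypothetical placeholders.
    """
    has_virus = False
    spam_score = 0.0
    for name, value in changes:
        lname = name.lower()
        if lname == "x-virus-found":
            has_virus = has_virus or value.strip().lower() == "yes"
        elif lname == "x-spam-score":
            try:
                spam_score = float(value)
            except ValueError:
                pass  # unparseable score: treat as no information
    return has_virus, spam_score


def reject_line(has_virus, spam_score, threshold=10.0):
    """Return the line to print for Exim, or '' to accept the message."""
    if has_virus:
        return "REJECT message contains a virus"
    if spam_score >= threshold:
        return "REJECT spam score %.1f" % spam_score
    return ""
```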

As crazy as this may sound, all of this actually works and works fine. It is nowhere near as efficient as direct milter support in Exim would be (we are at least potentially running a Python program for every incoming email message), but our external mail gateway is actually relatively overconfigured for our volume and it's never been a problem for us.

(There is another mitigating factor, which is about to be discussed.)

Our eximdriver.py also does some associated things, like syslog detailed information about rejected messages and some information about accepted ones.

As a side note, it is very deliberate that eximdriver.py must produce output with a specific prefix in order to have email rejected. This is much safer than having any output at all cause message rejection, because it means that if something goes wrong in eximdriver.py (or just with running it, perhaps because of machine overload) we fail safely; we default to accepting the email instead of bouncing it.
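
The fail-safe behaviour can be sketched as a small wrapper around whatever does the real work. This is a minimal illustration, not the actual eximdriver.py; run_fail_safe and its check argument are names invented here.

```python
import sys


def run_fail_safe(check, out=sys.stdout):
    """Run check() and print its REJECT line, if any.

    'check' is a hypothetical callable that returns either '' (accept) or
    a string starting with 'REJECT '. Any exception results in no output
    at all, so the message is accepted.
    """
    try:
        line = check()
    except Exception:
        # A real driver would log the failure somewhere (for example to
        # syslog); the important part is that nothing with the REJECT
        # prefix is printed.
        return 0
    if line:
        out.write(line + "\n")
    return 0
```

Because only output with the specific prefix counts, even a stray traceback or warning reaching Exim would not cause a rejection.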

(I was going to say that Exim has no easy way for the 'AV scanner' to signal that the mail should be temporarily deferred with a SMTP 4xx, but actually that's wrong. Exim is explicitly documented to default to deferring messages if there's a visible AV scanner problem, and you can use a regular-expression based 'malware = ...' condition in a defer ACL stanza in order to defer based on the milter results as signaled back to Exim in the 'malware name'.)

Sidebar: the gory details

Now I have to confess that this is the simplified version of our configuration, because we have a problem: not all of our users have opted in to SMTP-time rejection of spam messages. In fact, some of our users have opted in to different levels of SMTP-time rejections. This leads to the SMTP DATA problem; because what we say at DATA time applies to all accepted RCPT TO addresses, we have to use the least aggressive level of SMTP-time rejection that everyone has agreed to (possibly right down to 'no SMTP-time rejection'). In turn this means that our Exim configuration has to keep track of this on the fly and eximdriver.py then gets invoked with this level as one of its arguments and only emits REJECT notices if the message qualifies.

Because the current scanning level is a dynamic expansion that must be re-done every time av_scanner is evaluated, our actual av_scanner setting has to look like this:

av_scanner = ${if bool{true} {cmdline:/milter/eximdriver.py -l SCANLVL %s: ^REJECT :^REJECT (.+)}}

The pointless ${if bool{true} ...} portion causes Exim to re-expand this every time it is used so that the current message's scanning level is substituted in.
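
On the Python side, handling the extra argument could look roughly like this. The -l option and the directory argument come from the av_scanner line above; the idea that each rejection reason has a minimum level it requires, and the names parse_args and maybe_reject, are assumptions made for this sketch.

```python
import argparse


def parse_args(argv):
    """Parse 'eximdriver.py -l LEVEL SCANDIR' style arguments."""
    parser = argparse.ArgumentParser(prog="eximdriver.py")
    parser.add_argument("-l", dest="level", type=int, default=0,
                        help="least aggressive level all recipients accept")
    parser.add_argument("scan_dir", help="directory holding the .eml file")
    return parser.parse_args(argv)


def maybe_reject(reason, needed_level, scan_level):
    """Emit a REJECT line only if the message qualifies at this level.

    Assumes higher levels mean more aggressive rejection and that level 0
    means no SMTP-time rejection at all.
    """
    if scan_level >= needed_level:
        return "REJECT " + reason
    return ""
```

With this shape, a message that only qualifies for rejection at level 3 is accepted whenever any recipient has settled for level 2 or lower.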

The simplest way to track the scanning level turned out to be keeping a list of each address's scan level in an ACL variable, $acl_m0_milter. As we handle each address in the RCPT TO ACL, it appends the address's scanning level to the variable (which is initialized to the maximum scan level, currently 3) with an expression like 'set acl_m0_milter = $acl_m0_milter:2' in an otherwise do-nothing warn ACL stanza. The SCANLVL definition reduces this down to the lowest level seen:

SCANLVL = ${reduce {$acl_m0_milter} {3} {${if <{$item}{$value} {$item}{$value}}}}
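
For readers who don't think in Exim string expansions, the same reduction can be written in Python. This is just an equivalent illustration of what the SCANLVL expression computes; the function name is made up.

```python
from functools import reduce


def lowest_scan_level(acl_value, initial=3):
    """Mirror the ${reduce ...} expansion: the minimum level in the list.

    'acl_value' is a colon-separated list like the contents of
    $acl_m0_milter, and 'initial' plays the role of the {3} starting value.
    """
    items = [int(x) for x in acl_value.split(":") if x.strip()]
    # Same shape as the Exim expression: keep $item if it is lower than
    # the running $value, otherwise keep $value.
    return reduce(lambda value, item: item if item < value else value,
                  items, initial)
```

For example, a message to three recipients at levels 3, 2 and 3 leaves the variable as 3:3:2:3 (counting the initial value), which reduces to 2.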

As an efficiency measure we do not bother doing scanning at all if one or more of the RCPT TO addresses has not opted in to any SMTP-time rejection. This is actually our common case; for various reasons, relatively few people have opted in to any of our server side anti-spam options. This is done with a condition on the 'malware' deny stanza:

condition = ${if >{SCANLVL}{0}{true}{false}}

The result of all of this actually works well, believe it or not.

(Our approach does require that we can put people's anti-spam choices into a linear ordering of some sort. Extensions to schemes involving bitmaps of what people have opted in to are left as an exercise to the dedicated and perverse Exim configuration file writer.)

Written on 24 May 2012.

Last modified: Thu May 24 02:30:44 2012