How we do MIME attachment type logging with Exim

July 12, 2016

Last time around I talked about the options you have for how to log attachment information in an Exim environment. Out of our possible choices, we opted to do attachment logging using an external program that's run through Exim's MIME ACL, and to report the result to syslog in the program. All of this is essentially the least-effort choice. Exim parses MIME for us, and having the program do the logging means that it gets to make the decisions about just what to log.

However, the details are worth talking about, so let's start with the actual MIME ACL stanza we use:

# used only for side effects
warn
  # only act on potentially interesting parts
  condition = ${if or { \
     {and{{def:mime_content_disposition}{!eq{$mime_content_disposition}{inline}}}} \
     {match{$mime_content_type}{\N^(application|audio|video|text/xml|text/vnd)\N}} \
    } }
  decode = default
  # set a dummy variable to get ${run} executed
  set acl_m1_astatus = ${run {/etc/exim4/alogger/ \
     --subject ${quote:$header_subject:} \
     --csdnsbl ${quote:$header_x-cs-dnsbl:} \
     $message_exim_id \
     ${quote:$mime_content_type} \
     ${quote:$mime_content_disposition} \
     ${quote:$mime_filename} \
     ${quote:$mime_decoded_filename} }}

(See my discussion of quoting for ${run} for what's happening here.)

The initial 'condition =' is an attempt to only run our external program (and write decoded MIME parts out to disk) for MIME parts that are likely to be interesting. Guessing what is an attachment is complicated and the program makes the final decision, but we can pre-screen some things. The parts we consider interesting are any MIME parts that explicitly declare themselves as non-inline, plus any inline MIME parts that have a Content-Type that's not really an inline thing.

There is one complication here, which is our check that $mime_content_disposition is defined. You might think that there's always going to be some content-disposition, but it turns out that when Exim says the MIME ACL is invoked on every MIME part it really means every part. Specifically, the MIME ACL is also invoked on the message body in a MIME email that is not a multipart (just, eg, a text/plain or text/html message). These single-part MIME messages can be detected because they don't have a defined content-disposition; we consider this to basically be an implicit 'inline' disposition and thus not interesting by itself.
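To make the pre-screening logic concrete, here is a rough Python restatement of the same test. This is only an illustration of what the 'condition =' expresses, not code that we run; the function name is made up, and it assumes that an undefined or empty content-disposition is passed in as None or an empty string.

```python
import re

# Same pattern as in the Exim condition: content types that are
# unlikely to be genuinely "inline" material.
INTERESTING_TYPE = re.compile(r"^(application|audio|video|text/xml|text/vnd)")


def is_interesting(content_type, content_disposition):
    """Mirror the ACL pre-screen for a single MIME part.

    content_disposition is None (or empty) when the part has no
    Content-Disposition, which we treat as an implicit 'inline'.
    """
    explicit_non_inline = (
        bool(content_disposition) and content_disposition != "inline"
    )
    odd_type = bool(INTERESTING_TYPE.search(content_type or ""))
    return explicit_non_inline or odd_type
```

With this, a plain text/plain body with no disposition is skipped, while an 'inline' application/zip part or anything marked 'attachment' is passed on for a closer look.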

The entire warn stanza exists purely to cause the ${run} to execute (this is a standard ACL trick; warn stanzas are often used just as a place to put ACL modifiers and conditions). The easiest way to get that to happen is to (nominally) set the value of an ACL variable, as we do here. Setting an ACL variable makes Exim do string expansion in a harmless context that we can basically make into a no-op, which is what we need here.

(Setting a random ACL variable to cause string expansion to be done for its side effects is a useful Exim pattern in general. Just remember to add a comment saying it's deliberate that this ACL variable is never used.)

The actual attachment logger program is written in Python because basically the moment I started writing it, it got too complicated to be a shell script. It looks at the content type, the content disposition, and any claimed MIME filename in order to decide whether this part should be logged or ignored (using the set of heuristics I outlined here). It uses the decoded content to sniff for ZIP and RAR archives and get the filenames inside them (slightly recursively). We could have run more external programs for this, but it turns out that there are handy Python modules (eg the zipfile module) that will do the work for us. Working in pure Python probably doesn't perform as well as some of the alternatives, but it works well enough for us with our current load.
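As a minimal sketch of the archive sniffing (not our actual program), something like the following works with just the standard library. The function names, the depth limit, and the size cap are all assumptions made for the example. It only lists ZIP members; the standard library has no RAR module, so the sketch merely recognizes the RAR signature and leaves listing RAR contents to whatever third-party module or external program you prefer.

```python
import io
import os.path
import zipfile

RAR_MAGIC = b"Rar!\x1a\x07"          # leading bytes of a RAR archive
MAX_NESTED_SIZE = 10 * 1024 * 1024   # assumed cap on nested ZIPs we unpack


def extension(name):
    """Reduce a filename to its lowercased extension, or 'none'."""
    return os.path.splitext(name)[1].lower() or "none"


def sniff_archive(data):
    """Guess the archive type of decoded part content (bytes)."""
    if data.startswith(RAR_MAGIC):
        return "rar"
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return None


def zip_extensions(data, depth=1):
    """List the extensions of the files inside a ZIP.

    Nested .zip members are opened too, up to `depth` extra levels.
    Anything unreadable (corrupt, encrypted, too big) is skipped.
    """
    exts = []
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return exts
    with zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            ext = extension(info.filename)
            exts.append(ext)
            if ext != ".zip" or depth <= 0:
                continue
            if info.file_size > MAX_NESTED_SIZE:
                continue
            try:
                inner = zf.read(info)
            except (RuntimeError, NotImplementedError, zipfile.BadZipFile):
                # encrypted member, unsupported compression, or bad data
                continue
            exts.extend(zip_extensions(inner, depth - 1))
    return exts
```

Returning only extensions rather than member names keeps the sketch in line with logging as little as possible.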

(In accord with my general principles, the program is careful to minimize the information it logs. For instance, we log only information about extensions, not filenames.)

The program is also passed the contents of some of the email headers so that it can add important information from them to the log message. Our anti-spam system adds a spam or virus marker to the Subject: header for recognized bad stuff, so we look for that marker and log if the attachment is part of a message scored that way. This is important for telling apart file types in real email that users actually care about from file types in spam that users probably don't.
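For illustration, here is one way a program like this could take the arguments that the ${run} line passes and check the Subject: for a marker. This is a sketch under assumptions, not our real code: it assumes argparse, the attribute names are invented, and the '[SPAM]' marker string is purely hypothetical (the actual marker our anti-spam system adds isn't shown in this entry).

```python
import argparse

# Hypothetical marker, for illustration only.
SPAM_MARKER = "[SPAM]"


def parse_args(argv):
    """Parse the command line in the order the ACL's ${run} passes it."""
    p = argparse.ArgumentParser(description="log MIME attachment types")
    p.add_argument("--subject", default="")
    p.add_argument("--csdnsbl", default="")
    p.add_argument("msgid")                # $message_exim_id
    p.add_argument("content_type")         # $mime_content_type
    p.add_argument("content_disposition")  # $mime_content_disposition
    p.add_argument("mime_filename")        # $mime_filename
    p.add_argument("decoded_filename")     # $mime_decoded_filename
    return p.parse_args(argv)


def is_marked_spam(subject):
    """True if the Subject: starts with the (assumed) spam marker."""
    return subject.lstrip().startswith(SPAM_MARKER)
```

One caveat with this approach: argparse treats a value that starts with '-' as an option, so a Subject: or filename beginning with a dash would need extra handling in anything used for real.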

(We've found it useful to log attachment type information on inbound email both before and after it passes through our anti-spam system. The 'before' view gives us a picture of what things look like before virus attachment stripping and various rejections happen, while the 'after' view is what our users actually might see in their mailboxes, depending on how they filter things marked as spam.)

Sidebar: When dummy variables aren't

I'll admit it: our attachment logger program prints out a copy of what it logs and our actual configuration uses $acl_m1_astatus later, which winds up containing this copy. We currently immediately reject all messages with ZIP files with .exes in them, and rather than parse MIME parts twice it made more sense to reuse the attachment logger's work by just pattern-matching its output.

Written on 12 July 2016.

Last modified: Tue Jul 12 00:51:59 2016