Some options for logging attachment information in an Exim environment

July 10, 2016

Suppose, not entirely hypothetically, that you use Exim as your mailer and you would like to log information about the attachments your users get sent. There are a number of different ways to do this in Exim, each with its own drawbacks and advantages. As a simplifying measure, let's assume that you want to do this during the SMTP conversation so that you can potentially reject messages with undesirable attachments (say, ZIP files with Windows executables in them).

The first decision to make is whether you will scan and analyze the entire message in your own separate code, or let Exim break the message up into various MIME parts and look at them one-by-one. Examining the entire message at once means that you can log full information about its structure in one place, but it also means that you're doing all of the MIME processing yourself. The natural place to take a look at the whole message is with Exim's anti-virus content-scanning system; you would hook into it in a way similar to how we hooked our milter-based spam rejection into Exim.

(You'll want to use a warn stanza simply to make the scanner run, and perhaps to hand back some information that you'll then have Exim log with the log_message ACL modifier.)
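
As a rough illustration of the "do all of the MIME processing yourself" option, here is a minimal Python sketch that takes the raw bytes of a whole message and boils its parts down to a single loggable line. The function name, the field layout, and the sample message are all made up for this example, and how your scanner actually receives the message from Exim depends on how you hook into the content-scanning system, which isn't shown here.

```python
import email
from email import policy


def summarize_attachments(raw_message: bytes) -> str:
    """Return a one-line summary of every non-multipart MIME part.

    Hypothetical helper: the 'type|disposition|filename' layout is just
    an illustration, not a format that Exim knows anything about.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    entries = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        ctype = part.get_content_type()
        disposition = part.get_content_disposition() or "none"
        filename = part.get_filename() or "-"
        entries.append(f"{ctype}|{disposition}|{filename}")
    return "; ".join(entries)


# A small hand-built message, used only to exercise the function.
SAMPLE = b"\r\n".join([
    b"From: a@example.org",
    b"To: b@example.org",
    b"Subject: test",
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="XX"',
    b"",
    b"--XX",
    b"Content-Type: text/plain",
    b"",
    b"hello",
    b"--XX",
    b"Content-Type: application/zip",
    b'Content-Disposition: attachment; filename="invoice.zip"',
    b"Content-Transfer-Encoding: base64",
    b"",
    b"AAAA",
    b"--XX--",
    b"",
])

if __name__ == "__main__":
    print(summarize_attachments(SAMPLE))
```

Because the whole message is available at once, everything lands in one line, which is exactly what the part-by-part approach makes awkward.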

If you want to let Exim split the message into its MIME parts for you, then you want a MIME ACL (covered in the Content scanning at ACL time chapter of the documentation). At this point you have another decision to make, which is whether you want to run an external program to analyze each MIME part or whether to rely only on Exim. The advantage of doing things entirely inside Exim is that Exim doesn't have to decode the MIME part to a file for your external program (and then run an outside program for each MIME part); the disadvantage is that you can only log the information about each MIME part that Exim itself exposes, and you can't do things like spot suspicious attempts to conceal ZIP files.

Mechanically, having Exim do it all means you'd just have a warn stanza in your MIME ACL that logged information like $mime_content_disposition, $mime_content_type, $mime_filename or its extension, and so on, using log_message =. You wouldn't normally use decode = because you have little use for decoding the part to a file unless you're going to have an outside program look at it. If you wanted to run a program against MIME parts, you'd use decode = default and then run the program with $mime_decoded_filename and possibly other arguments via ${run} in, for example, a 'set acl_m1_blah = ...' line.
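
To make the external-program option a bit more concrete, here is a sketch of the sort of per-part checker you might run against $mime_decoded_filename. Everything about it is an assumption for illustration: the function name, the list of extensions, the output format, and the idea of passing the claimed filename as a second argument. It looks at the file's actual contents to decide whether it is a ZIP archive, so a ZIP file sent under some other name still gets noticed, and it prints a single line on standard output on the assumption that you will capture that through ${run}.

```python
import sys
import zipfile

# Illustrative only; a real list would be longer and is site policy.
EXECUTABLE_EXTENSIONS = (".exe", ".scr", ".com", ".bat", ".js")


def describe_part(path: str, claimed_name: str = "") -> str:
    """Return a one-line verdict for one decoded MIME part."""
    # is_zipfile() checks the file's contents, not its name.
    if not zipfile.is_zipfile(path):
        return "not-zip"
    try:
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return "zip-unreadable"
    bad = [n for n in names if n.lower().endswith(EXECUTABLE_EXTENSIONS)]
    # A ZIP whose claimed filename does not end in .zip is suspicious.
    disguised = bool(claimed_name) and not claimed_name.lower().endswith(".zip")
    return "zip members={} executables={} disguised={}".format(
        len(names), len(bad), "yes" if disguised else "no"
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: checker.py DECODED-FILE [CLAIMED-FILENAME]
    claimed = sys.argv[2] if len(sys.argv) > 2 else ""
    print(describe_part(sys.argv[1], claimed))
```

This only looks one level deep; it doesn't try to cope with ZIP files nested inside ZIP files or with encrypted archives.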

(There are some pragmatic issues here that I'm deferring to another entry.)

Allowing Exim to split the message up for you is easier in many ways, but has two drawbacks. The first is that Exim doesn't really provide any way to get the MIME structure of the message, because you just get a stream of parts; you don't necessarily see, for example, how things are nested. The second is that processing things part by part obviously makes it harder to log all the information about a message's file types in a single line; the natural way is to log a separate line for each part as you process it.
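
For contrast, this is roughly what "seeing how things are nested" looks like when you parse the whole message yourself. The sketch below uses Python's standard email package to render the MIME tree as one compact string; the function name, the parenthesized notation, and the sample message are my own inventions for the example.

```python
import email
from email import policy
from email.message import Message


def structure(part: Message) -> str:
    """Render the MIME tree rooted at part as a single nested string."""
    ctype = part.get_content_type()
    if not part.is_multipart():
        return ctype
    # For a multipart, get_payload() is the list of immediate sub-parts.
    children = ", ".join(structure(child) for child in part.get_payload())
    return f"{ctype}({children})"


# Hand-built sample: an alternative text/HTML body plus one attachment.
SAMPLE = b"\r\n".join([
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="outer"',
    b"",
    b"--outer",
    b'Content-Type: multipart/alternative; boundary="inner"',
    b"",
    b"--inner",
    b"Content-Type: text/plain",
    b"",
    b"plain body",
    b"--inner",
    b"Content-Type: text/html",
    b"",
    b"<p>html body</p>",
    b"--inner--",
    b"--outer",
    b"Content-Type: application/pdf",
    b'Content-Disposition: attachment; filename="report.pdf"',
    b"",
    b"not really a pdf",
    b"--outer--",
    b"",
])

if __name__ == "__main__":
    msg = email.message_from_bytes(SAMPLE, policy=policy.default)
    print(structure(msg))
```

A stream of parts from a MIME ACL would show you the same leaf content types, but not which container each one sits in.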

Speaking of logging, if you're running an external program (either for the entire message or for each MIME part) you need to decide whether your program will do the logging or whether you're going to have the program pass information back to Exim and have Exim log it. Passing information back to Exim is more work but means that you'll see your attachment information along with the other log lines for the message. Having your program log directly to a place like syslog is generally going to be easier, and it may make the information more conveniently visible.
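
In an external program written in Python, the two choices might differ by only a few lines. This is a sketch under assumptions: the function names, the 'attachment-info' prefix, and the 'attachlog' syslog identity are invented, and it assumes that Exim passes the message ID to your program as an argument (which you would need to arrange yourself if you want your own log lines to be matchable against Exim's).

```python
import sys


def format_line(message_id: str, verdict: str) -> str:
    """Build one log line; the layout is purely illustrative."""
    # Collapse whitespace and newlines so each call yields exactly one line.
    clean = " ".join(verdict.split())
    return f"attachment-info id={message_id} {clean}"


def report(line: str, via_exim: bool) -> None:
    """Either hand the line back to Exim or log it ourselves."""
    if via_exim:
        # Exim is assumed to capture our standard output and log it.
        sys.stdout.write(line + "\n")
    else:
        import syslog  # standard library, Unix only

        syslog.openlog("attachlog", facility=syslog.LOG_MAIL)
        syslog.syslog(syslog.LOG_INFO, line)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: logger.py MESSAGE-ID VERDICT...
    report(format_line(sys.argv[1], " ".join(sys.argv[2:])), via_exim=True)
```

Including the message ID in the directly logged version is what lets you tie a syslog line back to the message's other log lines later.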

Sidebar: Exim's MIME parsing versus yours

Exim's MIME parsing is in C and is presumably done on an in-place version of the message that Exim already has on disk. It thus should be quite efficient (until you start decoding parts) and hopefully reasonably security hardened. Parsing a message's MIME structure yourself means relying on the speed, quality, resilience against broken MIME messages, and security of whatever code either you write or your language of choice already has for MIME parsing, and it requires Exim to reconstitute a full copy of the message for you.

My experience with Python's standard MIME parsing module was that it's at least somewhat fragile against malformed input. This isn't a security risk (it's Python), but it did mean that my code wound up spending a bunch of time recovering from MIME parsing explosions and trying to extract some information from the mess anyways. I wouldn't be surprised if other languages had standard packages that assumed well-formed input and threw errors otherwise (and it's hard to blame them; dealing with malformed MIME messages is a specialized need).
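
One way to contain this kind of thing in current versions of Python is sketched below. It isn't the code I actually ran, and the function name and the format of the problem list are made up. The idea is to wrap every step in exception handling, keep whatever information was extracted before something went wrong, and record the defects that the email package notes on each part.

```python
import email
from email import policy


def safe_part_types(raw: bytes):
    """Return (content types, problems) without raising on bad input."""
    types, problems = [], []
    try:
        msg = email.message_from_bytes(raw, policy=policy.default)
        for part in msg.walk():
            # The parser records structural problems instead of raising.
            problems.extend(type(d).__name__ for d in part.defects)
            try:
                types.append(part.get_content_type())
            except Exception as exc:  # keep going with the other parts
                problems.append(f"part-failed:{type(exc).__name__}")
    except Exception as exc:  # the parse or the walk itself blew up
        problems.append(f"parse-failed:{type(exc).__name__}")
    return types, problems


# A multipart message whose closing boundary is missing.
BROKEN = b"\r\n".join([
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="XX"',
    b"",
    b"--XX",
    b"Content-Type: text/plain",
    b"",
    b"hello",
])

if __name__ == "__main__":
    print(safe_part_types(BROKEN))
```

The catch-all exception handling is deliberate here; for logging purposes, partial information plus a note that something went wrong is more useful than a traceback.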

(Admittedly I don't know how well Exim itself deals with malformed MIME messages and MIME parts. Hopefully it parses them as much as possible, but it may just throw up its hands and punt.)
