#pragma search blog/sysadmin

== How we do MIME attachment type logging with Exim

Last time around I talked about [[the options you have for how to log attachment information in an Exim environment EximAttachmentLoggingOptions]]. Out of our possible choices, we opted to do attachment logging using an external program that's run through Exim's MIME ACL, and to report the result to syslog in the program. All of this is essentially the least-effort choice. Exim parses MIME for us, and having the program do the logging means that it gets to make the decisions about just what to log.

However, the details are worth talking about, so let's start with the actual MIME ACL stanza we use:

.pn prewrap on

> # used only for side effects
> warn
>   # only act on potentially interesting parts
>   condition = ${if or { \
>      {and{{def:mime_content_disposition}{!eq{$mime_content_disposition}{inline}}}} \
>      {match{$mime_content_type}{\N^(application|audio|video|text/xml|text/vnd)\N}} \
>     } }
>   #
>   decode = default
>   # set a dummy variable to get ${run} executed
>   set acl_m1_astatus = ${run {/etc/exim4/alogger/alogger.py \
>      --subject ${quote:$header_subject:} \
>      --csdnsbl ${quote:$header_x-cs-dnsbl:} \
>      $message_exim_id \
>      ${quote:$mime_content_type} \
>      ${quote:$mime_content_disposition} \
>      ${quote:$mime_filename} \
>      ${quote:$mime_decoded_filename} }}

(See [[my discussion of quoting for _${run}_ EximRunAndQuoting]] for what's happening here.)

The initial '_condition =_' is an attempt to only run our external program (and write decoded MIME parts out to disk) for MIME parts that are likely to be interesting. [[Guessing what is an attachment is complicated ../spam/KnowingWhatIsAnAttachment]] and the program makes the final decision, but we can pre-screen some things. The parts we consider interesting are any MIME parts that explicitly declare themselves as non-inline, plus any inline MIME parts that have a Content-Type that's not really an inline thing.

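To make the pre-screening logic easier to follow, here is the same test restated as a small Python function. This is only an illustrative sketch, not part of our actual setup; the function name and its arguments are made up for the example, and it assumes that an unset or empty disposition corresponds to Exim's _def:_ test failing.

```python
import re

# Same pattern as the match{} clause in the ACL condition.
INTERESTING_TYPE = re.compile(r"^(application|audio|video|text/xml|text/vnd)")


def potentially_interesting(content_type, content_disposition=None):
    """Return True if a MIME part would pass the ACL pre-screen.

    content_disposition is None (or empty) when the part has no
    Content-Disposition at all. Hypothetical helper for illustration.
    """
    # First clause: a disposition is present and it is not 'inline'.
    declared_non_inline = bool(content_disposition) and content_disposition != "inline"
    # Second clause: the content type is not really an inline sort of thing.
    odd_type = INTERESTING_TYPE.match(content_type) is not None
    return declared_non_inline or odd_type
```

For example, _potentially_interesting("application/zip", "inline")_ is true because of the content type, while a plain _text/html_ part with no disposition at all is skipped.
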
There is one complication here, which is our check that (($mime_content_disposition)) is defined. You might think that there's always going to be some content-disposition, but it turns out that when Exim says the MIME ACL is invoked on every MIME part it really means *every* part. Specifically, the MIME ACL is also invoked on the message body in a MIME email that is not a multipart (just, eg, a _text/plain_ or _text/html_ message). These single-part MIME messages can be detected because they don't have a defined content-disposition; we consider this to basically be an implicit 'inline' disposition and thus not interesting by itself.

The entire _warn_ stanza exists purely to cause the _${run}_ to execute (this is a standard ACL trick; _warn_ stanzas are often used just as a place to put ACL verbs). The easiest way to get that to happen is to (nominally) set the value of an ACL variable, as we do here. Setting an ACL variable makes Exim do string expansion in a harmless context that we can basically make into a no-op, which is what we need here.

(Setting a random ACL variable to cause string expansion to be done for its side effects is a useful Exim pattern in general. Just remember to add a comment saying it's deliberate that this ACL variable is never used.)

The actual attachment logger program is written in Python because basically the moment I started writing it, it got too complicated to be a shell script. It looks at the content type, the content disposition, and any claimed MIME filename in order to decide whether this part should be logged about or ignored (using the set of heuristics I outlined [[here ../spam/KnowingWhatIsAnAttachment]]). It uses the decoded content to sniff for ZIP and RAR archives and get their filenames ([[slightly recursively ../spam/VirusesDoConcealZipFiles]]).
We could have run more external programs for this, but it turns out that there are handy Python modules (eg [[the _zipfile_ module https://docs.python.org/2.7/library/zipfile.html]]) that will do the work for us. Working in pure Python probably doesn't perform as well as some of the alternatives, but it works well enough for us with our current load.

(In accord with [[my general principles NotLoggingThings]], the program is careful to minimize the information it logs. For instance, we log only information about extensions, not filenames.)

The program is also passed the contents of some of the email headers so that it can add important information from them to the log message. Our anti-spam system adds a spam or virus marker to the _Subject:_ header for recognized bad stuff, so we look for that marker and log if the attachment is part of a message scored that way. This is important for telling apart file types in real email that users actually care about from file types in spam that users probably don't.

(We've found it useful to log attachment type information on inbound email both before and after it passes through [[our anti-spam system ../spam/CSLabSpamFilteringII]]. The 'before' view gives us a picture of what things look like before virus attachment stripping and various rejections happen, while the 'after' view is what our users actually might see in their mailboxes, depending on how they filter things marked as spam.)

=== Sidebar: When dummy variables aren't

I'll admit it: our attachment logger program prints out a copy of what it logs and our actual configuration uses (($acl_m1_astatus)) later, which winds up containing this copy. We currently immediately reject all messages with ZIP files with _.exe_s in them, and rather than parse MIME parts twice it made more sense to reuse the attachment logger's work by just pattern-matching its output.
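
As a rough illustration of the sort of ZIP sniffing that makes this possible, here is a minimal sketch of pulling file extensions out of an archive with the standard _zipfile_ module, looking one level into nested ZIP files. It is not our actual program: the function name and the _depth_ parameter are invented for the example, it is written for Python 3, and it ignores RAR archives entirely. It collects only extensions, never full filenames.

```python
import io
import os
import zipfile


def zip_extensions(fileobj, depth=1):
    """Return the set of lowercased extensions of files inside a ZIP.

    fileobj is a path or a seekable binary file object. Nested .zip
    members are opened up to `depth` levels down. Hypothetical helper.
    """
    exts = set()
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            if info.filename.endswith("/"):
                continue  # directory entry, not a file
            ext = os.path.splitext(info.filename)[1].lower()
            if ext:
                exts.add(ext)
            if ext == ".zip" and depth > 0:
                try:
                    # Reads the whole member into memory; a real program
                    # would want a size limit here.
                    inner = io.BytesIO(zf.read(info))
                    exts |= zip_extensions(inner, depth - 1)
                except (zipfile.BadZipFile, RuntimeError):
                    pass  # corrupt or encrypted inner archive
    return exts
```

With something like this, the check for executables reduces to _".exe" in zip_extensions(path)_, where _path_ would be the decoded MIME part that Exim wrote to disk.
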