More badly encoded MIME Content-Disposition headers

January 28, 2020

One of the things that our system for recording email attachment type information logs is the MIME Content-Disposition header for a MIME part, if it exists. Another thing it logs is the extension of the claimed MIME filename, if this information is part of the MIME headers for that part (people and programs don't always put it in). Under normal circumstances, the filename of a MIME part is given as a 'filename=...' parameter on the Content-Disposition header, although for historical reasons it may also come from a 'name=...' parameter on the Content-Type header.
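As a rough illustration of where a claimed filename and its extension come from, here is a minimal Python sketch using the standard library's email package. The helper name claimed_extension and the sample parts are made up for this example; it is only a sketch, not the code of the actual logging system.

```python
import os.path
from email import message_from_string


def claimed_extension(part):
    """Return the lowercased extension of a part's claimed filename, or None."""
    # get_filename() checks the filename= parameter on Content-Disposition
    # first, then falls back to the name= parameter on Content-Type.
    filename = part.get_filename()
    if not filename:
        return None
    ext = os.path.splitext(filename)[1]
    return ext.lower() or None


# Hypothetical sample parts, one for each place the filename can live.
with_disposition = message_from_string(
    'Content-Type: application/pdf\n'
    'Content-Disposition: attachment; filename="report.PDF"\n'
    '\n'
)
with_name_only = message_from_string(
    'Content-Type: application/zip; name="backup.zip"\n'
    '\n'
)
no_filename = message_from_string('Content-Type: text/plain\n\n')

print(claimed_extension(with_disposition))  # .pdf
print(claimed_extension(with_name_only))    # .zip
print(claimed_extension(no_filename))       # None
```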

Now suppose that your attachment has a non-ASCII filename and you want to put it in the MIME headers, which are theoretically ASCII only. How you're supposed to deal with this is somewhat tangled. In theory you're apparently supposed to use RFC 2231, which defines a whole encoding scheme for this. In practice this encoding scheme seems to be pretty rare; instead, the common thing to do seems to be to use RFC 2047 encoding for just the filename (possibly inside quotes).
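To make the difference concrete, here is a small Python sketch that decodes the same made-up filename from both forms using only standard library calls. The filename and both helper functions are invented for illustration; real messages can also split RFC 2231 values across several continuation parameters, which this sketch doesn't handle.

```python
from email.header import decode_header, make_header
from email.utils import decode_rfc2231
from urllib.parse import unquote

# RFC 2231 form, as it would appear in a header:
#   Content-Disposition: attachment; filename*=UTF-8''na%C3%AFve.pdf
rfc2231_form = "UTF-8''na%C3%AFve.pdf"

# RFC 2047 form (the common one in practice):
#   Content-Disposition: attachment; filename="=?utf-8?q?na=C3=AFve.pdf?="
rfc2047_form = "=?utf-8?q?na=C3=AFve.pdf?="


def decode_rfc2231_value(value):
    """Decode a single charset'language'percent-encoded-text value."""
    charset, _language, text = decode_rfc2231(value)
    # The percent escapes are the raw bytes of the name in that charset.
    return unquote(text, encoding=charset or "us-ascii")


def decode_rfc2047_value(value):
    """Decode a value made up of RFC 2047 encoded-words."""
    return str(make_header(decode_header(value)))


print(decode_rfc2231_value(rfc2231_form))
print(decode_rfc2047_value(rfc2047_form))
```

Both calls should come out as the same filename, which is the property I didn't cross-check in the message mentioned next.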

(I've seen a message that used both at once, with the name= parameter on Content-Type encoded using RFC 2047 and the entire Content-Disposition header done with RFC 2231. I didn't cross-check to see whether the filenames came out the same.)

Of course, when you run an anti-spam system and turn over rocks by looking at your logs, sometimes you find surprises. When I was looking at our log of attachment information recently, I discovered one with an attachment type that looked like the following:

=?utf-8?q?attachment=3b_filename=3d=22tnt_import_clear?= =?utf-8?q?ance=e2=80=93_consignment_=239066721066-pdf=2eace=22?=

(I've broken this into two pieces for the blog, but it was one originally.)

If we decode this following RFC 2047, we get:

attachment; filename="tnt import clearance– consignment #9066721066-pdf.ace"
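For the curious, this decoding can be reproduced with Python's standard email.header module. This is a minimal sketch; it assumes the two pieces are joined by a single space, which is a guess about the original, although it shouldn't matter because RFC 2047 says whitespace between adjacent encoded-words is ignored.

```python
from email.header import decode_header, make_header

# The logged value, rejoined with a single space (an assumption).
raw = (
    "=?utf-8?q?attachment=3b_filename=3d=22tnt_import_clear?= "
    "=?utf-8?q?ance=e2=80=93_consignment_=239066721066-pdf=2eace=22?="
)

# decode_header() splits the value into (bytes, charset) chunks and
# make_header() reassembles them into a single Unicode string.
decoded = str(make_header(decode_header(raw)))
print(decoded)
```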

This seems to be some piece of malware that's used RFC 2047 encoded-word syntax on the entire Content-Disposition header, rather than just the filename. Whether anyone's email software will interpret this in a way that's useful for the malware is an open question, but presumably some of it does, since the malware bothers to do this. Some software certainly won't interpret it, and unfortunately that includes our system for rejecting email with bad attachment types at SMTP time.
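As one example of software that doesn't interpret it, here is a sketch of what Python's standard email parser makes of such a part. The sample part is reconstructed from the logged value (with the two pieces joined by a space, which is an assumption) and has no other headers; this says nothing about what any particular mail client does.

```python
from email import message_from_string

# A part whose only header is the wholly encoded Content-Disposition.
raw_part = (
    "Content-Disposition: "
    "=?utf-8?q?attachment=3b_filename=3d=22tnt_import_clear?= "
    "=?utf-8?q?ance=e2=80=93_consignment_=239066721066-pdf=2eace=22?=\n"
    "\n"
)
part = message_from_string(raw_part)

# The parser looks for a literal filename= parameter in the raw header
# value. There isn't one, so it reports no filename at all.
filename = part.get_filename()
print(filename)  # None
```

With no filename there is no extension to check, so a filter that relies on this kind of parsing never sees the .ace.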

(The reason that this came to my attention was that our commercial anti-spam software rejected it as CXmail/MalPE-AW while the unofficial ClamAV signatures we use detected it as 'Sanesecurity Malware 25738 AceHeur Exe'. Since we normally reject email with .ace attachments at SMTP time before we do further anti-virus checking, this would have been rejected immediately if not for its very encoded Content-Disposition, which prevented our current setup from recognizing it.)

It turns out that this isn't even the first time I've spotted and noted these things; back at the end of 2018, I wrote about some odd Content-Dispositions, including ones that were mis-encoded this way. At the time I don't seem to have seen any with attachment types that we'd have rejected, so I didn't consider it a high priority to deal with in our attachment logging software.

Since pretty much all of these that we've seen are spam and malware, and this is unambiguously incorrect (and may be intended in part to evade anti-virus systems), I'm a bit tempted to make our external mail gateway reject all email with these badly encoded Content-Dispositions. That would save us from having to deal with various cases of decoding these things and then trying to parse the resulting header ourselves.
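A check for this could be quite simple, because a properly formed Content-Disposition starts with a plain disposition type such as 'attachment' or 'inline', never with an encoded-word. Here is a hedged Python sketch of the idea; the function name and the regular expression are made up for illustration and aren't taken from our mail gateway's configuration.

```python
import re

# An RFC 2047 encoded-word looks like =?charset?encoding?text?=
# where the encoding is B or Q (in either case).
ENCODED_WORD_START = re.compile(r"^\s*=\?[^?\s]+\?[bBqQ]\?")


def disposition_is_wholly_encoded(raw_value):
    """True if the raw header value begins with an encoded-word."""
    return bool(ENCODED_WORD_START.match(raw_value))


bad = (
    "=?utf-8?q?attachment=3b_filename=3d=22tnt_import_clear?= "
    "=?utf-8?q?ance=e2=80=93_consignment_=239066721066-pdf=2eace=22?="
)
# Encoding just the filename is common and is deliberately not flagged.
common = 'attachment; filename="=?utf-8?q?na=C3=AFve.pdf?="'

print(disposition_is_wholly_encoded(bad))     # True
print(disposition_is_wholly_encoded(common))  # False
```

Only looking at the start of the value keeps the check cheap and avoids decoding anything, at the cost of missing headers that are mangled in other ways.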


Comments on this page:

Notably, the Gmail web frontend encodes attachment filenames with encoded words, in direct violation of RFC 2047 5.3.

Written on 28 January 2020.

Last modified: Tue Jan 28 00:19:02 2020