More badly encoded MIME Content-Disposition headers
One of the things that our system for recording email attachment information logs is the MIME Content-Disposition header for each MIME part, if it exists. Another thing it logs is the extension of the claimed MIME filename, if this information is part of the MIME headers for that part (people and programs don't always put it in). Under normal circumstances, the filename of a MIME part is given as a 'filename=...' parameter on the Content-Disposition header (although it may also come from a 'name=...' parameter on the Content-Type, for historical reasons).
Now suppose that your attachment has a non-ASCII filename and you want to put it in the MIME headers, which are theoretically ASCII only. How you're supposed to deal with this is somewhat tangled. In theory you're apparently supposed to use RFC 2231, which defines a whole encoding scheme for header parameter values. In practice this encoding scheme seems to be pretty rare; instead, the common thing to do seems to be to use RFC 2047 encoding for just the filename (possibly inside quotes).
(I've seen a message that used both at once, with the name= parameter on Content-Type encoded using RFC 2047 and the entire Content-Disposition header done with RFC 2231. I didn't cross-check to see whether the filenames came out the same.)
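To make the two schemes concrete, here is a minimal Python sketch using the standard library's email package. The filename and the helper functions are made up for illustration, and the RFC 2047 form is built by hand as a single base64 encoded-word; real mail software may produce something different.

```python
import base64
from email import message_from_string
from email.header import decode_header, make_header
from urllib.parse import quote

FILENAME = "r\u00e9sum\u00e9.pdf"  # hypothetical non-ASCII filename


def rfc2231_header(name: str) -> str:
    """RFC 2231 style: filename*=charset'language'percent-encoded-bytes."""
    return "Content-Disposition: attachment; filename*=utf-8''" + quote(name, safe="")


def rfc2047_header(name: str) -> str:
    """Common practice: an RFC 2047 encoded-word as the quoted filename."""
    b64 = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return 'Content-Disposition: attachment; filename="=?utf-8?B?%s?="' % b64


def filename_from(header_line: str):
    """Parse one header line and return the decoded filename, if any."""
    msg = message_from_string(header_line + "\n\n")
    name = msg.get_filename()  # decodes RFC 2231 parameters itself
    if name is not None and "=?" in name:
        # With the default (compat32) policy an RFC 2047 filename comes
        # back still encoded, so decode it explicitly.
        name = str(make_header(decode_header(name)))
    return name


for line in (rfc2231_header(FILENAME), rfc2047_header(FILENAME)):
    print(line)
    print("  ->", filename_from(line))
```

Both header lines should decode back to the same filename, which is the cross-check that a message using both schemes at once invites.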
Of course, when you run an anti-spam system and turn over rocks by looking at your logs, sometimes you find surprises. When I was looking at our log of attachment information recently, I discovered a message with a Content-Disposition that looked like the following:
(I've broken this into two pieces for the blog, but it was one originally.)
If we decode this following RFC 2047, we get:
attachment; filename="tnt import clearance– consignment #9066721066-pdf.ace"
This seems to be some piece of malware that's used RFC 2047 encoded-word syntax on the entire Content-Disposition header value, rather than just the filename. Whether anyone's email software will interpret this in a way that's useful for the malware is an open question, but presumably some does, since the malware bothers to do this. Some software certainly will not interpret it, and unfortunately that includes our system for rejecting email with bad attachment types at SMTP time.
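As a rough illustration of why an ordinary parser misses the filename here, the following Python sketch wraps the decoded text in a single encoded-word and parses it both ways. This is a reconstruction under assumptions: the charset (UTF-8) and encoding (base64) are guesses, not the original header's, and the function names are invented for the example.

```python
import base64
import os.path
from email import message_from_string
from email.header import decode_header, make_header

# The decoded text quoted above; the en dash is written as \u2013.
DECODED = ('attachment; filename="tnt import clearance\u2013 '
           'consignment #9066721066-pdf.ace"')

# Assumption: one UTF-8, base64 encoded-word wrapping the whole value.
# The real header's charset and encoding are not known here.
wrapped = "=?utf-8?B?%s?=" % base64.b64encode(DECODED.encode("utf-8")).decode("ascii")


def filename_of(disposition: str):
    """Parse a Content-Disposition value the ordinary way."""
    msg = message_from_string("Content-Disposition: " + disposition + "\n\n")
    return msg.get_filename()


def unwrap(disposition: str) -> str:
    """Apply RFC 2047 decoding to the whole header value."""
    return str(make_header(decode_header(disposition)))


naive = filename_of(wrapped)              # no filename= parameter is visible
recovered = filename_of(unwrap(wrapped))  # parse again after decoding
extension = os.path.splitext(recovered)[1].lower()
print(naive, recovered, extension)
```

A check that only looks at the filename parameter of the header as received never sees the extension; it only appears once the whole value is decoded and parsed a second time.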
(The reason that this came to my attention was that our commercial anti-spam software rejected it as CXmail/MalPE-AW while the unofficial ClamAV signatures we use detected it as 'Sanesecurity Malware 25738 AceHeur Exe'. Since we normally reject email with .ace attachments at SMTP time before we do further anti-virus checking, this would have been rejected immediately if not for its very encoded Content-Disposition, which prevented our current setup from recognizing it.)
It turns out that this isn't even the first time I've spotted and noted these things; back at the end of 2018, I wrote about some odd Content-Dispositions, including ones that were mis-encoded this way. At the time I don't seem to have seen any with attachment types that we'd have rejected, so I didn't treat dealing with them in our attachment logging software as a high priority.
Since pretty much all of these that we've seen are spam and malware, and this encoding is unambiguously incorrect (and may be intended in part to evade anti-virus systems), I'm a bit tempted to make our external mail gateway reject all email with these badly encoded Content-Dispositions. That would save us from having to deal with various cases of decoding these things and then trying to parse the resulting header ourselves.
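A rejection check along these lines can be quite small. The sketch below is a hypothetical Python illustration rather than anything an actual mail gateway runs; the function name and regular expression are assumptions. It relies on the fact that a well-formed Content-Disposition value starts with a disposition type such as 'attachment' or 'inline', so a value that starts with an encoded-word is already wrong.

```python
import re

# An RFC 2047 encoded-word begins "=?charset?B?" or "=?charset?Q?".
# A valid Content-Disposition value never starts this way.
_LEADING_ENCODED_WORD = re.compile(r"\s*=\?[^?\s]+\?[bBqQ]\?")


def is_badly_encoded_disposition(value: str) -> bool:
    """True if the header value begins with an RFC 2047 encoded-word."""
    return _LEADING_ENCODED_WORD.match(value) is not None


for sample in (
    "=?utf-8?B?YXR0YWNobWVudA==?=",                         # whole value encoded
    'attachment; filename="=?utf-8?B?csOpc3Vtw6kucGRm?="',  # only filename encoded
    "inline",
):
    print(is_badly_encoded_disposition(sample), sample)
```

This deliberately leaves alone the common case of an RFC 2047 encoded filename inside an otherwise normal header, since that is what most legitimate software appears to generate.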