Wandering Thoughts archives

2018-12-28

An odd MIME Content-Disposition or two

One of the things that our system for recording email attachment type information logs is the MIME Content-Disposition header, if it exists. In theory there should be only three cases for this header; if it exists, it should be either inline or attachment, and it might not exist if the message doesn't have multiple MIME parts (because then the implicit disposition is 'inline'). In practice, well, you can guess what happens here.

The first thing that happens is that some number of MIME parts just omit having a Content-Disposition. This is probably legitimate these days (I would have to read the MIME RFCs to know for sure, and I'm not that interested). The more interesting thing is that rarely, people put other values into their C-D headers.

The most normal alternate thing we've seen in C-D headers over the past 60 weeks is the value 'csv'; all of the cases we've seen are for .csv files with the claimed MIME type of application/vnd.ms-excel. Spot-checking a couple of such messages shows that they come from ncbi.nlm.nih.gov, so I suspect that there's some system there for sending out CSV files that does this.

We saw one case of 'attachement' (with an extra 'e' in there), for a PDF file. It's possible this was malware, but it's also possible it's some automated PDF-sending system that manually constructs MIME messages and has gotten the spelling slightly off. We also saw one case of 'related', for a .ico file; again I don't have clear enough signs to guess on malware versus not.
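As an illustration of how this sort of tally can be produced, here is a minimal sketch using Python's standard `email` package. It is not our actual logging system. The sample message, its boundary string, and the `disposition_counts` name are all made up for this example.

```python
from collections import Counter
from email import message_from_string

# A made-up three-part message; the last part has the misspelled disposition.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

hello
--XX
Content-Type: application/pdf
Content-Disposition: attachment; filename="a.pdf"

(pdf bytes would go here)
--XX
Content-Type: application/pdf
Content-Disposition: attachement; filename="b.pdf"

(pdf bytes would go here)
--XX--
"""

def disposition_counts(msg):
    """Count the Content-Disposition value of every leaf MIME part."""
    counts = Counter()
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no content of their own
        # get_content_disposition() returns the lowercased value without
        # its parameters, or None when the header is absent.
        counts[part.get_content_disposition() or "(none)"] += 1
    return counts

counts = disposition_counts(message_from_string(SAMPLE))
print(dict(counts))
```

Anything other than 'inline', 'attachment', or a missing header then shows up as its own key, which is how oddities like 'attachement' become visible.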

However, the case that drove me to write this entry is that last week we had a burst of 14 messages, all with the very special Content-Disposition of:

=?utf-8?b?yxr0ywnobwvuddsgzmlszw5hbwu9ius7moasvuwhrq==?= =?utf-8?b?6k+bsfncqza1nta1lnhsc3gi?=

(I've broken this into two parts for this entry, but in the original it was all one line. This is an RFC 2047 encoded-word thing.)

All 14 of these were identified by our commercial anti-spam system as Exp/20180802-B, which we've seen before. The base-64 Content-Disposition decodes into something that ends in .xlsx, and indeed the attachment was an application/xml ZIP archive with the same cluster of internal file extensions:

zip exts: .bin .png .rels[3] .vml .xml[10] none
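One wrinkle if you want to check this yourself: the value as printed above is all lowercase, and base64 is case-sensitive, so it can't be fed straight to a decoder. What can be done is to encode a guess, lowercase it, and compare. The following is a rough sketch of that idea. The `folded_b64` helper is a hypothetical name, and a match is only consistent with the guess rather than proof of it, since folding case loses information.

```python
import base64

# The two encoded-word payloads exactly as printed in this entry (all lowercase).
LOGGED = ("yxr0ywnobwvuddsgzmlszw5hbwu9ius7moasvuwhrq==",
          "6k+bsfncqza1nta1lnhsc3gi")

def folded_b64(raw: bytes) -> str:
    """Base64-encode raw bytes, then lowercase the result like the logged value."""
    return base64.b64encode(raw).decode("ascii").lower()

# Both guesses are a multiple of three bytes long, so their base64 forms
# line up on four-character boundaries and carry no padding.
starts_like_disposition = LOGGED[0].startswith(folded_b64(b"attachment; filename="))
ends_like_xlsx = LOGGED[1].endswith(folded_b64(b'.xlsx"'))
print(starts_like_disposition, ends_like_xlsx)
```

Both checks come out true, which fits the idea that the whole 'attachment; filename="....xlsx"' string was encoded as one unit.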

Contrary to what I sort of expected, it turns out that these messages are not single MIME parts but are instead multipart/mixed. Presumably they were directly crafted by something that made a little mistake with what went into the Content-Disposition field, but still managed to sort of properly encode it.

Looking back, over the past 60 weeks we've also seen what look like some other coding mistakes, for example some Content-Dispositions of:

=?utf-8?q?attachment=3b_filename=3d=22payment_instruc?= =?utf-8?q?=e2=80=a6n_-6782_invoce=2etar=22?=

(These two messages were detected as CXmail/MalPE-AC.)

This looks like someone passed the disposition plus the MIME filename to a function designed to encode the disposition alone, which did the best it could under the circumstances. We also saw a third that did the same but with a different filename.
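Unlike base64, the quoted-printable ('q') form survives being lowercased, so this one can be decoded directly. Here is a small sketch using Python's standard `email.header.decode_header`. The `decode_words` helper is a name invented for this example, not part of any library.

```python
from email.header import decode_header

# The logged value from this entry, joined back into one line.
LOGGED = ("=?utf-8?q?attachment=3b_filename=3d=22payment_instruc?= "
          "=?utf-8?q?=e2=80=a6n_-6782_invoce=2etar=22?=")

def decode_words(value: str) -> str:
    """Decode any RFC 2047 encoded-words in value into one Unicode string."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unencoded chunks may come back with no charset; assume ASCII then.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)

decoded = decode_words(LOGGED)
# Everything before the first ';' is the disposition itself.
disposition = decoded.split(";", 1)[0].strip().lower()
print(decoded)
print(disposition)
```

The decoded value is an ordinary 'attachment; filename="..."' header, with the underscores of the 'q' encoding turned back into spaces and the =e2=80=a6 bytes turning out to be a UTF-8 ellipsis character in the filename.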

As a side note, 'attachment' is by far the most common Content-Disposition over the past 60 weeks, amounting to about 96.3% of the MIME parts we see. In second place is 'inline', with about 2.3%, and then no Content-Disposition header, at 1.3%. Interestingly, the most common 'inline' file type is PDFs, at 73%, followed by .jpg at 6.7%. I'm surprised that PDFs are so high here, because I wouldn't have thought that they were things mail sending programs ask to be viewed inline.

(A random check on some PDFs I've been sent in email didn't turn up any marked as 'inline'.)

OddMimeContentDisposition written at 23:52:42

2018-12-24

Plaintext parts of email are fading away (in spam and non-spam)

One of the things that I've been noticing these days is how much plaintext parts of emails are fading away. I'm not talking here about HTML-only emails (which have been on the rise here for years); instead, this is about MIME multipart/alternative email which theoretically has both a plaintext and a HTML portion. For years I've had my mail system set to show me the plaintext version instead of the HTML version. For a long time that worked reasonably well, but increasingly it's not; when there is a plaintext version that isn't just 'get a HTML capable client', more and more often the plaintext version is incomplete or otherwise not really functional.

This happens in regular email and it also happens in spam email. For instance, my spamtraps recently captured some email where the plaintext portion started:

To view it online, please go here: %%webversion%%

That's the literal text, and it comes from a spam operation that's clearly organized and using dedicated software (and servers) for their spamming.
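If you wanted to spot this particular failure mechanically, something like the following sketch would do. It assumes the template tokens always look like `%%name%%`, which is a guess based on this one example. The sample message (including its HTML part) and the `unexpanded_placeholders` name are made up for illustration.

```python
import re
from email import message_from_string

# Assumed token shape: two percent signs, a word, two percent signs.
PLACEHOLDER = re.compile(r"%%\w+%%")

# A made-up multipart/alternative message with an unfilled plaintext part.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="ALT"

--ALT
Content-Type: text/plain; charset="us-ascii"

To view it online, please go here: %%webversion%%
--ALT
Content-Type: text/html; charset="us-ascii"

<p>Hello</p>
--ALT--
"""

def unexpanded_placeholders(msg):
    """Return template tokens left unfilled in any text/plain part."""
    found = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # No Content-Transfer-Encoding here, so the payload is plain text.
            found.extend(PLACEHOLDER.findall(part.get_payload()))
    return found

leftovers = unexpanded_placeholders(message_from_string(SAMPLE))
print(leftovers)
```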

Of course, plenty of spammers still use plaintext or functional multipart messages; it seems to be especially common with advance fee fraud spammers, who generally have plain text messages anyway and who may be using well implemented webmail software that does this right. But if spammers (and significant mailing list operations) cannot be bothered to even look at their plaintext versions and get them functional, I have to conclude that plaintext versions are becoming vestigial remnants in the modern email ecosystem.

This isn't surprising, really. If anything it's sort of surprising that it hasn't happened before now. Apparently inertia is a thing.

Unfortunately, since this is done by both spam software and legitimate senders, a significant mismatch between the plaintext version and the HTML version is probably not a useful sign of spam. Depending on your tastes and who you get email from, it may still be a useful sign of email you don't want to read.

FadingPlaintextParts written at 02:40:17

