What sorts of good email attachments our users get (April 2018 edition)

April 28, 2018

I've looked at various breakdowns of bad attachment types that get sent to our users, but of course that's not the only reason we collect this data. In fact it's the lesser reason; the greater one is to know what legitimate types of files our users get in email. So today I'm going to look at a week's worth of data from our central mail server, logged after all rejection, filtering, and spam tagging have been applied.

Over that week, we logged 4,166 attachments from 3,093 email messages. Some email messages had quite a lot of attachments; the winner had 26 attachments, and then there's one with 23, two with 12, four with 9, nine with 8, and I ran out of patience counting from there. The median message has one attachment, though, as you'd expect.
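
For illustration, here is a minimal sketch of how this sort of per-message tally could be computed. It assumes the log has already been reduced to (message id, filename) pairs, one per attachment; that input shape and the function name are my assumptions here, not the actual log format.

```python
from collections import Counter
from statistics import median

def attachment_distribution(records):
    """records: iterable of (message_id, filename) pairs, one per attachment.

    This input shape is an assumption for the sketch, not the real log format.
    """
    # Number of attachments seen for each message.
    per_message = Counter(msg_id for msg_id, _ in records)
    # Map "attachments in a message" -> "how many messages had that many".
    histogram = Counter(per_message.values())
    return histogram, median(per_message.values())

# Made-up sample data: one message with two attachments, two with one each.
records = [("m1", "a.pdf"), ("m1", "b.pdf"), ("m2", "c.docx"), ("m3", "d.jpg")]
histogram, med = attachment_distribution(records)
for n_attachments, n_messages in sorted(histogram.items(), reverse=True):
    print(f"{n_messages} message(s) with {n_attachments} attachment(s)")
print("median attachments per message:", med)
```

Only messages with at least one logged attachment show up in such a tally, which is why the median can't go below one.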

Almost all of the attachments had MIME filenames; only 50 didn't. For those 50, the MIME types varied, with the most popular one being message/rfc822, but there were also images, text/plain, text/html, PGP signatures, and apparently one Office XML file. For the attachments with MIME file extensions, the most popular types break down like this:

  2339 MIME file ext: .pdf
   429 MIME file ext: .docx
   273 MIME file ext: .jpg
   159 MIME file ext: .xlsx
   125 MIME file ext: .png
    91 MIME file ext: .doc
    72 MIME file ext: .ics
    69 MIME file ext: .txt
    57 MIME file ext: .asc
    54 MIME file ext: .html

There were 72 different MIME file extensions in total, although some of them are clearly not real file extensions but instead just parts of the filename that happened to come after a dot. One is a timestamp, for example. These may come from regular filenames that had extra stuff added on, for example 'file.pdf.<timestamp>'.
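
A simple way to see how such pseudo-extensions arise is to take whatever follows the last dot in the filename. The sketch below does this with Python's os.path.splitext(); the helper names and the lowercasing are assumptions for illustration, not a description of how our logging actually works.

```python
import os.path
from collections import Counter

def mime_file_ext(filename):
    """Return the lowercased text after the last dot (with the dot), or None."""
    ext = os.path.splitext(filename)[1]
    return ext.lower() or None

def count_extensions(filenames):
    """Tally extensions, skipping filenames that have none."""
    return Counter(ext for ext in map(mime_file_ext, filenames) if ext)

# Made-up filenames; the timestamp suffix imitates 'file.pdf.<timestamp>'.
names = ["Report.PDF", "notes.txt", "file.pdf.20180428", "README"]
for ext, count in count_extensions(names).most_common():
    print(f"{count:6d} MIME file ext: {ext}")
```

With this approach 'file.pdf.20180428' is counted under '.20180428' rather than '.pdf', which is exactly the sort of not-really-an-extension entry that inflates the total.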

The popularity of PDF files is no surprise, given that we're a university department. That may also explain why MS Word scores relatively highly (and perhaps the spreadsheets too, but I'm less sure there). All of the .asc cases are PGP signatures (and were sent with the MIME type application/pgp-signature), and some of them come from mailing list email that I get.

I took a look at MIME type information, and unsurprisingly it is somewhat less reliable than MIME file extensions. For instance, here is the MIME type breakdown for .pdf attachments:

  2160 application/pdf
   175 application/octet-stream
     1 pdf
     1 application/octetstream
     1 application/octet
     1 application/download
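
A breakdown like this is a cross-tabulation of file extension against declared MIME type. Here is a hedged sketch of one way to build it, assuming the data is available as (extension, MIME type) pairs; the pair format, the function name, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

def types_by_extension(attachments):
    """attachments: iterable of (extension, declared_mime_type) pairs.

    Returns a mapping of extension -> Counter of declared MIME types.
    Both values are lowercased, since neither is case-sensitive in practice.
    """
    table = defaultdict(Counter)
    for ext, mime_type in attachments:
        table[ext.lower()][mime_type.lower()] += 1
    return table

# Made-up sample, not the real week of data.
sample = [
    (".pdf", "application/pdf"),
    (".pdf", "application/pdf"),
    (".pdf", "application/octet-stream"),
    (".PDF", "Application/PDF"),
    (".docx", "application/octet-stream"),
]
table = types_by_extension(sample)
for mime_type, count in table[".pdf"].most_common():
    print(f"{count:6d} {mime_type}")
```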

Looking at all attachments, application/octet-stream was the third most popular MIME type. Mostly it's used for PDFs, but it also turns up with a long tail of other MIME filename extensions, which doesn't really surprise me. If a mail program is attaching something to a message and it's not completely sure what it is, application/octet-stream will get the job done and no one can really argue with you for picking it.
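
Since the extension is often more informative than a generic declared type, one could fall back to guessing from the filename when the declared type says nothing useful. The following is only a sketch of that idea using Python's standard mimetypes module; it is not something our mail server does, and the set of 'generic' types is an assumption drawn from the variants seen in the .pdf breakdown above.

```python
import mimetypes

# Types treated as "sender didn't really know"; this set is an assumption
# based on the variants that showed up for .pdf attachments.
GENERIC_TYPES = {
    "application/octet-stream",
    "application/octetstream",
    "application/octet",
    "application/download",
}

def effective_type(declared_type, filename):
    """Prefer the declared type unless it is generic, then guess from the name."""
    declared = (declared_type or "").strip().lower()
    if declared and declared not in GENERIC_TYPES:
        return declared
    guessed, _encoding = mimetypes.guess_type(filename or "")
    return guessed or "application/octet-stream"

print(effective_type("application/octet-stream", "paper.pdf"))
print(effective_type("application/octetstream", "paper.pdf"))
print(effective_type("application/octet-stream", "mystery"))
```

The guess is only as good as the filename, so anything with a made-up or missing extension stays as application/octet-stream.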

(Sometimes I look at this data and what I find is, well, data.)

Comments on this page:

By skeeto at 2018-04-28 08:40:36:

I wonder which MUA out there is missing a dash in its "octetstream" MIME type entry.

Written on 28 April 2018.

Last modified: Sat Apr 28 01:19:47 2018