What sorts of good email attachments our users get (April 2018 edition)
I've looked at various breakdowns of bad attachment types that get sent to our users, but of course that's not the only reason we collect all of this data. In fact it's the lesser reason; the greater one is to know the legitimate types of files our users get in email. So today I'm going to look at a week's worth of data from our central mail server, logged after all rejection, filtering, and spam tagging has been applied.
Over that week, we logged 4,166 attachments from 3,093 email messages. Some email messages had quite a lot of attachments; the winner had 26 attachments, and then there's one with 23, two with 12, four with 9, nine with 8, and I've run out of patience to count from there. The median message has one attachment, though, as you'd expect.
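For anyone who wants to produce the same sort of per-message numbers from their own logs, here is a minimal Python sketch. It assumes the log has already been parsed into (message id, MIME filename, MIME type) tuples; that record layout, the function names, and the sample data are all made up for illustration and are not how our logging actually works.

```python
from collections import Counter
from statistics import median


def attachments_per_message(records):
    """Return a Counter mapping message id -> number of logged attachments."""
    return Counter(msg_id for msg_id, _filename, _mime_type in records)


def summarize(records):
    """Summarize attachment counts; expects at least one record."""
    per_message = attachments_per_message(records)
    counts = list(per_message.values())
    # How many messages had exactly N attachments, largest N first.
    distribution = Counter(counts)
    return {
        "attachments": sum(counts),
        "messages": len(counts),
        "median": median(counts),
        "largest": max(counts),
        "distribution": dict(sorted(distribution.items(), reverse=True)),
    }


# Made-up sample records, not real log data.
sample = [
    ("m1", "report.pdf", "application/pdf"),
    ("m1", "photo.jpg", "image/jpeg"),
    ("m2", "notes.docx", "application/octet-stream"),
    ("m3", None, "message/rfc822"),
]
print(summarize(sample))
```

The "distribution" entry is the part that answers "how many messages had 26, 23, 12, ... attachments" without anyone running out of patience.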
Almost all of the attachments had MIME filenames; only 50 didn't. For those 50, the MIME types varied, with the most popular one being message/rfc822, but there are also images, text/plain, text/html, PGP signatures, and apparently one Office XML file. For the attachments with MIME file extensions, the most popular types break down like this:
    2339  MIME file ext: .pdf
     429  MIME file ext: .docx
     273  MIME file ext: .jpg
     159  MIME file ext: .xlsx
     125  MIME file ext: .png
      91  MIME file ext: .doc
      72  MIME file ext: .ics
      69  MIME file ext: .txt
      57  MIME file ext: .asc
      54  MIME file ext: .html
There were 72 different MIME file extensions in total, although some of them are clearly not real file extensions but instead just parts of the filename that happened to go after a dot. One is a timestamp, for example. These may be for regular filenames that had extra stuff added on, for example 'file.pdf.<timestamp>'.
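As a rough sketch of how one might tally extensions while setting aside the ones that are obviously not real, here is some Python. The rule it uses (an "extension" that is all digits, or longer than eight characters, is suspect) is purely my assumption for illustration; it is a heuristic, and it will both miss things and occasionally flag legitimate extensions.

```python
import os.path
import re
from collections import Counter

# Assumed heuristic: all-digit "extensions" (such as timestamps) and very long
# ones are probably just the tail end of a filename, not a file type.
SUSPICIOUS = re.compile(r"^\.(\d+|.{9,})$")


def extension(filename):
    """Return the lowercased last dotted component, or '' if there is none."""
    return os.path.splitext(filename)[1].lower()


def tally_extensions(filenames):
    """Return (plausible, suspect) Counters of extensions."""
    plausible, suspect = Counter(), Counter()
    for name in filenames:
        if not name:
            continue  # attachment had no MIME filename
        ext = extension(name)
        if not ext:
            continue  # filename had no dot-separated suffix
        if SUSPICIOUS.match(ext):
            suspect[ext] += 1
        else:
            plausible[ext] += 1
    return plausible, suspect


# Made-up filenames, not real log data.
plausible, suspect = tally_extensions(
    ["a.pdf", "B.PDF", "c.docx", "file.pdf.20180412", None, "README"]
)
for ext, count in plausible.most_common():
    print(count, "MIME file ext:", ext)
```

Keeping the suspect extensions in their own counter instead of throwing them away makes it easy to eyeball them and decide whether the heuristic is being too aggressive.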
The popularity of PDF files is no surprise, given that we're a university department. That may also explain how MS Word scores relatively highly (and perhaps the spreadsheets too, but I don't know there). All of the .asc cases are PGP signatures (and were sent with the MIME type application/pgp-signature), and some of them come from mailing list email that I get.
I took a look at MIME type information, and unsurprisingly it is somewhat less reliable than MIME file extensions. For instance, here is the MIME type breakdown for .pdf attachments:
    2160  application/pdf
     175  application/octet-stream
       1  pdf
       1  application/octetstream
       1  application/octet
       1  application/download
Looking at all attachments, application/octet-stream was the third most popular MIME type. Mostly it's used for PDFs, but there is a long tail of MIME filename extensions, which doesn't really surprise me. If a mail program is attaching something to a message and it's not completely sure what it is, application/octet-stream will get the job done and no one can really argue with you for picking it.
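Both views of this data (which MIME types a given extension arrives with, and which extensions hide behind a given MIME type) fall out of the same cross-tabulation. Here is a minimal Python sketch, assuming the log has been reduced to (MIME filename, MIME type) pairs; the record layout, function names, and sample records are hypothetical.

```python
import os.path
from collections import Counter, defaultdict


def types_by_extension(records):
    """Map each file extension to a Counter of the MIME types it came with."""
    table = defaultdict(Counter)
    for filename, mime_type in records:
        if not filename:
            continue  # no MIME filename, so no extension to group under
        ext = os.path.splitext(filename)[1].lower()
        if ext:
            table[ext][mime_type.lower()] += 1
    return table


def extensions_for_type(records, wanted):
    """Count the file extensions seen with one particular MIME type."""
    counts = Counter()
    for ext, types in types_by_extension(records).items():
        if wanted in types:
            counts[ext] += types[wanted]
    return counts


# Made-up sample records, not real log data.
records = [
    ("a.pdf", "application/pdf"),
    ("b.pdf", "application/octet-stream"),
    ("c.PDF", "Application/PDF"),
    ("d.docx", "application/octet-stream"),
    (None, "message/rfc822"),
]
for mime_type, count in types_by_extension(records)[".pdf"].most_common():
    print(count, mime_type)
```

Note that this only lowercases the MIME type; it deliberately doesn't try to fold oddities like 'application/octetstream' into 'application/octet-stream', since seeing those oddities is part of the point.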
(Sometimes I look at this data and what I find is, well, data.)