What sorts of good email attachments our users get (March 2019 edition)
Yesterday I looked at the types of attachments we see in malware email. Of course if we're considering blocking some of them, it's not enough to consider just what types we see in malware; we also care about what types we see in legitimate email (or at least in email that is as close to legitimate as we can manage). I did some stats for this a year ago, in the April 2018 edition, but this time around I'm going to be doing the stats slightly differently since I want to compare relatively directly to yesterday's data. Like yesterday, this is over the previous ten weeks, but a slightly different ten weeks (the relevant systems roll their weekly logs at different times).
Over the past ten weeks, we had 54,076 file attachments in 39,607 email messages that were not from DNSBL-listed sources, not identified as spam or virus-laden, and not rejected for other reasons. This is about ten times as many as we had malware attachments, which is either good or bad depending on your perspective. 98.5% of them had MIME filename information, and out of those the most popular file extensions were:
  30462 .pdf
   4210 .jpg
   3688 .docx
   1939 .png
   1773 .ics
   1339 .xlsx
   1009 .txt
    725 .html
    682 .doc
    640 .zip
If I reprocess the data to count how many messages had any particular type of file attachment, the data breaks down this way:
  23789 .pdf
   3177 .docx
   3075 .jpg
   1757 .ics
   1221 .png
   1172 .xlsx
    744 .txt
    690 .html
    629 .asc
    602 .zip
    595 .doc
It is probably not surprising that the image formats drop in this re-ranking, since it's likely common to attach several images to a single message. To my surprise, a number of messages had multiple .zip file attachments, which is why the .zip numbers drop. Multiple .doc and .docx attachments are relatively common.
(In the 'things that make me raise my eyebrows now that I'm looking at them' category, there was one message with 24 .wmz attachments. It came from a 'marketing@<domain>' address, so maybe it was genuine and just, well, marketing.)
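The difference between the two rankings is only in what gets counted: every attachment, or every message that has at least one attachment of a given type. Here is a minimal sketch of both counts, assuming the log data has already been reduced to one list of attachment filenames per message. The function names and the sample data are made up for illustration; this is not the actual processing we use.

```python
from collections import Counter
import os

def extension(filename):
    """Return the lowercased file extension, e.g. '.pdf', or '' if there is none."""
    return os.path.splitext(filename)[1].lower()

def count_extensions(messages):
    """Count extensions two ways.

    messages: iterable of lists of attachment filenames, one list per message.
    Returns (per_attachment, per_message) as Counters.
    """
    per_attachment = Counter()
    per_message = Counter()
    for filenames in messages:
        exts = [extension(name) for name in filenames]
        # Every attachment counts once.
        per_attachment.update(exts)
        # A set counts each extension at most once per message.
        per_message.update(set(exts))
    return per_attachment, per_message

# Hypothetical sample data, not taken from our logs.
sample = [
    ["a.jpg", "b.JPG", "c.jpg"],
    ["report.pdf"],
    ["report.pdf", "photo.jpg"],
]
per_attachment, per_message = count_extensions(sample)
```

In this sample the .jpg count falls when counted per message, because one message carries three images, while the .pdf count stays the same. That is the same effect as the image formats dropping in the re-ranking.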
Basically all of these file types are unsurprising in our environment (academic computer science). All of the .asc files are PGP stuff (and have appropriate MIME types); I'm a bit surprised that we see so much of it in our email, but then some of this email is things like PGP-signed update notifications from Ubuntu and other sources. Use of .p7s is not too much below the use of .asc, at 588 attachments. I am a bit surprised to see so many .html attachments, but perhaps some of that is mail sending programs improperly marking HTML parts as attachments instead of inline content.
Nothing particularly stands out about the contents of .zip files and ZIP archives in general, so I'm going to skip any extensive analysis or discussion of them.
At this point it's useful to cross-compare some suspicious file types from yesterday that haven't already been mentioned to see how many legitimate versions of them we see:
    444 .xls
     18 .rar
      1 .iso
      1 .docm
We clearly can't reject .xls file attachments, but it seems likely we could reject .docm and .iso attachments. I was going to say that we could probably reject .rar file attachments as well, but then I took a second look at our data. We could read the RAR file list for all but four of those .rar attachments, and all of the file types in them look legitimate; on closer inspection (eg of source and destination information), even the remaining four look good. It looks like some people just prefer RAR to ZIP, which I can't blame them for.
(The good news version of this finding is that our commercial anti-spam system is apparently very good at finding bad stuff in .rars, since no bad ones seem to have slipped past it.)
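One rough way to put numbers on this cross-comparison is to work out, for each of these extensions, what share of all the attachments we saw were detected as malware. The sketch below uses the legitimate counts above and the malware counts from the malware entry. The two ten-week windows are slightly different, so the shares are only approximate, and the `malware_share` helper is something I made up for illustration.

```python
# Counts copied from the two entries; the ten-week windows differ slightly.
malware = {".xls": 36, ".rar": 246, ".iso": 245, ".docm": 60}
legit = {".xls": 444, ".rar": 18, ".iso": 1, ".docm": 1}

def malware_share(ext):
    """Fraction of attachments with this extension that were detected as malware."""
    bad = malware.get(ext, 0)
    good = legit.get(ext, 0)
    total = bad + good
    return bad / total if total else 0.0

# Print the extensions from the highest malware share to the lowest.
for ext in sorted(malware, key=malware_share, reverse=True):
    print(f"{ext:6} {malware_share(ext):.1%}")
```

A high share by itself doesn't settle things. The .rar share is high, but the 18 legitimate ones all look like real email that people want, which is why the absolute number of good attachments matters as much as the ratio.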
The types of attachments we see in malware email (March 2019 edition)
Back in mid 2017 I wrote about the types of attachments we saw then in malware-laden email. Today, for reasons beyond the scope of this entry, I feel like looking at our current numbers on this, based on the previous ten weeks of activity. This does not include the slowly but steadily growing collection of attachment types we reject immediately, but it does include 'malware' that is a phish spam in an actual attachment, because that's what our commercial anti-spam system does. As we will see, this is actually a large category of what we detect as 'malware'.
Over 99% of the detected malware attachments had MIME filenames. Out of the 5622 attachments with filenames, the most common file extensions were:
   3008 .html
   1134 .doc
    536 .xlsx
    246 .rar
    245 .iso
     60 .docm
     58 .txt
     57 .docx
     44 .zip
     36 .xls
More than half of these attachments were in messages detected as phish (more or less 55%, as it turns out). However, not all of the phish spam used .html attachments, or at least not directly; instead, it breaks down like this:
   3008 MIME file ext: .html
     58 MIME file ext: .txt
     23 MIME file ext: .zip
      6 MIME file ext: .jpg
      3 MIME file ext: .png
      1 MIME file ext: .htm
All of those .zip attachments actually contain a single .html file. We've seen this sort of single file ZIP smuggling before and now reject it outright for certain file types. We probably don't want to extend that to .html files, but it's slightly tempting.
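To illustrate what checking for a single file ZIP involves, here is a minimal sketch using Python's standard zipfile module. It assumes the attachment has already been decoded to bytes. The `single_file_extension` and `make_zip` names are hypothetical; this is not how our mail system actually does the check.

```python
import io
import os
import zipfile

def single_file_extension(data):
    """If the ZIP archive in `data` (bytes) holds exactly one file, return that
    file's lowercased extension; otherwise (or if it isn't a ZIP) return None.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Ignore directory entries; only real files count.
            names = [info.filename for info in zf.infolist() if not info.is_dir()]
    except zipfile.BadZipFile:
        return None
    if len(names) != 1:
        return None
    return os.path.splitext(names[0])[1].lower()

def make_zip(members):
    """Build an in-memory ZIP from a {name: content} dict (for the demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()

# Hypothetical examples: one phish-style archive, one ordinary archive.
phish_like = make_zip({"invoice.html": "<html></html>"})
ordinary = make_zip({"a.txt": "a", "b.txt": "b"})
```

A reject rule would then compare the returned extension against whatever set of file types you've decided to refuse. Whether '.html' belongs in that set is exactly the policy question here.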
Out of all of the various things that detect as ZIP archives (which is a lot more than .zip file attachments), there is no particularly dominating set of contents. We do see a certain number of ZIP archives that contain just a single .jar or a .jar plus a .txt, but the absolute numbers are too low to consider a 'reject on sight' policy for them (especially as our users may actually want to get .jars every so often).
My overall conclusion from this is that we don't really have any additional smoking gun file attachment types that we could argue for automatically rejecting on sight. We could raise the argument for .rar and .iso, but they are each only 4% or so of the attachments in general. Anyway, this is only half the story; to really answer this question, we need to look at what sort of legitimate attachments our users get and that's another entry.
(Some but not very many messages detected with malware had multiple attachments. I'm not currently interested enough to do a breakdown of what types those messages use. For our purposes, any 'bad' file type that's commonly seen in malware laden email is suspect regardless of whether or not it actually contained the malware.)