2019-03-21
What sorts of good email attachments our users get (March 2019 edition)
Yesterday I looked at the types of attachments we see in malware email. Of course if we're considering blocking some of them, it's not enough to consider just what types we see in malware; we also care about what types we see in legitimate email (or at least in email that is as close to legitimate as we can manage). I did some stats for this a year ago, in the April 2018 edition, but this time around I'm going to be doing the stats slightly differently since I want to compare relatively directly to yesterday's data. Like yesterday, this is over the previous ten weeks, but a slightly different ten weeks (the relevant systems roll their weekly logs at different times).
Over the past ten weeks, we had 54,076 file attachments in 39,607 email messages that were not from DNSBL-listed sources, not identified as spam or virus-laden, and not rejected for other reasons. This is about ten times as many as we had malware attachments, which is either good or bad depending on your perspective. 98.5% of them had MIME filename information, and out of those the most popular file extensions were:
30462 .pdf 4210 .jpg 3688 .docx 1939 .png 1773 .ics 1339 .xlsx 1009 .txt 725 .html 682 .doc 640 .zip
If I reprocess the data to count how many messages had any particular type of file attachment, the data breaks down this way:
23789 .pdf 3177 .docx 3075 .jpg 1757 .ics 1221 .png 1172 .xlsx 744 .txt 690 .html 629 .asc 602 .zip 595 .doc
It is probably not surprising that the image formats drop in this re-ranking, since it's likely common to attach several images to a single message. To my surprise, a number of messages had multiple .zip file attachments, which is why the .zip numbers drop. Multiple .doc and .docx attachments are relatively common.
(In the 'things that make me raise my eyebrows now that I'm looking at them' category, there was one message with 24 .wmz attachments. It came from a 'marketing@<domain>' address, so maybe it was genuine and just, well, marketing.)
Basically all of these file types are unsurprising in our environment (academic computer science). All of the .asc files are PGP stuff (and have appropriate MIME types); I'm a bit surprised that we see so much of it in our email, but then some of this email is things like update notifications from Ubuntu and other sources that's PGP signed. Use of .p7s is not too much below the use of .asc, at 588 attachments. I am a bit surprise to see so many .html attachments, but perhaps some of that is mail sending programs improperly marking HTML parts as attachments instead of inline content.
Nothing particularly stands out about the contents of .zip files and ZIP archives in general, so I'm going to skip any extensive analysis or discussion of them.
At this point it's useful to cross-compare some suspicious file types from yesterday that haven't already been mentioned to see how many legitimate versions of them we see:
444 .xls 18 .rar 1 .iso 1 .docm
We clearly can't reject .xls file attachments, but it seems likely we could reject .docm and .iso attachments. I was going to say that we could probably reject .rar file attachments as well, but then I took a second look at our data. We could read the RAR file list for all but four of those .rar attachments, and all of the file types in them look legitimate; on closer inspection (eg of source and destination information), even the remaining four look good. It looks like some people just prefer RAR to ZIP, which I can't blame them for.
(The good news version of this finding is that our commercial anti-spam system is apparently very good at finding bad stuff in .rars, since no bad ones seem to have slipped past it.)