Wandering Thoughts archives

2019-03-21

What sorts of good email attachments our users get (March 2019 edition)

Yesterday I looked at the types of attachments we see in malware email. Of course if we're considering blocking some of them, it's not enough to consider just what types we see in malware; we also care about what types we see in legitimate email (or at least in email that is as close to legitimate as we can manage). I did some stats for this a year ago, in the April 2018 edition, but this time around I'm going to be doing the stats slightly differently since I want to compare relatively directly to yesterday's data. Like yesterday, this is over the previous ten weeks, but a slightly different ten weeks (the relevant systems roll their weekly logs at different times).

Over the past ten weeks, we had 54,076 file attachments in 39,607 email messages that were not from DNSBL-listed sources, not identified as spam or virus-laden, and not rejected for other reasons. This is about ten times as many as we had malware attachments, which is either good or bad depending on your perspective. 98.5% of them had MIME filename information, and out of those the most popular file extensions were:

 30462  .pdf
  4210  .jpg
  3688  .docx
  1939  .png
  1773  .ics
  1339  .xlsx
  1009  .txt
   725  .html
   682  .doc
   640  .zip

If I reprocess the data to count how many messages had any particular type of file attachment, the data breaks down this way:

 23789  .pdf
  3177  .docx
  3075  .jpg
  1757  .ics
  1221  .png
  1172  .xlsx
   744  .txt
   690  .html
   629  .asc
   602  .zip
   595  .doc

It is probably not surprising that the image formats drop in this re-ranking, since it's likely common to attach several images to a single message. To my surprise, a number of messages had multiple .zip file attachments, which is why the .zip numbers drop. Multiple .doc and .docx attachments are relatively common.

(In the 'things that make me raise my eyebrows now that I'm looking at them' category, there was one message with 24 .wmz attachments. It came from a 'marketing@<domain>' address, so maybe it was genuine and just, well, marketing.)

Basically all of these file types are unsurprising in our environment (academic computer science). All of the .asc files are PGP stuff (and have appropriate MIME types); I'm a bit surprised that we see so much of it in our email, but then some of this email is things like update notifications from Ubuntu and other sources that's PGP signed. Use of .p7s is not too much below the use of .asc, at 588 attachments. I am a bit surprise to see so many .html attachments, but perhaps some of that is mail sending programs improperly marking HTML parts as attachments instead of inline content.

Nothing particularly stands out about the contents of .zip files and ZIP archives in general, so I'm going to skip any extensive analysis or discussion of them.

At this point it's useful to cross-compare some suspicious file types from yesterday that haven't already been mentioned to see how many legitimate versions of them we see:

   444  .xls
    18  .rar
     1  .iso
     1  .docm

We clearly can't reject .xls file attachments, but it seems likely we could reject .docm and .iso attachments. I was going to say that we could probably reject .rar file attachments as well, but then I took a second look at our data. We could read the RAR file list for all but four of those .rar attachments, and all of the file types in them look legitimate; on closer inspection (eg of source and destination information), even the remaining four look good. It looks like some people just prefer RAR to ZIP, which I can't blame them for.

(The good news version of this finding is that our commercial anti-spam system is apparently very good at finding bad stuff in .rars, since no bad ones seem to have slipped past it.)

spam/GoodAttachmentTypes-2019-03 written at 20:25:48; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.