You should probably track what types of files your users get in email

April 28, 2016

Most of the time our commercial anti-spam system works fine and we don't have to think about it or maintain it (which is one of the great attractions of using a commercial system for this). Today was not one of those times. This morning, we discovered that some incoming email messages we were receiving make its filtering processes hang using 100% CPU; after a while, this caused all inbound email to stop. More specifically, the dangerous incoming messages appeared to be a burst of viruses or malware in zipped .EXEs.

This is clearly a bug and hopefully it will get fixed, but in the mean time we needed to do something about it. Things like, say, blocking all ZIP files, or all ZIP files with .EXEs in them. As we were talking about this, we realized something important: we had no idea how many ZIP files our users normally get, especially how many (probably) legitimate ones. If we temporarily stopped accepting all ZIP file attachments, how many people would we be affecting? No one, or a lot? Nor did we know what sort of file types are common or uncommon in the ZIP files that our users get (legitimate or otherwise), or what sort of file types users get other than ZIP files. Are people getting mailed .EXEs or the like directly? Are they getting mailed anything other than ZIP files as attachments?

(Well, the answer to that one will be 'yes', as a certain amount of HTML email comes with attached images. But you get the idea.)

Knowing this sort of information is important for the same reason as knowing what TLS ciphers your users are using. Someday you may be in our situation and really want to know if it's safe to temporarily (or permanently) block something, or whether it'll badly affect users. And if something potentially dangerous has low levels of legitimate usage, well, you have a stronger case for preemptively doing something about it. All of this requires knowing what your existing traffic is, rather than having to guess or assume, and for that you need to gather the data.

Getting this sort of data for email does have complications, of course. One of them is that you'd really like to be able to distinguish between legitimate email and known spam in tracking this sort of stuff, because blocking known spam is a lot different than blocking legitimate email. This may require logging things in a way that either directly ties them to spam level information and so on or at least lets you cross-correlate later between different logs. This can affect where you want to do the logging; for example, you might want to do logging downstream of your spam detection system instead of upstream of it.

(This is particularly relevant for us because obviously we now need to do our file type blocking and interception upstream of said anti-spam system. I had been dreaming of ways to make it log information about what it saw going by even if it didn't block things, but now maybe not; it'd be relatively hard to correlate its logs again our anti-spam logs.)

Written on 28 April 2016.
« How 'there are no technical solutions to social problems' is wrong
You should plan for your anti-spam scanner malfunctioning someday »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Thu Apr 28 01:36:06 2016
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.