2016-05-27
Your overall anti-spam system should have manual emergency blocks
We mostly rely on a commercial anti-spam system for our incoming spam filtering (as described here), and many other people rely on a variety of open source options for their spam filtering. This generally works very well, with us (and you) getting to offload the work of maintaining a high quality anti-spam system to other people (and it's certainly a lot of work). But it doesn't always work (and not just because the system malfunctions). The realities of life are that sooner or later you will be hit by a spam run that your anti-spam system doesn't recognize, either because the spam run is really new or because it's pretty specific to you.
Much of the time, you can shrug your shoulders and let this go. No anti-spam system is perfect and one of the tradeoffs you make when relying on a third-party system is that it's broadly out of your hands (sometimes this is an advantage). But some of the time this isn't going to be good enough; either the volume or the threat to your users will be so high that you can't just sit on your hands.
(Modern ransomware is making this clear by creating a potentially very high cost of allowing some things through.)
When this day comes to pass, you'll want to have the ability to step in and block the traffic even though your automated anti-spam system is happy with it. This can take many forms, depending on how you want to handle it; you could figure out how to write custom rules for your anti-spam system (so you can outright block certain sorts of files or certain URLs or whatever), you could build blocking features into your mailer configuration itself, or you could do any number of other things.
Having had to do this on the fly during an emergency, I strongly suggest that you build the infrastructure for these manual blocks now, before you need them. It's some additional up-front work and if you're lucky you may never need it, but doing it now, when you have time to plan, test, and figure out the best way to do things, beats having to do it on the fly under pressure.
Sidebar: What I think you should have manual blocks for
On the one hand, attacker ingenuity is very deep, but on the other hand, certain patterns repeat over and over again. So my view is that you can probably cover most of the ground with the ability to put in place manual blocks against sending IPs, sending domains, file extensions (including inside file containers like ZIP files), and whole and partial URLs (for phishing campaigns). You might also want a general message header and body regular expression matching system, but that's starting to feel like scope creep to me.
(Of course real scope creep would be to start by creating a general, generic framework for writing relatively arbitrary manual blocks on message attributes.)
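As a rough illustration of the block types listed above, here is a minimal Python sketch of an emergency block table covering sending IPs (as networks), sending domains (including their subdomains), file extensions (looking one level inside ZIP attachments), and whole or partial URLs. Everything here is an assumption for illustration: the ManualBlocks class and its method names are hypothetical, not part of any anti-spam product or mailer, and hooking it into a real mail pipeline is left out. The example entries use reserved documentation addresses and .example domains.

```python
import io
import ipaddress
import zipfile
from dataclasses import dataclass, field
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class ManualBlocks:
    """Hypothetical emergency block table (not any real product's API)."""
    networks: list[Network] = field(default_factory=list)   # sending IPs / ranges
    domains: set[str] = field(default_factory=set)          # lowercase sending domains
    extensions: set[str] = field(default_factory=set)       # lowercase, e.g. ".js"
    url_fragments: list[str] = field(default_factory=list)  # whole or partial URLs

    def ip_blocked(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # Mixed IPv4/IPv6 membership tests simply return False.
        return any(addr in net for net in self.networks)

    def domain_blocked(self, domain: str) -> bool:
        labels = domain.lower().rstrip(".").split(".")
        # Check the domain and each parent, so "bad.example" also covers "mx.bad.example".
        return any(".".join(labels[i:]) in self.domains for i in range(len(labels)))

    def _ext_blocked(self, filename: str) -> bool:
        name = filename.lower()
        return any(name.endswith(ext) for ext in self.extensions)

    def attachment_blocked(self, filename: str, data: bytes) -> bool:
        if self._ext_blocked(filename):
            return True
        if filename.lower().endswith(".zip"):
            try:
                with zipfile.ZipFile(io.BytesIO(data)) as zf:
                    # Only one level deep: nested archives are not unpacked here.
                    return any(self._ext_blocked(n) for n in zf.namelist())
            except zipfile.BadZipFile:
                return True  # assumption: an unreadable ZIP is treated as blocked
        return False

    def url_blocked(self, body: str) -> bool:
        text = body.lower()
        # Plain case-insensitive substring match against whole or partial URLs.
        return any(frag.lower() in text for frag in self.url_fragments)


# Example entries using reserved documentation ranges and names.
blocks = ManualBlocks(
    networks=[ipaddress.ip_network("192.0.2.0/24")],
    domains={"bad.example"},
    extensions={".js", ".wsf"},
    url_fragments=["phish.example/login"],
)
```

Keeping the blocks as plain data like this, rather than as rules written in one particular filter's configuration language, is one way to make them quick to update in an emergency; a fuller version would also need to look inside nested archives and other container formats.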
2016-05-22
My view of Barracuda's public DNSBL
In a comment on this entry, David asked, in part:
Have you tried the Barracuda and Hostkarma DNSBLs? [...]
I hadn't heard of Hostkarma before, so I don't have anything to say about it. But I am somewhat familiar with Barracuda's public DNSBL and based on my experiences I'm not likely to use it any time soon. As for why, well, David goes on to mention:
[...] Barracuda in particular lists more aggressively and is willing to punish lower volume relays that fail to mitigate spammer exploitations. [...]
That's one way to describe what Barracuda does. Another way to put it is that in my experience, Barracuda is pretty quick to list any IP address that has even a relatively brief burst of outgoing spam, regardless of the long term spam-to-ham ratio of that IP address. In practice, whenever we have one of our rare outgoing spam incidents, we can count on the outgoing IP involved getting listed and on some amount of our entirely legitimate email starting to bounce as a result.
As a result, I expect that any attempt to use it in our anti-spam system would have far too high a false positive rate to be acceptable to our users. Given this, I haven't attempted any actual analysis comparing the sender IPs of accepted and rejected email against the Barracuda list; it's too much work for too little return.
My suspicion is that this is likely to be strongly influenced by your overall ratio of ham to spam, for standard base-rate reasons. If most of your incoming email is spam anyway and you don't often receive email from places that are likely to be compromised by spammers from time to time, the list's misfires are not likely to matter to you. This does not describe our mail environment, however, either in ham/spam levels or in the type of sources we see.
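One way to make the ham-to-spam point concrete is ordinary base-rate arithmetic: a list that catches some fraction of spam and wrongly hits a small fraction of ham produces a very different mix of rejections depending on how much of your mail is spam. The sketch below assumes made-up rates (half of spam listed, 1% of ham coming from listed IPs); the function name and every number in it are hypothetical, not measurements of Barracuda's list or of our mail.

```python
def rejection_breakdown(total: int, spam_fraction: float,
                        spam_listed: float, ham_listed: float) -> tuple[float, float]:
    """Return (legitimate messages rejected, share of rejections that are legitimate).

    spam_listed: assumed fraction of spam whose source IP is on the list.
    ham_listed:  assumed fraction of ham whose source IP is (wrongly) on the list.
    """
    spam = total * spam_fraction
    ham = total - spam
    rejected_spam = spam * spam_listed
    rejected_ham = ham * ham_listed
    rejected = rejected_spam + rejected_ham
    fp_share = rejected_ham / rejected if rejected else 0.0
    return rejected_ham, fp_share


# Same hypothetical list behaviour, two different mail mixes of 10,000 messages.
spam_heavy = rejection_breakdown(10_000, 0.9, spam_listed=0.5, ham_listed=0.01)
ham_heavy = rejection_breakdown(10_000, 0.2, spam_listed=0.5, ham_listed=0.01)

for label, (lost_ham, fp_share) in [("90% spam", spam_heavy), ("20% spam", ham_heavy)]:
    print(f"{label}: {lost_ham:.0f} legitimate messages rejected, "
          f"{fp_share:.1%} of all rejections")
```

With these invented rates, the spam-heavy mix loses 10 legitimate messages out of 4,510 rejections (about 0.2%), while the ham-heavy mix loses 80 out of 1,080 (about 7.4%), even though the list itself behaves identically in both.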
(To put it one way, universities are reasonably likely to get one of their email systems compromised from time to time and we certainly get plenty of legitimate email from universities.)
On my personal sinkhole spamtrap, I could probably use the Barracuda list (and the psky RBL) as a decent way of getting rid of known and thus probably uninteresting sources of spam in favour of only having to deal with (more) interesting ones. But obviously this spamtrap gets only spam, so false positives are not exactly a concern. Certainly a significant number of recently trapped messages there are from IPs that are on one or the other list (and sometimes both), although obviously I'm taking a post-facto look at the hit rate.
2016-05-19
Some basic data on the hit rate of the Spamhaus DBL here
After my previous exploration of the Spamhaus DBL, I wound up adding it as another DNS blocklist in our overall spam filtering setup. Because we don't have a mandate for it, none of our DNS blocklists apply to all email, only to email for people who have opted in to some amount of server side spam filtering. Because the DBL applies on a per-recipient basis, the comparison I'm going to use here is against the overall recipient count (not the overall message count). I'm also going to use the past nine days, so I can sort of compare this to my estimated hit rate.
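The per-recipient arrangement described here, with the sender IP checked against the Spamhaus DNSBLs before the MAIL FROM domain is checked against the DBL, might be sketched roughly as follows. This is a minimal illustration under assumptions, not our actual mailer configuration: the function names are hypothetical, the zone names in the comments are only examples, and treating only answers inside 127.0.0.0/16 as listings is an assumption to check against the list operator's documentation on return codes. The DNS lookups are passed in as functions so the decision logic can be exercised without network access; the IP query-name helper handles IPv4 only.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """IPv4 DNSBL query name: reversed octets plus zone (192.0.2.1 -> 1.2.0.192.<zone>)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dbl_query_name(domain: str, zone: str) -> str:
    """Domain blocklist query name: the domain itself prepended to the zone."""
    return domain.lower().rstrip(".") + "." + zone


def lookup_listed(name: str) -> bool:
    """True if `name` resolves to an address in 127.0.0.0/16 (assumed to mean 'listed')."""
    try:
        answer = socket.gethostbyname(name)
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: fail open, treat as not listed
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/16")


def rcpt_decision(opted_in: bool, sender_ip: str, mail_from: str,
                  ip_listed: Callable[[str], bool],
                  domain_listed: Callable[[str], bool]) -> str:
    """Decide one RCPT TO. mail_from is a bare address, without angle brackets."""
    if not opted_in:
        return "accept"  # blocklists only apply to recipients who opted in
    if ip_listed(sender_ip):  # the IP check runs first
        return "reject: sender IP in DNSBL"
    domain = mail_from.rpartition("@")[2]  # empty for the null sender
    if domain and domain_listed(domain):
        return "reject: MAIL FROM domain in DBL"
    return "accept"


# Example wiring with real DNS (not run here); the zones are illustrative choices:
#   ip_check = lambda ip: lookup_listed(dnsbl_query_name(ip, "zen.spamhaus.org"))
#   dom_check = lambda d: lookup_listed(dbl_query_name(d, "dbl.spamhaus.org"))
```

Because the IP check runs first in this arrangement, the DBL only ever sees recipients that the IP check let through, so its own rejection count can look small next to the DNSBL's.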
So, over the past nine days, we have had:
- 106,837 accepted MAIL FROMs and 106,835 accepted RCPT TOs, which means that almost all of our accepted messages have been delivered to a single destination address.
- 29,194 accepted RCPT TOs for IPs listed in one of the Spamhaus DNSBLs. Since these were accepted, these are recipients who have not opted into any amount of our server-side spam filtering.
- 7,685 accepted RCPT TOs for domains listed in the DBL. A quick check suggests that about 6,390 of these came from IP addresses that were in the Spamhaus DNSBLs.
- 13,020 RCPT TOs that were rejected because the sender IP was in one of the Spamhaus DNSBLs. This is checked before the DBL.
- Only 346 RCPT TOs that were rejected because the sender domain was in the DBL.
On the one hand, this doesn't look too great for the DBL; despite my initial estimate, we aren't getting many rejections from checking the DBL. On the other hand, when I look at the source addresses of those rejections, something jumps out right away: just over half of them come from one system.
Specifically, over half of them come from the mail server for another (sub)domain on campus, one where a number of our users have accounts and forward (all of) their email from that system to us. What we've effectively done with the DBL is to add an additional SMTP-time defense to reject forwarded spam. In fact there are a number of 'forwarded from another campus mail system' DBL rejections in the past nine days from other sources.
My personal view is that these rejections are valuable ones (partly because I've observed our commercial anti-spam system not doing so well with forwarded spam in the past). So on the whole I'm happy with what the DBL is doing here, and also happy that now I have better numbers on what it could be doing if more people opted in to server-side spam filtering.
(Despite my bright words here, I'm also disappointed that adding the DBL isn't rejecting more messages. I guess this is partly down to how a lot of spam with DBL domains comes from IPs that are already blocked on their own. Note that we're using the DBL in its most basic and limited mode, where we check it against the MAIL FROM domain; you're really supposed to use it to check domains mentioned in the body of email messages.)
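The overlap between DBL-listed domains and already-blocked IPs can be put into rough numbers using the nine-day counts from the list earlier in this entry. This is back-of-the-envelope arithmetic only: the 6,390 figure is itself approximate, and the "only the DBL would catch" count assumes those recipients had been subject to filtering.

```python
# Nine-day counts from the summary list (6,390 is approximate in the original).
accepted_dbl = 7_685            # accepted RCPT TOs whose MAIL FROM domain is in the DBL
accepted_dbl_ip_listed = 6_390  # ...of which the sender IP was also in a Spamhaus DNSBL
rejected_ip = 13_020            # RCPT TOs rejected by the IP DNSBL check (runs first)
rejected_dbl = 346              # RCPT TOs rejected by the DBL check

overlap = accepted_dbl_ip_listed / accepted_dbl
dbl_only = accepted_dbl - accepted_dbl_ip_listed
dbl_share = rejected_dbl / (rejected_ip + rejected_dbl)

print(f"DBL-listed recipients whose IP was also listed: {overlap:.0%}")
print(f"DBL-listed recipients only the DBL would catch: {dbl_only:,}")
print(f"DBL share of current Spamhaus-based rejections: {dbl_share:.1%}")
```

This works out to about 83% overlap, roughly 1,295 recipients that only the DBL would have caught had they opted in, and the DBL accounting for about 2.6% of the current Spamhaus-based rejections.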