When 'simple' DNS blocklists work well for you

July 27, 2016

I've written about how we can divide DNS blocklists into 'simple' and 'complex' ones, where simple DNSBLs basically list things based on them sending spam or other bad stuff without trying to do more complex things like assess how much legitimate traffic also comes from the source. To put it one way, if a DNSBL lists one of GMail's outgoing SMTP servers because it sent some spam, it's almost certainly a simple one. I also said that rejecting email based on a simple DNSBL isn't necessarily a mistake, so it's time to explain that.

Suppose that you have a mail system that generally receives a low volume of legitimate email; for example, you might be operating a personal email server. Suppose that you also start getting spam. Spammers almost never go away, so your spam volume is very likely to trend up over time and reach a point where most of your incoming email is spam. In this environment, a listing in a simple DNSBL is a fairly strong confirmation signal that this new email is really spam. It's much more likely that you're getting spam email from an IP that's been detected as spamming than that an innocent person has chosen to send you legitimate email from an IP that also sent spam and got listed in the DNSBL. The latter could happen, but the odds are low.
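(For concreteness, here is a minimal sketch of what checking a sending IP against a DNSBL looks like. It relies on the usual DNSBL convention of reversing the IPv4 octets, appending the list's zone, and treating an answer in 127.0.0.0/8 as a listing. The zone `dnsbl.example.org` and both function names are made up for illustration, and a real mailer would normally do this through its own DNSBL support rather than hand-rolled code.)

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the DNS name to look up for an IPv4 address.

    The zone is a placeholder, not a real blocklist.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the DNSBL answers with a 127.x.x.x address.

    Any resolution failure (including NXDOMAIN, the normal 'not listed'
    answer) is treated as 'not listed' in this sketch.
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")
```

Treating every lookup failure as 'not listed' is a deliberate choice here; it means a broken resolver lets mail through instead of blocking it.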

We've sort of seen this before. If the legitimate email rate is low and the DNSBL's 'false positive' rate on it is also low, the odds that a positive signal from the DNSBL means that an email is spam are very high. You can make the odds even higher by whitelisting known good sources.
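(The arithmetic here is just Bayes' rule. The sketch below works it through; the function name is mine, and every number you feed it is an assumption about your own mail stream and the DNSBL, not something I've measured.)

```python
def prob_spam_given_listed(spam_fraction, hit_rate_on_spam, false_positive_rate):
    """P(message is spam | sending IP is listed), by Bayes' rule.

    spam_fraction:        share of incoming messages that are spam
    hit_rate_on_spam:     share of spam that comes from listed IPs
    false_positive_rate:  share of legitimate mail that comes from listed IPs
    """
    legit_fraction = 1.0 - spam_fraction
    listed_spam = spam_fraction * hit_rate_on_spam
    listed_legit = legit_fraction * false_positive_rate
    total_listed = listed_spam + listed_legit
    if total_listed == 0:
        raise ValueError("no messages would be listed with these rates")
    return listed_spam / total_listed


# Illustrative, made-up rates: the same DNSBL looks much better on a
# mostly-spam stream than on a mostly-legitimate one.
mostly_spam = prob_spam_given_listed(0.90, 0.50, 0.01)
mostly_legit = prob_spam_given_listed(0.10, 0.50, 0.01)

# Whitelisting known good senders acts like lowering the false positive rate.
with_whitelist = prob_spam_given_listed(0.90, 0.50, 0.001)
```

With these example rates, `mostly_spam` is noticeably higher than `mostly_legit`, and `with_whitelist` is higher still, which is the whole argument in miniature.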

(Of course anti-spam precautions aren't evaluated purely on percentages; the absolute number of legitimate messages blocked matters. Here the low volume helps, as there just aren't that many legitimate emails to get blocked.)
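(To put rough numbers on the absolute-count point, here is a tiny sketch. The daily volumes and the 1% false positive rate are invented purely to show the scale difference between a personal server and a large site.)

```python
def expected_legit_blocked(legit_per_day, false_positive_rate, days=30):
    """Expected number of legitimate messages rejected over a period."""
    return legit_per_day * days * false_positive_rate


# Hypothetical volumes with the same assumed 1% false positive rate.
personal_server = expected_legit_blocked(5, 0.01)
large_site = expected_legit_blocked(50_000, 0.01)
```

The percentage is identical in both cases; what changes is whether you lose a message or two a month or thousands of them.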

Similar logic can be applied to a lot of anti-spam heuristics; many things look good when they're dealing with a stream of email that's mostly or almost entirely spam. Block on bad EHLO greetings? Sure, why not, especially since GMail and the other big people do generally get those things right.
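(What counts as a 'bad' EHLO greeting is a local policy decision. As one possible sketch, the checks below flag an empty name, a bare IP address without brackets, a name with no dot in it, and a sender claiming to be one of our own hostnames. These specific rules, the function name, and `mail.example.org` are my assumptions for illustration, not a standard.)

```python
import ipaddress


def ehlo_looks_bad(ehlo_name, our_names=("mail.example.org",)):
    """Return True if an EHLO name fails some simple sanity checks."""
    name = ehlo_name.strip().lower().rstrip(".")
    if not name:
        return True
    # A bracketed address literal such as [192.0.2.1] is valid SMTP
    # syntax, so this sketch lets it through.
    if name.startswith("[") and name.endswith("]"):
        return False
    # A bare IP address without brackets is not a valid EHLO argument.
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        pass
    # Unqualified names like 'localhost' or a Windows machine name.
    if "." not in name:
        return True
    # Nobody else should be greeting us with our own hostname.
    if name in our_names:
        return True
    return False
```

On a low-volume, mostly-spam stream even checks this crude are cheap to try, but the same caveats about new small senders apply.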

(GMail will send you spam too, of course, but statistically a new legitimate sender is much more likely to be using GMail or one of the other big places than an email server in the middle of nowhere. And yes, there are downsides to too many people adopting this sort of attitude to both heuristics and new mail sending machines in surprising places; ask anyone trying to send personal email from a new small home mail server and get it accepted by places.)

Written on 27 July 2016.


Last modified: Wed Jul 27 01:09:35 2016