What broad hit rate the Spamhaus DBL might get for us

March 20, 2016

I took the past nine days' worth of logs from our commercial anti-spam black box, extracted the 'spam score' it assigns and the envelope sender domain for each message, split the messages into three categories based on that score (which runs from 0 to 100), and then checked all of those sender domains against the Spamhaus DBL.

(Because of how our overall anti-spam systems work, this excludes some but not all of the email from hosts that are in Spamhaus's IP based lists.)
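For illustration, a DBL check is an ordinary DNS lookup of the domain under the DBL zone. What follows is a minimal Python sketch, not the script I actually used; the function names are made up, and the return-code ranges in the comments are my understanding of Spamhaus's documented conventions, so verify them against the current DBL documentation before relying on them.

```python
import ipaddress
import socket

DBL_ZONE = "dbl.spamhaus.org"

# Assumption: real listings are answered from 127.0.1.0/24, while
# 127.0.1.255 and 127.255.255.x are error codes, not listings.
LISTING_NET = ipaddress.ip_network("127.0.1.0/24")
IP_QUERY_ERROR = ipaddress.ip_address("127.0.1.255")


def dbl_query_name(domain: str) -> str:
    """Build the DNS name to look up for a sender domain."""
    return f"{domain.strip().rstrip('.').lower()}.{DBL_ZONE}"


def is_listing(answer: str) -> bool:
    """True if a returned A record looks like a listing, not an error code."""
    ip = ipaddress.ip_address(answer)
    return ip in LISTING_NET and ip != IP_QUERY_ERROR


def check_domain(domain: str) -> bool:
    """Query the DBL for one domain (needs working DNS resolution)."""
    try:
        answer = socket.gethostbyname(dbl_query_name(domain))
    except socket.gaierror:
        # No A record (or the lookup failed): treat as not listed.
        return False
    return is_listing(answer)
```

Treating a failed lookup as 'not listed' is a deliberate choice for a statistics run like this one; note that queries sent through large public resolvers may be refused with an error code rather than answered.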

Based partly on previous stats and how we use the spam scores ourselves, my three categories were 'definitely spam' (scores of 98 to 100), 'enough to be spam' (scores of 60 through 97), and 'probably not spam' (below 60). The raw numbers are:

  • for 'definitely spam', 5,452 different MAIL FROM domains and only 812 in the DBL; a hit rate of about 15%.

  • for 'enough to be spam', 4,118 different domains and 1,744 in the DBL; a 42% hit rate.

  • for 'probably not spam', 5,268 different domains and 20 in the DBL; a hit rate of about 0.4%.
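The bucketing and the per-domain percentages can be recomputed from the raw counts above. This is only a sketch; the function names are made up for illustration, while the thresholds and counts are the ones given in this entry.

```python
def category(score: int) -> str:
    """Map a 0-100 spam score to one of the three buckets used here."""
    if score >= 98:
        return "definitely spam"
    if score >= 60:
        return "enough to be spam"
    return "probably not spam"


def hit_rate(total: int, listed: int) -> float:
    """Percentage of distinct domains that were found in the DBL."""
    return 100.0 * listed / total


# (distinct sender domains, domains found in the DBL), from the list above.
COUNTS = {
    "definitely spam": (5452, 812),
    "enough to be spam": (4118, 1744),
    "probably not spam": (5268, 20),
}

for name, (total, listed) in COUNTS.items():
    print(f"{name}: {listed}/{total} = {hit_rate(total, listed):.1f}%")
```

To one decimal place this works out to roughly 14.9%, 42.4%, and 0.4% respectively.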

At one level, this is actually reassuring; it suggests that our commercial black box is doing a reasonably good job of finding much of the actual spam, even though it missed some things.

(It also suggests that the black box is not already including the DBL, or at least that if it does, it doesn't weight the envelope sender very heavily in its scoring. Otherwise those 20 domains wouldn't be there.)

The relatively low domain hit rate on the 'definitely spam' category is at least partly due to the fact that a lot of the domains in that set were not used for very many messages to us. In fact the median usage count for domains there is one. If I go to the effort of counting DBL hits by usage, 44% of the actual messages had sender domains in the DBL.

The usage based hit count for the 'enough to be spam' category comes out to be slightly higher; there 54% of the actual messages had sender domains in the DBL.

(As you might expect, the 'probably not spam' category doesn't improve when measured by actual usage. Percentage-wise it goes way, way down, in fact, as not very many messages came from those DBL-listed domains.)
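The difference between counting by domain and counting by usage is easy to see on a toy example. The sketch below uses made-up domains and counts (not our real data) where one listed domain sends a lot of messages and several unlisted domains send one each.

```python
from collections import Counter


def hit_rates(sender_domains, listed):
    """Return (per-domain %, per-message %) of senders found in `listed`.

    `sender_domains` has one entry per message; `listed` is the set of
    domains that a DBL check reported as listed.
    """
    usage = Counter(sender_domains)
    if not usage:
        return (0.0, 0.0)
    domain_hits = sum(1 for d in usage if d in listed)
    message_hits = sum(n for d, n in usage.items() if d in listed)
    return (
        100.0 * domain_hits / len(usage),
        100.0 * message_hits / sum(usage.values()),
    )


# Hypothetical log: six messages from one listed domain, plus four
# single-use domains that are not listed.
log = ["bulk.example"] * 6 + ["a.example", "b.example", "c.example", "d.example"]
by_domain, by_message = hit_rates(log, {"bulk.example"})
print(f"by domain: {by_domain:.0f}%, by message: {by_message:.0f}%")
```

Here only one domain in five is listed, but it accounts for six messages out of ten, which is the same shape of effect as the 'definitely spam' numbers above.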

All of this means that I should definitely look at using the DBL in our overall anti-spam setup, because using the DBL would enable early rejection of a significant amount of spam that otherwise makes it as far as relatively expensive spam scoring.
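As a rough sketch of what 'early rejection' could look like, here is the decision in Python, assuming it is made at MAIL FROM time. This is not our actual mailer configuration; the function names are hypothetical, and the DBL lookup is passed in as a callable so the policy can be tried without doing any DNS.

```python
from typing import Callable, Optional


def sender_domain(mail_from: str) -> Optional[str]:
    """Extract the lowercased domain from an envelope sender, if any."""
    addr = mail_from.strip().strip("<>")
    if "@" not in addr:
        # Null sender (<>) or something malformed: nothing to check.
        return None
    return addr.rsplit("@", 1)[1].lower()


def mail_from_decision(mail_from: str, is_listed: Callable[[str], bool]) -> str:
    """Return 'reject' for DBL-listed sender domains, else 'continue'."""
    domain = sender_domain(mail_from)
    if domain is not None and is_listed(domain):
        return "reject"
    # Not listed: hand the message on to the more expensive scoring.
    return "continue"
```

For example, with a stub lookup such as `lambda d: d == "spam.example"`, a sender of `<user@Spam.Example>` is rejected while the null sender `<>` continues on to scoring.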


Comments on this page:

By David at 2016-03-25 13:39:28:

Arguably domain lists like the DBL are the future of anti-spam blocking. At some indefinite point in the future the Internet will tip from predominantly IPv4 to IPv6, and when that happens traditional address-based DNSBLs will become obsolete overnight. The only viable solution is for mail relays to refuse email from IPs lacking matching reverse- and forward-DNS and hard-SPF authentication. While domains can be cheap, they will never be free in the practically infinite manner of IPv6 addresses. At that time the Spamhaus DBL and other lists like it will take over from address-based lists.

A couple of years back Google briefly decided to force all IPv6 MTAs to have matching forward and reverse DNS. This was brilliant, since at that time IPv6 SMTP was exceedingly rare and Google, through sheer size, was in a position to force de facto adoption of the convention. I was greatly disappointed to later discover that they had backed off on the requirement.
