There are (at least) two sorts of DNS blocklists
Here is a trite and obvious thing that I never the less feel like writing down: in practice, there are (at least) two sorts of DNS anti-spam blocklists. Since I want to use value neutral terms here, let us call these 'simple' and 'complex' blocklists.
The operation of a simple DNSBL is, well, simple. If it sees spam from an IP, it lists the IP (or if it sees whatever is the DNSBL's idea of 'bad stuff'). Usually the IP gets automatically delisted after a while, but in some DNSBLs the listing lasts forever unless someone takes action to have it get cleared, appealed, or whatever.
A complex DNSBL attempts to have a more complex balancing criteria for adding listings than simple presence of spam; for instance, it may somehow assess how much apparently legitimate traffic it's seen from the source IP as well as spam volume. A complex DNSBL is sometimes going to be slower to list an IP than a simple DNSBL.
A simple DNSBL does not have 'false positives' as such (assuming that it's honestly run), but that's because a listing means something very narrow; it means that the IP did a bad thing within the time horizon. People who reject email based on a listing in a simple DNSBL may have false positives in that rejection, though, because an IP doing a bad thing once doesn't necessarily mean that it will do it every time. Complex DNSBLs can have false positives because they're fundamentally intended to assert that the email you're getting is probably bad. Good operators of complex DNSBLs attempt to minimize such false positives.
To give an example of each, the Spamhaus SBL is a complex DNSBL (or at least generally it is). The CBL is a simple DNSBL, but one that (theoretically) uses a very narrow listing criteria that is very strongly correlated with sending only spam.
Unfortunately not all DNSBLs make it clear what sort of DNSBL they are in their description (or sometimes they wave their hands about it a bit). At least at the moment, one quite strong signal that you are dealing with a simple DNSBL is if it ever lists one of GMail's outgoing mail servers.
(I feel that rejecting email based on a simple DNSBL is not necessarily a mistake, but the sidebar attempting to explain this got long enough that it's going to be another entry.)
Some notes on adding exposed statistics to a (Go) program
As a slow little project for the past while, I have been adding some accessible statistics to my sinkhole SMTP server, using Go's expvar package. This has resulted in me learning lessons both about expvar in specific and the process of adding statistics in general.
My big learning experience is going to sound fairly obvious and trite: I only really figured out what statistics I wanted to expose through experimentation. I started out with the idea that counting some obvious things would be interesting (and to a certain extent they were), but I created many of the lot of stats by a process of looking at the current set and realizing that there was information I wanted to know or questions that I wanted answered that were not covered by existing things I was exposing. Sometimes trying to use the initial version of statistic showed me that it was too broad or needed some additional information in order to be useful.
The corollary to this is that what statistics you'll want depends in large part on what questions are interesting and informative for you, which depends on how you're using the program. A lot of my stats are focused on anti-spam related issues, because that's how I'm using my sinkhole SMTP server. Someone using it to collect email from a collection of nodes and tests might well want a significantly different set of statistics. This does make adding stats to a theoretically general program a somewhat tricky thing; I have no good answers to this currently.
(I have not tried to be particularly general in my current set of stats. Since this has been an experiment to play around with the idea, I've focused on making them interesting to me.)
Just exporting statistics from a program is less general than pushing
events and metrics into a full time series based metrics system,
expvar package and a few other tools like
jq makes it
much easier to do the former (for a start, I don't need a metrics
system). Exporting statistics is also not as comprehensive as having
an event log or the like. Since I do sort of have an event log,
I've chosen to view my
expvar stats as being an on-demand summary
of it, one that I can look at without having to actively parse the
log to count things up.
And on another obvious note, putting counters and so on in a hierarchical namespace is quite helpful for keeping things comprehensible and organized. To some extent a good hierarchy can substitute for not being able to come up with great names for individual statistics. And sometimes you have data with unpredictable names that has to be confined to a namespace.
(For instance, I track DNS blocklist hit counts. The names of DNS
blocklists are essentially arbitrary, so I put the whole set of
stats into a
dnsbl_hits namespace. And because the expvar
package automatically publishes some general Go stats on things
like your program's memory usage, I put all of my stats under a
top-level name so it's easy to pick them out.)