2013-06-30
Some very basic DNS blocklist hit information for the last 30 days
Our inbound mail gateway anti-spam stuff logs when a connection is from something listed in the CBL or in zen.spamhaus.org (and yes, we know that that's sort of redundant, since Zen already includes the CBL's data through the XBL; it's a long story). Because of how it's implemented, we only check zen.spamhaus.org if we don't find the IP in the CBL.
(It turns out that the log message I'm looking at only fires when we
accept an RCPT TO from such an IP address and I think it may fire
multiple times for multiple RCPT TOs. This makes me think that I need
better logging, although I've already seen that spam filter stats can
be complicated.)
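For illustration, the shape of that check ordering is easy to sketch outside the mail system. This is not our actual gateway code (the real checks happen inside the MTA's own DNSBL support); the cbl.abuseat.org zone and the plain A-record lookups below are just the standard DNSBL mechanics, not a description of what our gateway does.

    # Sketch of the check ordering described above: try the CBL first and
    # only ask Zen if the CBL has no listing. A DNSBL query reverses the
    # IPv4 octets, appends the zone, and treats any A record answer as
    # 'listed' and NXDOMAIN as 'not listed'.
    import socket

    def dnsbl_listed(ip, zone):
        query = '.'.join(reversed(ip.split('.'))) + '.' + zone
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:
            return False

    def blocklist_source(ip):
        if dnsbl_listed(ip, 'cbl.abuseat.org'):
            return 'CBL'
        if dnsbl_listed(ip, 'zen.spamhaus.org'):
            return 'Zen'
        return None
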
Over the last 30 days, we accepted RCPT TOs from 90,000 different IP
addresses that were in one or the other (some were detected as being in
both at different times). The CBL is the dominant source, at 77,000 or
so; Zen is good for another 15,000 or so. I also have stats for RCPT
TOs that we rejected due to the source IP being in one of the DNS
blocklists; over the same 30 day period we rejected 13,500 different
IPs (for a total of 92,000 rejected RCPT TOs), again almost all of
them specifically due to a CBL listing (12,000 versus 1,500). Roughly 8,500 of
these IPs also had some RCPT TOs accepted.
(For scale on the RCPT TO rejections, over the same time period we
fully accepted somewhere around 540,000 RCPT TOs (counting email that
got all the way to the end of DATA).)
Generating ad-hoc stats like this makes me think that I should work out what stats are interesting in advance and then make sure that we're logging enough information to reconstruct them. Maybe I should also put together scripts to generate stats automatically on demand (which would mean that I might look at them more).
(The advanced version is having logstash or some equivalent digest all of the logs and provide real-time versions of the stats. But while that might look pretty, it's not really useful; there is nothing actionable in these stats (to use the jargon), just things of vague interest.)
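As a concrete example of what 'generate stats on demand' could mean, something as small as the following would cover the distinct-IPs-per-blocklist count. The log line format here is entirely invented for illustration.

    # A minimal on-demand stats script: count distinct source IPs per DNS
    # blocklist from mail log lines on stdin. The 'dnsbl-hit' log format
    # below is invented for illustration.
    import re
    import sys
    from collections import defaultdict

    # Hypothetical log line: "... dnsbl-hit list=CBL ip=192.0.2.10 ..."
    HIT_RE = re.compile(r'dnsbl-hit list=(\S+) ip=(\S+)')

    ips_per_list = defaultdict(set)
    for line in sys.stdin:
        m = HIT_RE.search(line)
        if m:
            ips_per_list[m.group(1)].add(m.group(2))

    for blocklist, ips in sorted(ips_per_list.items()):
        print("%s: %d distinct IPs" % (blocklist, len(ips)))
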
2013-06-27
How much of our incoming email is checked at SMTP DATA time
One of our anti-spam steps is to check some
messages for signs of spam at SMTP DATA time. To qualify for checking,
a message must have only (accepted) RCPT TOs of people who've opted
in to enough checking to make this worthwhile. I have previously
done figures on how many recipients the average inbound email has, but I haven't looked directly at how much of
a workout this DATA time check is getting.
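The qualification rule itself is simple to state in code: run the DATA-time checks only if every accepted recipient has opted in to a high enough level of filtering. The opt-in table and threshold below are invented for illustration; the real decision happens inside our mail system.

    # Sketch of the qualification rule: a message gets DATA-time checks
    # only if every accepted RCPT TO address has opted in to a high
    # enough level of filtering. The table and threshold are invented.
    OPT_IN_LEVEL = {
        'alice@example.org': 2,   # wants full DATA-time checking
        'bob@example.org': 1,     # connection-time checks only
    }
    CHECK_THRESHOLD = 2

    def qualifies_for_data_checks(accepted_rcpts):
        return all(OPT_IN_LEVEL.get(addr, 0) >= CHECK_THRESHOLD
                   for addr in accepted_rcpts)

    # qualifies_for_data_checks(['alice@example.org'])                    -> True
    # qualifies_for_data_checks(['alice@example.org', 'bob@example.org']) -> False
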
Over the past 30 days we've accepted 487,000 messages and run 49,000
through SMTP DATA checks. Over roughly the same amount of time we
rejected about 21,000 of those checked messages; about 190 of those
rejections were detected as 'viruses' (which includes some phishing
attempts because that's how the commercial filtering system we use
works).
At first I was all set to be depressed about this low ratio of email
checking. Then I actually looked at how many email addresses had opted
in to some degree of DATA time filtering and, well, it's tiny. We
have about 300 local addresses enrolled in this checking, while over
those same 30 days we've had messages sent to about 1700 different
local addresses. It turns out that fewer than 120 local addresses have
rejected any spam at SMTP DATA time over the past 30 days and thus are
responsible for those 21,000 rejections.
(As you might guess, a few heavily spammed local addresses are disproportionately responsible for the rejections. The most spammed address accounts for over 30% of the rejected messages, although after that the remaining very active addresses drop to around the 5% level.)
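Working out that distribution is the same sort of crude log crunching: tally DATA-time rejections per local recipient and look at each address's share of the total. As before, the log line format is purely hypothetical.

    # Tally DATA-time rejections per local recipient and report each
    # address's share of the total. The 'data-reject' log format is
    # hypothetical.
    import re
    import sys
    from collections import Counter

    # Hypothetical log line: "... data-reject to=<user@example.org> ..."
    REJECT_RE = re.compile(r'data-reject to=<([^>]+)>')

    rejections = Counter()
    for line in sys.stdin:
        m = REJECT_RE.search(line)
        if m:
            rejections[m.group(1)] += 1

    total = sum(rejections.values())
    for addr, count in rejections.most_common(10):
        print("%-40s %6d  %5.1f%%" % (addr, count, 100.0 * count / total))
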
Since I just generated the stats to check my work: it looks like only
somewhat less than half of those enrolled addresses actually had email
sent to them that went through SMTP DATA checks. If my crude log
crunching is accurate, there are only about 25 local addresses that did
SMTP DATA checks but did not reject any spam at DATA time. I guess
this makes sense; if our users bother to go out of their way to enroll
themselves in this, it's because they need it.
(This does imply that the enrolled users are not getting a significantly
disproportionate amount of our incoming email. About 8.5% of the
destination addresses are enrolled and about 10% of the incoming email
gets checked at DATA time; this is a bit higher than a completely fair
distribution but not that much off for crude measurements.)