Wandering Thoughts archives

2013-06-30

Some very basic DNS blocklist hit information for the last 30 days

Our inbound mail gateway's anti-spam system logs when a connection comes from an IP address listed in the CBL or in zen.spamhaus.org (and yes, we know that that's somewhat redundant; it's a long story). Because of how it's implemented, we only check zen.spamhaus.org if we don't find the IP in the CBL.

(It turns out that the log message I'm looking at only fires when we accept an RCPT TO from such an IP address, and I think it may fire once for each RCPT TO. This makes me think that I need better logging, although I've already seen that spam filter stats can be complicated.)
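The CBL-then-Zen lookup order can be sketched as follows. This is a minimal illustration under stated assumptions, not our gateway's actual code. It assumes the usual IPv4 DNSBL convention (query the reversed octets under the list's zone; an A record means listed, a lookup failure means not listed), it assumes cbl.abuseat.org as the CBL's zone name, and the function names are hypothetical. It also treats any A record as a listing, which is a simplification; real code should check the specific return codes each list documents.

```python
import ipaddress
import socket
from typing import Callable, Optional

# Assumed zone names, for illustration only (not taken from our configuration).
CBL_ZONE = "cbl.abuseat.org"
ZEN_ZONE = "zen.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL query name: the IPv4 octets reversed, then the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Simplification: any A record counts as listed; a lookup failure means not listed."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def check_source_ip(ip: str,
                    resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Check the CBL first; only query Zen if the CBL has no listing."""
    if is_listed(ip, CBL_ZONE, resolve):
        return "cbl"
    if is_listed(ip, ZEN_ZONE, resolve):
        return "zen"
    return None
```

Passing the resolver in as a parameter keeps the lookup order testable without touching the network.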

Over the last 30 days, we accepted RCPT TOs from 90,000 different IP addresses that were in one list or the other (some were detected as being in both at different times, which is why the per-list figures below add up to slightly more than the total). The CBL is the dominant source, at 77,000 or so; Zen accounts for another 15,000 or so. I also have stats for RCPT TOs that we rejected because the source IP was in one of the DNS blocklists; over the same 30-day period we rejected RCPT TOs from 13,500 different IPs (for a total of 92,000 rejected RCPT TOs), again almost all specifically due to a CBL listing (12,000 IPs versus 1,500 for Zen). Roughly 8,500 of these IPs also had some RCPT TOs accepted.

(For scale on the RCPT TO rejections, over the same time period we fully accepted somewhere around 540,000 RCPT TOs, counting only email that got all the way to the end of DATA.)
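Because the log message fires per RCPT TO rather than per connection, the crude log crunching behind these numbers has to count distinct IPs separately from events. Here is a minimal sketch of that, assuming a hypothetical log line format of `<timestamp> dnsbl <accept|reject> <ip> <cbl|zen>`; the real gateway's log format is different and the function name is made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log line format (an assumption, not the real gateway's logs):
#   "<timestamp> dnsbl <accept|reject> <ip> <cbl|zen>"
# One line is logged per RCPT TO, so the same IP can appear many times.


def crunch(lines: Iterable[str]) -> Dict[str, int]:
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)  # (action, list) -> IPs
    rcpt_events: Counter = Counter()  # action -> number of RCPT TO log lines
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[1] != "dnsbl":
            continue  # not one of our hypothetical DNSBL log lines
        _, _, action, ip, dnsbl = fields
        if action not in ("accept", "reject") or dnsbl not in ("cbl", "zen"):
            continue
        seen[(action, dnsbl)].add(ip)
        rcpt_events[action] += 1

    def ips(action: str) -> Set[str]:
        # Union over both lists: an IP may hit the CBL at one time and Zen at another,
        # so the per-list counts can add up to more than this distinct total.
        return seen[(action, "cbl")] | seen[(action, "zen")]

    accepted, rejected = ips("accept"), ips("reject")
    return {
        "accepted_ips": len(accepted),
        "accepted_cbl_ips": len(seen[("accept", "cbl")]),
        "accepted_zen_ips": len(seen[("accept", "zen")]),
        "rejected_ips": len(rejected),
        "rejected_rcpts": rcpt_events["reject"],
        "ips_both_accepted_and_rejected": len(accepted & rejected),
    }
```

Wrapping something like this in a small script is one way to get these stats on demand instead of reconstructing them by hand each time.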

Generating ad-hoc stats like this makes me think that I should work out in advance which stats are interesting and then make sure that we log enough information to reconstruct them. Maybe I should also put together scripts to generate these stats automatically on demand (which would mean that I might actually look at them more often).

(The advanced version is having logstash or some equivalent digest all of the logs and provide real-time versions of these stats. But while that might look pretty, it wouldn't really be useful; there is nothing actionable in these stats (to use the jargon), just things of vague interest.)

CSLabDNSBLHits2013-06-29 written at 01:09:30

2013-06-27

How much of our incoming email is checked at SMTP DATA time

One of our anti-spam steps is to check some messages for signs of spam at SMTP DATA time. To qualify for checking, a message must have only (accepted) RCPT TOs of people who've opted in to enough checking to make this worthwhile. I have previously worked out how many recipients the average inbound email has, but I haven't looked directly at how much of a workout this DATA-time check is getting.
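The qualification rule is simple enough to sketch. The all-recipients requirement is presumably there because a rejection at DATA time rejects the message for every accepted recipient at once. In this sketch the numeric opt-in levels, the threshold, and the function name are hypothetical stand-ins for "opted in to enough checking".

```python
from typing import Iterable, Mapping


def qualifies_for_data_check(accepted_rcpts: Iterable[str],
                             opt_in_level: Mapping[str, int],
                             min_level: int) -> bool:
    """Hypothetical model: each address has an opt-in level; 0 (or absent) means not enrolled."""
    rcpts = list(accepted_rcpts)
    if not rcpts:
        return False  # nothing accepted, so nothing to check
    # Every accepted recipient must have opted in at or above the threshold;
    # one un-enrolled recipient means the message is not checked at DATA time.
    return all(opt_in_level.get(r, 0) >= min_level for r in rcpts)
```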

Over the past 30 days we've accepted 487,000 messages and run 49,000 of them through SMTP DATA checks. Over roughly the same period we rejected about 21,000 of those checked messages; about 190 of those rejections were for detected 'viruses' (which includes some phishing attempts, because that's how the commercial filtering system we use works).

At first I was all set to be depressed about this low ratio of checked email. Then I actually looked at how many email addresses had opted in to some degree of DATA-time filtering and, well, it's tiny. We have about 300 local addresses enrolled in this checking, while over the same 30 days we've had messages sent to about 1700 different local addresses. It turns out that fewer than 120 local addresses have rejected any spam at SMTP DATA time over the past 30 days, and thus they are responsible for all of those 21,000 rejections.

(As you might guess, a few heavily spammed local addresses are disproportionately responsible for rejections. The most spammed address accounted for over 30% of the rejections, although after that even the other very active addresses drop to around the 5% level.)
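Per-address shares like these fall out of a simple count. This sketch assumes the log crunching yields one local address per rejected message, which is a simplification (a rejected message with several enrolled recipients would need a decision about how to attribute it); the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rejection_shares(rejected_for: Iterable[str], top: int = 5) -> List[Tuple[str, float]]:
    """Return the top addresses and their percentage of all DATA-time rejections."""
    counts = Counter(rejected_for)  # address -> number of rejections
    total = sum(counts.values())
    if total == 0:
        return []
    return [(addr, 100.0 * n / total) for addr, n in counts.most_common(top)]
```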

Since I just generated the stats to check my work: it looks like somewhat fewer than half of those enrolled addresses actually had email sent to them that went through SMTP DATA checks. If my crude log crunching is accurate, there are only about 25 local addresses that had messages go through SMTP DATA checks but did not reject any spam at DATA time. I guess this makes sense; if our users bother to go out of their way to enroll themselves in this, it's because they need it.

(This does imply that the enrolled users are not getting a significantly disproportionate amount of our incoming email. About 8.5% of the destination addresses are enrolled (counting only the enrolled addresses that actually had email checked) and about 10% of the incoming email gets checked at DATA time; this is a bit higher than a completely fair distribution would give, but not that far off for crude measurements.)
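The cross-checks in the last two paragraphs are small set operations plus a ratio. Here is a sketch of both; the function names are hypothetical and the inputs are assumed to be sets of local addresses pulled out of the logs.

```python
from typing import AbstractSet, Dict


def enrolled_usage(enrolled: AbstractSet[str], checked: AbstractSet[str],
                   rejected: AbstractSet[str]) -> Dict[str, int]:
    """Split enrolled addresses by whether they saw DATA-time checks and rejections."""
    checked_enrolled = checked & enrolled  # enrolled addresses whose mail was checked
    return {
        "enrolled": len(enrolled),
        "checked": len(checked_enrolled),
        "checked_and_rejected": len(checked_enrolled & rejected),
        "checked_never_rejected": len(checked_enrolled - rejected),
    }


def share_comparison(enrolled_with_mail: int, destination_addrs: int,
                     checked_msgs: int, accepted_msgs: int) -> Dict[str, float]:
    """Compare the enrolled share of addresses with the checked share of messages."""
    addr_share = enrolled_with_mail / destination_addrs
    msg_share = checked_msgs / accepted_msgs
    # A ratio near 1.0 means checked mail is roughly proportional to enrolled addresses.
    return {"addr_share": addr_share, "msg_share": msg_share, "ratio": msg_share / addr_share}
```

Plugging in the approximate figures from above, `share_comparison(145, 1700, 49_000, 487_000)` gives an address share of about 8.5%, a message share of about 10%, and a ratio of roughly 1.18. The 145 is itself an approximation (fewer than 120 addresses with rejections plus about 25 without), which matches "somewhat fewer than half" of the roughly 300 enrolled addresses.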

OurMilterVolumeLevel written at 01:12:44
