2011-04-27
Mail rejection stats for our external mail gateway
In my recent spam filtering stats, I noted that some spam was rejected before it made it to the spam tagging and filtering system. Well, here are some stats on roughly that; specifically, on how much email our external mail gateway rejects at SMTP time for various reasons. The numbers here are for almost the same seven-day time period as the previous stats; there is about a six and a half hour difference in coverage due to when the two systems roll their logs (one does it at midnight, one does it at 6:30am or so).
So, over seven days we:
- accepted 90,511 email messages in total
- rejected 5,798 MAIL FROMs, 2,690 for having unresolvable domains and 3,108 for being from our domain but having unknown local users.
- rejected 24,876 RCPT TOs, for all sorts of reasons:
  - 13,393 unknown local usernames.
  - 8,350 sender IPs that were in DNS blocklists; 6,496 were in the CBL (which we check first) and 1,854 were in Spamhaus Zen.
  - 2,237 relay attempts; to my surprise, these appear to be real and serious attempts.
  - 778 attempts to mail addresses that don't accept outside email.
  - 117 attempts to send mail to obsolete domains that we explicitly block.
  - 1 attempt by a persistent source that we have specifically blocked from mailing their marketing materials to our NOC address (and they've kept trying for years despite that).
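As a quick cross-check of the breakdown above, here is a minimal Python sketch that sums the sub-counts and works out each reason's share of its category. The counts are the ones listed above; the shortened labels and the `shares` helper are just illustrative names. Rejected SMTP commands and accepted messages are not the same unit (one message can have several RCPT TOs), so the sketch deliberately does not try to compute a single overall rejection rate.

```python
# Counts copied from the lists above; the labels are shortened for illustration.
MAIL_FROM_REJECTS = {
    "unresolvable domain": 2_690,
    "our domain, unknown local user": 3_108,
}

RCPT_TO_REJECTS = {
    "unknown local username": 13_393,
    "sender IP in a DNS blocklist": 8_350,
    "relay attempt": 2_237,
    "address does not accept outside email": 778,
    "obsolete domain we block": 117,
    "specifically blocked source": 1,
}


def shares(counts):
    """Return each reason's percentage of its category total, rounded to 0.1."""
    total = sum(counts.values())
    return {reason: round(100 * n / total, 1) for reason, n in counts.items()}


if __name__ == "__main__":
    for name, counts in (("MAIL FROM", MAIL_FROM_REJECTS),
                         ("RCPT TO", RCPT_TO_REJECTS)):
        print(f"{name}: {sum(counts.values()):,} rejections")
        for reason, pct in shares(counts).items():
            print(f"  {pct:5.1f}%  {reason}")
```

The sub-counts do add up to the stated totals (5,798 and 24,876), which is a useful sanity check whenever numbers like these are pulled out of logs by hand.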
The two surprises that stand out in this are how frequently spammers
attempt to forge email as being from our own domain and how many relay
attempts there are. I'm not terribly surprised that unresolvable MAIL
FROM domains are relatively uncommon; as I've said before, spammers
are smart enough to notice what doesn't work,
and unresolvable MAIL FROMs haven't worked for a long time.
I'm not going to try to estimate the additional 'real' spam volume here,
because in part it depends on your assumptions. For example, should we
consider all email rejected due to unresolvable MAIL FROM domains as
spam? Probably some of them are simply incompetent but real domains,
and only some of them are spammers that are either making up domains or
having their domains canceled out from underneath them.
(General information on our spam filtering is in CSLabSpamFiltering. While that was written in 2007, almost nothing has changed since then in our setup although I'm sure that the Sophos PureMessage people have been evolving it madly. Such is one of the benefits of outsourcing most of your anti-spam system.)
2011-04-26
A quick look at some spam filtering stats from our system
It's been a while since I thought about generating statistics about what our anti-spam systems are doing and seeing, which probably means that it's about time to do it again. I'm going to look at the past week's statistics, mostly because we upgraded the spam filtering machine recently and we don't have old logs any more. Unfortunately this is not an ideal week to look at, since Friday was a holiday here so the numbers are going to be down from usual.
First, the disclaimers: not all spam makes it to our spam tagging and
filtering system. For example, some people immediately reject email from
IP addresses that are in the Spamhaus Zen list; since this rejects at
RCPT TO time, the actual message never makes it to the spam filtering
system to be scored. At this time I haven't generated stats on how large
an effect that is.
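For readers who haven't seen how a DNS blocklist check at RCPT TO time works, here is a minimal Python sketch of the general mechanism: reverse the octets of the connecting IPv4 address, append the blocklist zone, and treat any answer as 'listed'. This is the standard DNSBL convention rather than our actual mailer configuration; the function names are hypothetical, and `zen.spamhaus.org` is the zone name Spamhaus publishes for Zen.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on anything but IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the zone returns an address for this IP (needs working DNS)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name (or a lookup failure) is treated as "not listed"
    return True
```

For example, `dnsbl_query_name("192.0.2.99", "zen.spamhaus.org")` gives `99.2.0.192.zen.spamhaus.org`. A real mailer would likely also look at which address the blocklist returned, since blocklists use the answer to encode why an IP is listed; this sketch ignores that.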
So, over the past seven days we saw:
- 91,171 messages in total. The volume is mostly during weekdays, and
once I wave my hands about the holiday Friday I'll call it flat during
the weekdays and flat (at a lower level) on the weekend as well.
- 557 messages that were identified as having some sort of virus
payload. Apparently viruses are not very popular any more (or
at least not viruses that our system can recognize).
- 47,592 messages that scored high enough to be classified as spam by our system. I don't want to draw any conclusions about day of the week volume from the data I have so far.
At about 52% of all messages, this is well under the level of spam that most sources report. It's possible that our stats are skewed by various things; for example, it may be that most of the active targets of spam have opted in to spam rejection, and so spam to them never makes it to these numbers. (Trying to quantify the volume of rejections is a project for later.)
Our spam system gives messages a spam score from 0 to 100 (with some decimal points of precision allowed; theoretically this is some sort of probability measure). The breakdown of scores is somewhat interesting:
- 22,448 messages scored 100 points.
- 20,398 messages scored 90 to 99 points. Of those, 14,170 scored 99 points and 1,222 scored 98 points, so about three quarters of this scoring band was at the very top.
- 4,131 messages scored 80 to 89 points.
- 330 messages scored 70 to 79 points.
- 285 messages scored 60 to 69 points.
- 584 messages scored 50 to 59 points, 257 scored 40 to 49 points, and 279 messages scored 30 to 39 points.
- 1,083 messages scored 20 to 29 points.
- 12,899 messages scored 10 to 19 points.
- 28,477 messages scored 0 to 9 points. No message scored below 7 points; 17,807 messages scored 7 points, 8,109 scored 8 points, and 2,561 scored 9 points.
Our current threshold for calling something spam is 60 points or more. These numbers suggest that we could significantly raise the threshold without having a material effect on our spam filtering; on the other hand, since it would have no material effect there seems no reason to do it (other than possibly user perception, and I don't know if users pay any attention to this).
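To put a number on "no material effect", here is a minimal Python sketch that recomputes the spam count for a few alternative thresholds from the score bands above. The band counts are the ones listed earlier; `BAND_COUNTS` and `spam_at_threshold` are illustrative names, and because only per-band counts are available the calculation is only exact for thresholds that are multiples of 10.

```python
# Messages per score band, keyed by the band's lower bound (from the list above).
# The 100-point bucket is kept separate, as in the breakdown.
BAND_COUNTS = {
    100: 22_448, 90: 20_398, 80: 4_131, 70: 330, 60: 285,
    50: 584, 40: 257, 30: 279, 20: 1_083, 10: 12_899, 0: 28_477,
}


def spam_at_threshold(threshold):
    """Count messages that would be classified as spam at the given cutoff.

    Only exact on band boundaries (multiples of 10), since per-band counts
    are all the data we have here.
    """
    if threshold % 10 != 0:
        raise ValueError("threshold must be a multiple of 10")
    return sum(n for lower, n in BAND_COUNTS.items() if lower >= threshold)


if __name__ == "__main__":
    total = sum(BAND_COUNTS.values())
    current = spam_at_threshold(60)
    for t in (60, 70, 80, 90):
        n = spam_at_threshold(t)
        print(f"threshold {t}: {n:,} spam ({100 * n / total:.1f}% of mail), "
              f"{current - n:,} fewer than at 60")
```

With these counts, moving the threshold from 60 to 80 would reclassify only 615 messages (285 + 330) out of 47,592, which is the sense in which raising it would change very little.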
(Note that this is not the same system that I did my old spam stats for, and so if I do regular reports they are going to look different and not be comparable to the old numbers.)