2011-04-27
Mail rejection stats for our external mail gateway
In my recent spam filtering stats, I noted that some spam was rejected before it made it to the spam tagging and filtering system. Well, here are some stats on roughly that; specifically, on how much email our external mail gateway rejects at SMTP time for various reasons. The numbers here are for almost the same seven-day time period as the previous stats; there is about a six and a half hour difference in coverage due to when the two systems roll their logs (one does it at midnight, one does it at 6:30am or so).
So, over seven days we:
- accepted 90,511 email messages in total
- rejected 5,798 MAIL FROMs, 2,690 for having unresolvable domains and 3,108 for being from our domain but having unknown local users.
- rejected 24,876 RCPT TOs, for all sorts of reasons:
  - 13,393 unknown local usernames.
  - 8,350 sender IPs that were in DNS blocklists; 6,496 were in the CBL (which we check first) and 1,854 were in Spamhaus Zen.
  - 2,237 relay attempts; to my surprise, these appear to be real and serious attempts.
  - 778 attempts to mail addresses that don't accept outside email.
  - 117 attempts to send mail to obsolete domains that we explicitly block.
  - 1 attempt by a persistent source that we have specifically blocked from mailing their marketing materials to our NOC address (and they've kept trying for years despite that).
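As a quick sanity check on the arithmetic, the subtotals in the list do add up to the quoted totals. Here is a minimal Python sketch that uses only the figures given above; the variable names and reason labels are mine, not anything from the gateway's logs.

```python
# Figures copied from the list above; names and labels are illustrative only.
mail_from_rejects = {
    "unresolvable domain": 2690,
    "our domain, unknown local user": 3108,
}
rcpt_to_rejects = {
    "unknown local username": 13393,
    "sender IP in a DNS blocklist": 8350,
    "relay attempt": 2237,
    "address doesn't accept outside email": 778,
    "obsolete domain that we block": 117,
    "specifically blocked source": 1,
}
dnsbl_hits = {"CBL": 6496, "Spamhaus Zen": 1854}

mail_from_total = sum(mail_from_rejects.values())
rcpt_to_total = sum(rcpt_to_rejects.values())
dnsbl_total = sum(dnsbl_hits.values())

# Show each reason's share of all RCPT TO rejections.
for reason, count in rcpt_to_rejects.items():
    print(f"{reason}: {count} ({count / rcpt_to_total:.1%})")
```

Note that these are counts of rejected SMTP commands, not of messages, so they are not directly comparable to the accepted message count.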
The two surprises that stand out in this are how frequently spammers
attempt to forge email as from our own domains and how many relay
attempts there are. I'm not terribly surprised that unresolvable MAIL
FROM domains are relatively uncommon; as I've said before, spammers
are smart enough to notice what doesn't work, and unresolvable MAIL
FROMs haven't worked for a long time.
I'm not going to try to estimate the additional 'real' spam volume here,
because in part it depends on your assumptions. For example, should we
consider all email rejected due to unresolvable MAIL FROM domains as
spam? Probably some of them are simply incompetent but real domains,
and only some of them are spammers that are either making up domains or
having their domains canceled out from underneath them.
(General information on our spam filtering is in CSLabSpamFiltering. While that was written in 2007, almost nothing has changed in our setup since then, although I'm sure that the Sophos PureMessage people have been evolving it madly. Such is one of the benefits of outsourcing most of your anti-spam system.)
Some notes on what __dictoffset__ on types means in CPython
I mentioned __dictoffset__ in passing in HowSlotsWorkI. Today I feel
like expanding on that passing mention with some notes. All of this is
specific to CPython.
The __dictoffset__ attribute of a type tells you the offset, in bytes,
of the pointer to the __dict__ object within any instance object that
has one. A positive value is an offset from the start of the object; a
negative value is an offset from the end of the object, and is used only
for classes derived from types (such as str and long) that have a
variable-sized component. A __dictoffset__ value of zero means that
instances of the type (or class) don't have a __dict__ attribute.
(You can tell which types have a variable-sized component by looking at
their __itemsize__ attribute; zero means that they don't have such a
component.)
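You can poke at both attributes from the interpreter. This is a small sketch that assumes Python 3 (where long has become int), and the selection of types is my own; the exact __itemsize__ numbers depend on the CPython version and platform, so I only look at whether they are zero.

```python
# An illustrative selection of built-in types; all of these are expected
# to have a zero __dictoffset__, since their instances have no __dict__.
builtin_types = (object, float, list, dict, int, tuple, bytes)

for t in builtin_types:
    kind = "variable-sized" if t.__itemsize__ else "fixed-size"
    print(f"{t.__name__:8} __dictoffset__={t.__dictoffset__} "
          f"__itemsize__={t.__itemsize__} ({kind})")
```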
As sort of discussed in the sidebar in HowSlotsWorkI, if you inherit
from something with a zero __dictoffset__ your subclass will
normally have a non-zero __dictoffset__ and the pointer to the
__dict__ object will be glued on the end of the C-level blob of your
basic type.
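A minimal illustration of this, assuming Python 3 and using class names I made up. The sign and value of the subclass's __dictoffset__ differ between CPython versions (newer versions may keep the __dict__ pointer somewhere other than the end of the object and report a special value), so the sketch only relies on it being non-zero.

```python
class TaggedList(list):   # fixed-size base: list.__itemsize__ is zero
    pass

class TaggedInt(int):     # variable-sized base: int.__itemsize__ is non-zero
    pass

for base, sub in ((list, TaggedList), (int, TaggedInt)):
    print(base.__name__, base.__dictoffset__, "->",
          sub.__name__, sub.__dictoffset__)

# Instances of the subclass have a __dict__, so arbitrary attributes work.
t = TaggedList([1, 2, 3])
t.note = "hello"
print(t.__dict__)
```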
Most built-in types have a __dictoffset__ of zero, as you'd
expect. However, a few types have a non-zero __dictoffset__;
the ones I know of are exceptions, functions, modules, and type
itself. What is going on is that all of these types already have to
have some sort of dictionary for their contents, along with a
pointer to this dictionary in their basic C-level blob. So they
reuse this pointer (and associated dictionary) as their __dict__,
by pointing __dictoffset__ directly to this internal field. One
consequence of this is that subclasses of these classes always have a
__dict__, even if your subclass uses __slots__.
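Here is a short sketch of both points, assuming Python 3; SlottedError and the function f are hypothetical names for illustration. It checks that these built-in types report a non-zero __dictoffset__ and that an exception subclass with __slots__ still accepts arbitrary attributes.

```python
import types

def f():
    pass

# Built-in types that already carry their own dictionary.
dict_types = (type, types.FunctionType, types.ModuleType, BaseException)
for t in dict_types:
    print(t.__name__, t.__dictoffset__)

# Function attributes land in the function's own dictionary.
f.note = "hello"
print(f.__dict__)

class SlottedError(Exception):
    __slots__ = ("code",)

err = SlottedError("boom")
err.code = 42          # stored in the slot
err.extra = "allowed"  # stored in the __dict__ inherited from BaseException
print(err.__dict__)
```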
(In general, once a class has a non-zero __dictoffset__ all of
its subclasses will always have a __dict__. I think that you can
sometimes still save space and allocations by using __slots__, but
you don't get any of the other features of __slots__ that people
are sometimes unwisely attracted to.)
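To make the difference concrete, here is a small comparison, assuming Python 3 and with class names of my own invention. The slot attribute stays out of the instance __dict__ in both cases, but only the class whose bases all lack a __dict__ actually rejects unknown attributes.

```python
class Plain:                 # ordinary class: instances have a __dict__
    pass

class Child(Plain):          # __slots__ on top of a base that has a __dict__
    __slots__ = ("x",)

class SlotsOnly:             # __slots__ with no __dict__ anywhere in the bases
    __slots__ = ("x",)

c = Child()
c.x = 1                      # goes into the slot, not into c.__dict__
c.y = 2                      # still allowed, lands in the inherited __dict__
print(c.__dict__)

s = SlotsOnly()
s.x = 1
try:
    s.y = 2                  # no __dict__, so this is rejected
    blocked = False
except AttributeError:
    blocked = True
print(blocked)
```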