Mail rejection stats for our external mail gateway
In my recent spam filtering stats, I noted that some spam was rejected before it made it to the spam tagging and filtering system. Well, here are some stats on roughly that; specifically, on how much email our external mail gateway rejects at SMTP time for various reasons. The numbers here are for almost the same seven day time period as the previous stats; there is about a six and a half hour difference in coverage due to when the two systems roll their logs (one does it at midnight, one does it at 6:30am or so).
So, over seven days we:
- accepted 90,511 email messages in total
- rejected 5,798 MAIL FROMs: 2,690 for having unresolvable domains and 3,108 for being from our domain but having unknown local users.
- rejected 24,876 RCPT TOs, for all sorts of reasons:
  - 13,393 unknown local usernames.
  - 8,350 sender IPs that were in DNS blocklists; 6,496 were in the CBL (which we check first) and 1,854 were in Spamhaus Zen.
  - 2,237 relay attempts; to my surprise, these appear to be real and serious attempts.
  - 778 attempts to mail addresses that don't accept outside email.
  - 117 attempts to send mail to obsolete domains that we explicitly block.
  - 1 attempt by a persistent source that we have specifically blocked from mailing their marketing materials to our NOC address (and they've kept trying for years despite that).
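As a sanity check on the breakdown above, here is a small Python sketch that re-adds the per-reason counts and prints each RCPT TO rejection reason as a share of all RCPT TO rejections. The numbers are copied from the list; the variable names and the short reason labels are made up for illustration.

```python
# Counts copied from the list above; the labels are just shorthand.
mail_from_rejects = {
    "unresolvable domain": 2_690,
    "our domain, unknown local user": 3_108,
}
rcpt_to_rejects = {
    "unknown local username": 13_393,
    "sender IP in a DNS blocklist": 8_350,
    "relay attempt": 2_237,
    "address doesn't accept outside email": 778,
    "obsolete domain we block": 117,
    "specifically blocked source": 1,
}
dnsbl_hits = {"CBL": 6_496, "Spamhaus Zen": 1_854}

mail_from_total = sum(mail_from_rejects.values())
rcpt_to_total = sum(rcpt_to_rejects.values())
dnsbl_total = sum(dnsbl_hits.values())

print("MAIL FROM rejections:", mail_from_total)
print("RCPT TO rejections:  ", rcpt_to_total)

# Show each RCPT TO reason as a share of all RCPT TO rejections,
# largest first.
for reason, count in sorted(rcpt_to_rejects.items(), key=lambda kv: -kv[1]):
    print(f"{reason:40s} {count:6d} {count / rcpt_to_total:6.1%}")
```

The sketch deliberately doesn't compare rejections against the accepted message count, since one is counting SMTP commands and the other is counting messages.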
The two surprises that stand out in this are how frequently spammers
attempt to forge email as from our own domains and how many relay
attempts there are. I'm not terribly surprised that unresolvable
MAIL FROMs haven't worked for a long time.
I'm not going to try to estimate the additional 'real' spam volume here,
because in part it depends on your assumptions. For example, should we
consider all email rejected due to unresolvable MAIL FROM domains as
spam? Probably some of them are simply incompetent but real domains,
and only some of them are spammers that are either making up domains or
having their domains canceled out from underneath them.
(General information on our spam filtering is in CSLabSpamFiltering. While that was written in 2007, almost nothing has changed since then in our setup although I'm sure that the Sophos PureMessage people have been evolving it madly. Such is one of the benefits of outsourcing most of your anti-spam system.)
Some notes on what __dictoffset__ on types means in CPython
I mentioned __dictoffset__ in passing in HowSlotsWorkI. Today I feel
like expanding on that passing mention with some notes. All of this is
specific to CPython.
As mentioned in passing in HowSlotsWorkI, the __dictoffset__
attribute tells you the offset to where you find the pointer to the
__dict__ object in any instance object that has one. It is in
bytes. A positive value is an offset from the start of the object; a
negative value is an offset from the end of the object, and is used only
for classes derived from types (such as long) that have a
variable-sized component. A __dictoffset__ value of zero means that
the type (or class) doesn't have a __dict__.
(You can tell which types have a variable-sized component by looking at
their __itemsize__ attribute; zero means that they don't have such a
component.)
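Both attributes can be poked at from the interpreter. The following is a minimal sketch for CPython 3, where int takes the place of Python 2's long as the everyday variable-sized type; the particular types picked here are just my choice of examples, and the printed numbers will vary with the CPython version and platform.

```python
# Ordinary built-in types whose instances have no __dict__.
no_dict_types = (object, int, list, dict, float)
for t in no_dict_types:
    # A __dictoffset__ of zero means instances have no __dict__ pointer.
    print(f"{t.__name__:8s} __dictoffset__ = {t.__dictoffset__}")

# __itemsize__ is non-zero for types with a variable-sized component
# and zero for types whose C-level blob is a fixed size.
variable_sized = (int, tuple, bytes)
fixed_sized = (list, dict, float)
for t in variable_sized + fixed_sized:
    print(f"{t.__name__:8s} __itemsize__ = {t.__itemsize__}")
```

Note that list counts as fixed-sized here: the list object itself is a fixed-size blob that points to separately allocated storage for its elements.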
As sort of discussed in the sidebar in HowSlotsWorkI, if you inherit
from something with a zero __dictoffset__ your subclass will
normally have a non-zero __dictoffset__ and the pointer to the
__dict__ object will be glued on the end of the C-level blob of your
instances.
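A quick way to see this is to subclass a few types that have a zero __dictoffset__ and look at what the subclasses report. This is only a sketch for CPython 3 with made-up class names; the exact offsets (and even their sign) depend on the CPython version and platform, so the only thing to rely on is zero versus non-zero.

```python
class Plain(object):
    pass

class MyList(list):
    pass

class MyInt(int):  # int is a variable-sized type in Python 3
    pass

subclasses = (Plain, MyList, MyInt)
for cls in subclasses:
    base = cls.__mro__[1]
    inst = cls()
    # The base type reports zero; the subclass picks up a __dict__.
    print(f"{cls.__name__:7s} base {base.__name__}: {base.__dictoffset__}, "
          f"subclass: {cls.__dictoffset__}, "
          f"instance has __dict__: {hasattr(inst, '__dict__')}")
```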
Most built-in types have a __dictoffset__ of zero, as you'd
expect. However, a few types have a non-zero __dictoffset__;
the ones I know of are exceptions, functions, modules, and type
itself. What is going on is that all of these types already have to
have some sort of dictionary for their contents, along with a
pointer to this dictionary in their basic C-level blob. So they
reuse this pointer (and associated dictionary) as their __dict__,
pointing __dictoffset__ directly to this internal field. One
consequence of this is that subclasses of these classes always have a
__dict__, even if your subclass uses __slots__.
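Here is a small sketch of that consequence in CPython 3. The class names SlottedPlain and SlottedError are invented for the example; types.FunctionType and types.ModuleType are the standard library's names for the function and module types.

```python
import types

# Built-in types that already carry a dictionary pointer of their own.
special = {
    "BaseException": BaseException,
    "function": types.FunctionType,
    "module": types.ModuleType,
    "type": type,
}
for name, t in special.items():
    print(f"{name:14s} __dictoffset__ = {t.__dictoffset__}")

class SlottedPlain:
    __slots__ = ("value",)

class SlottedError(Exception):
    __slots__ = ("value",)

# With an ordinary base class, __slots__ suppresses the __dict__, so
# setting an undeclared attribute fails.
plain = SlottedPlain()
try:
    plain.extra = 1
    plain_accepts_extra = True
except AttributeError:
    plain_accepts_extra = False

# With an exception base class, the inherited __dict__ is still there,
# so arbitrary attributes are accepted despite __slots__.
err = SlottedError()
err.extra = 1

print("SlottedPlain accepts extra attributes:", plain_accepts_extra)
print("SlottedError __dict__:", err.__dict__)
```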
(In general, once a class has a non-zero
__dictoffset__ all of
its subclasses will always have a
__dict__. I think that you can
sometimes still save space and allocations by using __slots__, but
you don't get any of the other features of
__slots__ that people
are sometimes unwisely attracted to.)