Wandering Thoughts archives

2011-04-27

Mail rejection stats for our external mail gateway

In my recent spam filtering stats, I noted that some spam was rejected before it made it to the spam tagging and filtering system. Well, here are some stats on roughly that; specifically, on how much email our external mail gateway rejects at SMTP time for various reasons. The numbers here are for almost the same seven day time period as the previous stats; there is about a six and a half hour difference in coverage due to when the two systems roll their logs (one does it at midnight, one does it at 6:30am or so).

So, over seven days we:

  • accepted 90,511 email messages in total
  • rejected 5,798 MAIL FROMs, 2,690 for having unresolvable domains and 3,108 for being from our domain but having unknown local users.
  • rejected 24,876 RCPT TOs, for all sorts of reasons:
    • 13,393 unknown local usernames.
    • 8,350 sender IPs that were in DNS blocklists; 6,496 were in the CBL (which we check first) and 1,854 were in Spamhaus Zen.
    • 2,237 relay attempts; to my surprise, these appear to be real and serious attempts.
    • 778 attempts to mail addresses that don't accept outside email.
    • 117 attempts to send mail to obsolete domains that we explicitly block.
    • 1 attempt by a persistent source that we have specifically blocked from mailing their marketing materials to our NOC address (and they've kept trying for years despite that).
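To see how the RCPT TO rejections break down proportionally, here is a small sketch that tallies the counts from the list above. The dictionary names and reason labels are purely illustrative; only the numbers come from the stats themselves.

```python
# Counts copied from the list above (seven days of SMTP-time rejections).
mail_from = {
    "unresolvable domain": 2690,
    "our domain, unknown local user": 3108,
}
rcpt_to = {
    "unknown local username": 13393,
    "sender IP in a DNS blocklist": 8350,
    "relay attempt": 2237,
    "address not accepting outside email": 778,
    "obsolete blocked domain": 117,
    "specifically blocked source": 1,
}
dnsbl = {"CBL": 6496, "Spamhaus Zen": 1854}

mail_from_total = sum(mail_from.values())
rcpt_to_total = sum(rcpt_to.values())

# Print each RCPT TO rejection reason with its share of all RCPT TO rejections.
for reason, count in rcpt_to.items():
    print(f"{reason:40s} {count:6d} {count / rcpt_to_total:6.1%}")
```

The sub-counts add up to the totals given in the list, which is a useful sanity check when pulling numbers like these out of logs by hand.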

The two surprises that stand out in this are how frequently spammers attempt to forge email as from our own domains and how many relay attempts there are. I'm not terribly surprised that unresolvable MAIL FROM domains are relatively uncommon; as I've said before, spammers are smart enough to notice what doesn't work and unresolvable MAIL FROMs haven't worked for a long time.

I'm not going to try to estimate the additional 'real' spam volume here, because in part it depends on your assumptions. For example, should we consider all email rejected due to unresolvable MAIL FROM domains as spam? Probably some of them are simply incompetent but real domains, and only some of them are spammers that are either making up domains or having their domains canceled out from underneath them.

(General information on our spam filtering is in CSLabSpamFiltering. While that was written in 2007, almost nothing has changed since then in our setup although I'm sure that the Sophos PureMessage people have been evolving it madly. Such is one of the benefits of outsourcing most of your anti-spam system.)

spam/CSLabRejectionStats-2011-04-26 written at 23:54:48

Some notes on what __dictoffset__ on types means in CPython

I mentioned __dictoffset__ in passing in HowSlotsWorkI. Today I feel like expanding on that passing mention with some notes. All of this is specific to CPython.

The __dictoffset__ attribute tells you the offset to where you find the pointer to the __dict__ object in any instance object that has one. It is in bytes. A positive value is an offset from the start of the object; a negative value is an offset from the end of the object, and is used only for classes derived from types (such as str and long) that have a variable-sized component. A __dictoffset__ value of zero means that instances of the type (or class) don't have a __dict__.

(You can tell which types have a variable-sized component by looking at their __itemsize__ attribute; zero means that they don't have such a component.)
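As a quick illustration, here is a minimal sketch that inspects these two attributes from Python. It assumes a Python 3 CPython, so it uses int and tuple as the variable-sized types instead of the Python 2 str and long mentioned above. The describe() helper and the Plain and Slotted classes are purely illustrative. The exact numbers (and even their sign) vary by CPython version and platform, so the only thing worth relying on is zero versus non-zero.

```python
def describe(tp):
    """Return (name, __dictoffset__, __itemsize__) for a type."""
    return (tp.__name__, tp.__dictoffset__, tp.__itemsize__)

class Plain:
    pass

class Slotted:
    __slots__ = ("x",)

# Built-in types first, then an ordinary class and a __slots__ class.
for tp in (object, int, tuple, list, dict, Plain, Slotted):
    print("%-8s dictoffset=%-4d itemsize=%d" % describe(tp))
```

In this sketch the built-in types and the __slots__ class should report a zero __dictoffset__, the ordinary class a non-zero one, and only int and tuple a non-zero __itemsize__.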

As sort of discussed in the sidebar in HowSlotsWorkI, if you inherit from something with a zero __dictoffset__ your subclass will normally have a non-zero __dictoffset__ and the pointer to the __dict__ object will be glued on the end of the C-level blob of your basic type.
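The observable side of this can be sketched as follows, assuming a Python 3 CPython; MyList and SlottedList are hypothetical names. Exactly where the pointer is placed is an implementation detail that has changed between CPython versions, so the sketch only checks whether __dictoffset__ is zero and whether instances have a __dict__.

```python
class MyList(list):
    # No __slots__, so instances gain a __dict__.
    pass

class SlottedList(list):
    # Empty __slots__ suppresses the per-instance __dict__.
    __slots__ = ()

ml = MyList([1, 2])
ml.note = "hello"  # stored in the instance __dict__

print(list.__dictoffset__, MyList.__dictoffset__, SlottedList.__dictoffset__)
print(ml.__dict__)
print(hasattr(SlottedList(), "__dict__"))
```

The 'normally' above covers the __slots__ case: a subclass that declares __slots__ without '__dict__' keeps the zero __dictoffset__ of its base.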

Most built-in types have a __dictoffset__ of zero, as you'd expect. However, a few types have a non-zero __dictoffset__; the ones I know of are exceptions, functions, modules, and type itself. What is going on is that all of these types already have to have some sort of dictionary for their contents, along with a pointer to this dictionary in their basic C-level blob. So they reuse this pointer (and associated dictionary) as their __dict__, by pointing __dictoffset__ directly to this internal field. One consequence of this is that subclasses of these classes always have a __dict__, even if your subclass uses __slots__.

(In general, once a class has a non-zero __dictoffset__ all of its subclasses will always have a __dict__. I think that you can sometimes still save space and allocations by using __slots__, but you don't get any of the other features of __slots__ that people are sometimes unwisely attracted to.)
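A minimal sketch of this behaviour, assuming a Python 3 CPython: it prints __dictoffset__ for the four kinds of types named above and then shows that a __slots__ subclass of an exception still accepts arbitrary attributes. SlottedError and its attribute names are hypothetical, chosen only for illustration.

```python
import types

# Types that carry their own internal dictionary report a non-zero offset.
for tp in (types.FunctionType, types.ModuleType, BaseException, type):
    print(tp.__name__, tp.__dictoffset__)

class SlottedError(Exception):
    __slots__ = ("code",)

err = SlottedError("boom")
err.code = 42          # stored in the slot
err.detail = "extra"   # still allowed: stored in the inherited __dict__

print(SlottedError.__dictoffset__)
print(err.__dict__)
```

Here the slot attribute does not show up in the instance __dict__, but the __dict__ is still there to catch any other attribute.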

python/DictoffsetNotes written at 01:27:11

