How much spam is forged as being from who it's sent to?

February 26, 2012

After doing the stats for the most popular sender domains for spam and discovering that the most popular choice was to use our own domains, I was left with a closely related question: how much spam is forged to come from the victim themselves?

As near as I can tell, the answer is that almost all of the spam that's forged as from our domains is in fact forged as coming from the victim themselves (or, for multi-recipient messages, as coming from the first recipient). Based on our current set of 45 days of logfiles, that's about 8.3% of all messages that got spam-tagged. I suppose that this makes sense; after all, there's no need to take the risk of making up addresses on the remote system when you already have some, i.e. the ones you're sending spam to.
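
To make the matching rule concrete, here is a minimal sketch of the check in Python. The function name and the case-insensitive comparison are my assumptions for illustration, not how the actual log analysis was done; it only encodes the rule that the sender must equal the (first) recipient.

```python
def is_self_forged(sender, recipients):
    """Return True if the sender address matches the first recipient.

    Hypothetical helper: addresses are compared case-insensitively
    after stripping whitespace, which is an assumption of this sketch.
    """
    if not recipients:
        return False
    return sender.strip().lower() == recipients[0].strip().lower()
```

With this rule, a message from user@example.com to user@example.com counts, while one where the sender only matches the second recipient does not.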

(As before, I checked only high-rated spam.)

The obvious corollary question to ask is how many non-spam messages match this criterion. The answer appears to be that almost none do, which is not really surprising. Given ad-hoc mailing lists and the like, it's possible for legitimate email to loop around in this way or for people to copy themselves when they're sending email through an outside SMTP server, but it's probably not going to be very common in most user populations.

For a while, I've believed that spammers like forging system addresses, especially postmaster. This turns out to be wrong, which I admit surprises me; vanishingly little (high-scoring) spam is sent as from anyone's postmaster, and none is forged as from our postmaster address. Virus spammers may do that, but viruses are still very rare in our mail stream.

(Working with the logfiles for our spam filtering and tagging system has shown me that I need a specialized matching and extracting program that works with log lines of the form 'key=value key=value key=value ...', especially with some keys repeated several times. Awk is not really a good fit for these files. Creative use of tr can help when I only want a single field, but things fall down when I want several.)
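
As a rough illustration of what such a tool might do, here is a minimal Python sketch. It assumes that fields are separated by whitespace and that values contain no whitespace or quoting, which may not hold for the real logs; the function names and the sample keys (from, to, score) are hypothetical.

```python
from collections import defaultdict


def parse_log_line(line):
    """Parse 'key=value key=value ...' into a dict of key -> list of values.

    Repeated keys accumulate their values in order of appearance.
    Assumes whitespace-separated fields with no embedded whitespace.
    """
    fields = defaultdict(list)
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip tokens that are not key=value pairs
        fields[key].append(value)
    return dict(fields)


def extract(line, *keys):
    """Return the list of values for each requested key, in the order asked."""
    fields = parse_log_line(line)
    return [fields.get(key, []) for key in keys]


if __name__ == "__main__":
    sample = "from=a@example.com to=b@example.org to=c@example.org score=9.1"
    print(extract(sample, "from", "to"))
```

Keeping every value as a list means a repeated key is handled the same way as a single one, so pulling out several fields at once is no harder than pulling out one.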

Written on 26 February 2012.

Last modified: Sun Feb 26 01:45:49 2012