Wandering Thoughts archives

2012-02-26

How much spam is forged as being from who it's sent to?

After doing the stats for the most popular sender domains for spam and discovering that the most popular thing was to use our domains, I was left with a closely related question: how much spam is forged to come from the victim themselves?

As near as I can tell, the answer is that almost all of the spam forged as being from our domains is in fact forged as coming from the victim themselves (or, for multi-recipient messages, as coming from the first recipient). Based on our current set of 45 days of logfiles, that's about 8.3% of all messages that got spam-tagged. I suppose that this makes sense; after all, there's no need to take the risk of making up addresses on the remote system when you already have some, ie the ones you're sending spam to.

(As before, I checked only high-rated spam.)
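As a rough illustration of the check involved, here is a minimal Python sketch. It assumes each message has already been reduced to a sender address plus an ordered list of recipients; the function names and that record shape are hypothetical, not what our logs actually look like.

```python
def is_forged_from_self(sender, recipients):
    """True if the sender address matches the first recipient.

    Assumption: addresses are compared case-insensitively as whole
    strings, and multi-recipient messages use the first recipient.
    """
    if not sender or not recipients:
        return False
    return sender.strip().lower() == recipients[0].strip().lower()


def forged_from_self_fraction(messages):
    """Fraction of (sender, recipients) pairs that are forged-from-self."""
    messages = list(messages)
    if not messages:
        return 0.0
    hits = sum(1 for sender, rcpts in messages
               if is_forged_from_self(sender, rcpts))
    return hits / len(messages)
```

Run over only the high-rated spam, the second function would give the sort of percentage quoted above.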

The obvious corollary question to ask is how many non-spam messages match this criterion. The answer appears to be that almost none do, which is not really surprising. Given ad-hoc mailing lists and the like, it's possible for legitimate email to loop around in this way or for people to copy themselves when they're sending email through an outside SMTP server, but it's probably not going to be very common in most user populations.

For a while, I've believed that spammers like forging system addresses, especially postmaster. This turns out to be wrong; vanishingly little (high-scoring) spam is sent as from anyone's postmaster, and none is forged as from our postmaster address. Virus spammers may do that, but viruses are still very rare in our mail stream. I admit that this surprises me.

(Working with the logfiles for our spam filtering and tagging system has shown me that I need a specialized matching and extracting program that works with log lines of the form 'key=value key=value key=value ...', especially with some keys repeated several times. Awk is not really a good fit for these files. Creative use of tr can help when I only want a single field, but things fall down when I want several.)
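The kind of tool I have in mind might look something like the following Python sketch. It assumes that fields are separated by whitespace and that values never contain whitespace themselves, which may not be true of real log lines; the function names and the sample field names are made up for illustration.

```python
from collections import defaultdict


def parse_kv_line(line):
    """Split a 'key=value key=value ...' line into {key: [values]}.

    Repeated keys keep every value, in order of appearance.
    Tokens without an '=' are ignored.
    """
    fields = defaultdict(list)
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key].append(value)
    return dict(fields)


def extract(line, *keys):
    """Return the value lists for several keys at once."""
    fields = parse_kv_line(line)
    return [fields.get(key, []) for key in keys]


if __name__ == "__main__":
    # Hypothetical log line; real field names will differ.
    sample = "from=a@example.com to=b@example.org to=c@example.org score=95"
    print(extract(sample, "from", "to"))
```

Keeping every value as a list is what handles the repeated keys, and pulling several fields out of one line is a single call instead of a pipeline.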

ForgedFromSelf-2012-02-26 written at 01:45:49

2012-02-18

The most popular sender domains for spam messages sent to here

Every so often I get curious about crazy spam-related statistics. Today's curiosity started out as a simple question: given that spammers generally forge the origin addresses on their messages, do they like picking on some domains or do they spread them around at random? As it happens, identifying messages that have forged senders is a little bit too much work for a blog entry, so I am answering the closely related question of what the most popular domains are to appear as the sending domain on spam.

My data comes from the last 45 days of our spam tagging and filtering system. This system assigns messages a spam score; based on my earlier analysis of the score distributions, I decided to look only at messages that scored between 90 and 100 points. Over the past 45 days it turns out that there were just over 300,000 such messages.

The top sender domains for these messages break down as follows:

our own domains             27200+
yahoo.com                   27000
yahoo.co.jp                 17800
gmail.com                   14000
bbb.org                      7200
nacha.org                    6500
ymail.com                    6300
returns.groups.yahoo.com     4600
advertise-bz.cn              3500

In terms of top level domains, it shouldn't surprise anyone that .com is by far the most forged, followed by .jp, .net, .org, and then .cn.
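For the curious, the tallying itself is simple. This is a minimal Python sketch of it, assuming the sender addresses of the high-scoring messages have already been pulled out of the logs into a list; the function names are hypothetical and this is not the exact process I used.

```python
from collections import Counter


def sender_domain(address):
    """Lowercased domain part of an address, or '' if it has no '@'."""
    _, sep, domain = address.rpartition("@")
    return domain.lower() if sep else ""


def top_domains(senders, n=10):
    """Most common sender domains as (domain, count) pairs."""
    counts = Counter(d for d in map(sender_domain, senders) if d)
    return counts.most_common(n)


def top_tlds(senders, n=5):
    """Most common top level domains as (tld, count) pairs."""
    counts = Counter(d.rsplit(".", 1)[-1]
                     for d in map(sender_domain, senders) if d)
    return counts.most_common(n)
```

Addresses without an '@', such as the null sender, are simply skipped here; lumping all of our own domains together into one line would need an extra mapping step that this sketch leaves out.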

Before I did these numbers, I probably wouldn't have predicted that forging valid users on our own domains was so popular (it's almost 10% of the total high-scoring spam messages). This probably explains why my earlier rejection stats showed that we had a surprisingly high rate of sender addresses that were nonexistent local users.

Based on spot checking the distribution of origin IP addresses for these domains, the senders for most of them really are forged. Unfortunately, the standout exception is Yahoo Groups; almost all of those messages really do come from Yahoo's mail servers. It appears that spammers have probably infested Yahoo Groups, much like they seem to have done on Google Groups.

The other exception is advertise-bz.cn. Messages claiming to be from it appear to be emitted from only a narrow set of IP address ranges in China. I spot-checked the destination addresses here and they don't appear to just be repeatedly spamming only a few unlucky people. Some investigation shows that this is actually a ROKSO-listed spammer with several SBL listings; given the SBL listings, some of this source's email is also being rejected outright at SMTP time.

MostAbusedDomains-2012-02-18 written at 23:57:57

