The TLDs of sender addresses for a week of our spam (June 2017 edition)

June 29, 2017

Once upon a time the Internet had only a few non-country top-level domains. Then that changed. Mostly these new TLDs get used for websites, but every so often people use them for email. The general stereotype is that it's mostly spammers using these new TLDs, so I thought it would be interesting to look at eight days' worth of logs from our commercial anti-spam system to see what the TLDs of sender addresses looked like for messages that were scored as spam and for messages that weren't.

So here are the top ten TLDs in email scored as spam. For each one I give the percentage of all our spam-scored email that had a sender address in that TLD, and the percentage of that TLD's overall email that was scored as spam.

TLD     % of all spam   spam as % of TLD's email
.com    51%             60%
.us     18%             98%
.net    6%              54%
.bid    6%              100%
.ca     2%              22%
.org    2%              24%
.info   2%              96%
.it     1.5%            79%
.cn     1%              92%
.uk     1%              67%
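The two columns measure different things: the first divides a TLD's spam by all spam, the second divides a TLD's spam by all of that TLD's email. Here is a minimal sketch of computing both, assuming the logs have already been reduced to hypothetical (sender address, scored-as-spam) pairs and that a TLD is simply the last label of the sender's domain. The record format, the '<>' null-sender spelling, and the tld_of() and tld_spam_stats() helpers are all illustrative assumptions; this is not necessarily how the numbers above were produced.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

NULL_SENDER = "<>"  # assumption: how the null sender appears in the records


def tld_of(sender: str) -> str:
    """Return the sender's TLD with a leading dot (e.g. '.com'), or '<>' for the null sender."""
    if sender in ("", NULL_SENDER):
        return NULL_SENDER
    # Take everything after the last '@', drop any trailing root dot,
    # then keep only the final label of the domain.
    domain = sender.rsplit("@", 1)[-1].rstrip(".")
    return "." + domain.rsplit(".", 1)[-1].lower()


def tld_spam_stats(records: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[float, float]]:
    """Map each TLD to (% of all spam, spam as % of that TLD's email)."""
    total = Counter()  # all messages per TLD
    spam = Counter()   # spam-scored messages per TLD
    for sender, is_spam in records:
        tld = tld_of(sender)
        total[tld] += 1
        if is_spam:
            spam[tld] += 1
    all_spam = sum(spam.values())
    return {
        tld: (
            100.0 * spam[tld] / all_spam if all_spam else 0.0,  # share of all spam
            100.0 * spam[tld] / n,                              # spam rate within the TLD
        )
        for tld, n in total.items()
    }


# Hypothetical records, purely for illustration.
records = [
    ("a@example.com", True),
    ("b@example.com", False),
    ("c@shop.bid", True),
    ("d@cs.example.ca", False),
]
stats = tld_spam_stats(records)
for tld, (share, rate) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(f"{tld:6} {share:5.1f}% {rate:5.1f}%")
```

Taking only the last label treats .co.uk and .ac.uk addresses as .uk, which matches how the table groups them.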

We can immediately see that .bid does terribly and .us is not doing so well. The .bid spam comes from multiple domains and probably multiple spammers (there are at least two or three patterns in how the sender addresses are formed). The .us spam seems to be a mix of compromised .us accounts, random domains, and active spammer domains. .info is nearly as bad as .us, but it's a much smaller share of our spam; its spam comes from multiple domains but might be mostly from one spammer.

The high popularity of .com in spam sender addresses surprises me, as does how much of .com email is spam. Bear in mind that we're a university department (and in Canada), so we probably exchange much less normal email with .com places than most organizations.

However, the new TLDs are not particularly popular with spammers. Even if I look all the way down in the data, it's dominated by country codes, with only a few new TLDs in small quantities:

new TLD   % of all spam
.top      0.7%
.press    0.2%
.win      0.15%
.party    0.095%
.vip      0.08%
.men      -
.club     -

You get the idea. I haven't shown spam as a percentage of each TLD's email here because it's mostly 100%, and where it isn't, that may be due to mis-scoring. The absolute numbers are very small, so it doesn't take much mis-scoring to show up as an appreciable percentage; .party has under a hundred messages over the eight days of logs. Interestingly, .biz sender addresses are only 79% spam as scored by our system.
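To make the small-numbers effect concrete, here is some illustrative arithmetic with made-up counts: the same four spam messages mis-scored as legitimate knock five points off a TLD with 80 messages but only 0.04 points off one with 10,000. The MIN_MESSAGES cutoff and reportable_rate() are hypothetical, a sketch of one way to suppress percentages computed from too few messages, not something applied to the data above.

```python
from typing import Optional

MIN_MESSAGES = 100  # hypothetical cutoff; not a figure from the logs


def spam_rate(spam: int, total: int) -> float:
    """Spam-scored messages as a percentage of all messages from a TLD."""
    return 100.0 * spam / total


def reportable_rate(spam: int, total: int, min_messages: int = MIN_MESSAGES) -> Optional[float]:
    """Like spam_rate(), but returns None when the TLD has too few messages to trust."""
    if total < min_messages:
        return None
    return spam_rate(spam, total)


# Hypothetical small TLD: 80 messages, all really spam, 4 scored as legitimate.
small_seen = spam_rate(76, 80)          # 95.0 instead of 100.0
# Hypothetical large TLD: 10,000 messages, the same 4 mis-scored.
large_seen = spam_rate(9_996, 10_000)   # about 99.96 instead of 100.0

print(small_seen, large_seen)
print(reportable_rate(76, 80), reportable_rate(9_996, 10_000))
```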

Pleasingly, there were exactly 200 different TLDs used in the logs (or 199 if you exclude the null sender, which accounted for 0.3% of the spam and was itself 56% spam).
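Counting distinct TLDs with and without the null sender only requires treating the empty envelope sender as its own pseudo-TLD. A minimal sketch follows, assuming the null sender is logged as either '<>' or an empty string; that spelling and the helper names are hypothetical, since the real log format isn't described here.

```python
from typing import Iterable, Tuple

NULL_SENDER = "<>"  # assumption: the null sender is logged as '<>' or as an empty string


def tld_of(sender: str) -> str:
    """Return the sender's TLD with a leading dot, or '<>' for the null sender."""
    if sender in ("", NULL_SENDER):
        return NULL_SENDER
    domain = sender.rsplit("@", 1)[-1].rstrip(".")
    return "." + domain.rsplit(".", 1)[-1].lower()


def count_tlds(senders: Iterable[str]) -> Tuple[int, int]:
    """Return (distinct TLDs including the null sender, distinct TLDs excluding it)."""
    tlds = {tld_of(s) for s in senders}
    return len(tlds), len(tlds - {NULL_SENDER})


print(count_tlds(["a@x.example.com", "b@y.example.COM", "<>", "c@z.bid", ""]))  # prints (3, 2)
```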

Last modified: Thu Jun 29 00:05:48 2017