Wandering Thoughts archives

2012-02-18

The most popular sender domains for spam messages sent here

Every so often I get curious about crazy spam-related statistics. Today's curiosity started out as a simple question: given that spammers generally forge the origin addresses on their messages, do they like picking on some domains or do they spread the forgeries around at random? As it happens, identifying messages that have forged senders is a little bit too much work for a blog entry, so I am answering the closely related question of which domains most often appear as the sending domain on spam.

My data comes from the last 45 days of our spam tagging and filtering system. This system assigns messages a spam score; based on an earlier analysis of the score distributions, I decided to look only at messages that scored between 90 and 100 points. Over the past 45 days it turns out that there were just over 300,000 such messages.

The top sender domains for these messages break down as follows:

our own domains 27200+
yahoo.com 27000
yahoo.co.jp 17800
gmail.com 14000
bbb.org 7200
nacha.org 6500
ymail.com 6300
returns.groups.yahoo.com 4600
advertise-bz.cn 3500

In terms of top level domains, it shouldn't surprise anyone that .com is by far the most forged, followed by .jp, .net, .org, and then .cn.
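As a rough illustration of how a tally like this can be produced, here is a minimal sketch in Python. It assumes you have already extracted the sender address of each high-scoring message into a list of strings; the function names and the sample addresses are made up for the example and are not what our system actually uses.

```python
from collections import Counter


def sender_domain(address):
    """Return the lowercased domain of a sender address, or None if it has none."""
    address = address.strip().strip("<>")
    if "@" not in address:
        return None  # null sender ("<>") or a malformed address
    domain = address.rsplit("@", 1)[1].lower().rstrip(".")
    return domain or None


def tally_domains(senders):
    """Count messages per full sender domain and per top level domain."""
    domains = Counter()
    tlds = Counter()
    for addr in senders:
        dom = sender_domain(addr)
        if dom is None:
            continue
        domains[dom] += 1
        # The TLD is whatever follows the last dot of the domain.
        tlds[dom.rsplit(".", 1)[-1]] += 1
    return domains, tlds


# Hypothetical sample input; real data would come from the filter's logs.
sample = [
    "<a@yahoo.com>",
    "b@Yahoo.com",
    "c@yahoo.co.jp",
    "<>",
    "d@gmail.com",
    "e@bbb.org",
]

domains, tlds = tally_domains(sample)
for dom, count in domains.most_common(3):
    print(dom, count)
```

Lowercasing matters here because domains are case-insensitive, and skipping null senders keeps bounces from showing up as a bogus empty domain.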

Before I did these numbers, I probably wouldn't have predicted that forging valid users on our own domains was so popular (at 27,200+ out of just over 300,000, it's roughly 9% of the total high-scoring spam messages). This probably explains why my earlier rejection stats showed that we had a surprisingly high rate of sender addresses that were nonexistent local users.

Based on spot checking the distribution of origin IP addresses for these domains, most of them really are mostly forgeries. Unfortunately, the standout exception is Yahoo Groups; almost all of those messages really do come from Yahoo's mail servers. It appears that spammers have infested Yahoo Groups, much like they seem to have done with Google Groups.

The other exception is advertise-bz.cn. Messages claiming to be from it appear to be emitted from only a narrow set of IP address ranges in China. I spot-checked the destination addresses here and they don't appear to just be repeatedly spamming only a few unlucky people. Some investigation shows that this is actually a ROKSO-listed spammer with several SBL listings; given the SBL listings, this spam source is also having some amount of their email rejected outright at SMTP time.
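The sort of spot check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: it supposes you have a list of IPv4 origin addresses for the messages claiming one sender domain, and it groups them into /24 networks (an arbitrary choice of granularity). The function names and sample addresses are hypothetical.

```python
import ipaddress
from collections import Counter


def network_spread(origin_ips, prefix=24):
    """Count messages per IPv4 network of the given prefix length."""
    nets = Counter()
    for ip in origin_ips:
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        nets[str(net)] += 1
    return nets


def top_share(nets, n=1):
    """Fraction of all messages that came from the n busiest networks."""
    total = sum(nets.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in nets.most_common(n)) / total


# Hypothetical origin IPs (documentation address ranges) for one sender domain.
ips = ["192.0.2.1", "192.0.2.77", "192.0.2.200", "198.51.100.9"]

nets = network_spread(ips)
share = top_share(nets)
print(nets.most_common(), share)
```

A domain whose messages are widely forged should show many networks with a small share each, while a domain like advertise-bz.cn would show most of its volume concentrated in a handful of networks.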

spam/MostAbusedDomains-2012-02-18 written at 23:57:57

The downside of automation versus the death of system administration

Back in AutomationDownside I discussed how one downside of automation was that either you had to spend time learning all of the extra layers it introduced or you'd become a push-button monkey. There's a consequence of this that I didn't mention back in the entry.

This push-button monkey status is the silent downside of the future death of system administration that I sketched out recently. All of those sysadmin-less developers doing their own deployments from canned recipes aren't going to know what's really going on in all of the layers if something goes wrong. This is fine as long as everything works, but when things go off the rails, well, you have issues.

(This is not just an issue of plain lack of knowledge, either, or to put it another way the lack of knowledge is a feature. One point of this is to save the developers from having to spend the time to learn all of the specialized knowledge that's needed to understand the full stack.)

I wouldn't count on this to save your regular sysadmin job, though. If this future comes to pass, things are going to work most of the time, and most of the time when they don't work, the developers are going to be able to figure it out on their own fast enough (even if it's not as fast as a sysadmin would). Far fewer places are going to be big enough that things go wrong so frequently that a full-time 'sysadmin' who understands the full deployment stack makes sense. Especially in the constrained environment of a small company, people will make do, and if things blow up every so often that's okay as long as they don't blow up too badly.

(You might question the idea that canned automation will work right most of the time, but I think that it will in specific environments such as deploying to a given cloud setup. And to a large extent the degree that my sketch of the death of system administration comes to pass depends on how routinely reliable such pre-written recipes are.)

Traditional sysadmins will probably be horrified at the mistakes that will result from people not knowing all of the fine details and charging ahead anyway. But on a pragmatic level most of the resulting problems won't and don't matter very much over the long run (although they'll be awkward and embarrassing at the time, just as they are today for the companies that run into them). Especially in a future where automation mostly works, you'll need a real long tail event to seriously damage an otherwise sound company.

(Perhaps people should care more about the possibility of long tail events. But it's a hard argument to make, especially when a company is having to choose between a sysadmin to alleviate a rare risk and another developer to accelerate their growth.)

(I have more thoughts on this area circling in my head, but trying to write some of them down has made it clear that they're not clear yet.)

Sidebar: clarifying what I mean by the push-button monkey stuff

Taken from a comment on AutomationDownside:

Or you could be in a situation where all you need to know to configure Apache is Apache configuration, but just do it at this particular host/path, and the changes will be pushed out to the web server/s in question.

This is exactly the situation where you've been reduced to a push-button monkey. You don't actually understand what's going on; you just know how to achieve certain results. What turns people into push-button monkeys isn't that they don't know what to do, it's that they don't know enough about how things really work to do anything other than push the buttons. In particular, they don't know enough to troubleshoot problems except by rote.

Suppose you put a new version of the configuration into the magic host/path spot but the change you wanted isn't appearing on the web servers (or isn't appearing on some web servers). Unless you understand the automation that distributes the files, you don't know where to start looking for problems or even what problems there might be.

(Well, you might have a troubleshooting checklist that someone has prepared for you. But if it's a problem that hasn't been foreseen, you are once again up the creek.)

sysadmin/AutomationDownsideII written at 02:05:54

