2005-07-31
Spam storm aftermath, July 30th 2005
The spam storm seems to have died down now and it's a Saturday night, so time for a wrapup and a look at the overall stats this week.
This week's total is 295,000 SMTP connections from at least 38,000 different IP addresses. In hindsight, the spam storm was probably already dying on the 26th, since we got only about another 60,000 SMTP connections since then (which is more or less average). While our logs show some more hits characteristic of the spammer after that point, the volume steadily decreased over the rest of the week.
Kernel level filtering:
| Host/Mask | Packets | Bytes |
| 212.216.176.0/24 | 7193 | 391K |
| 221.216.0.0/13 | 4058 | 195K |
| 65.214.61.100 | 3768 | 181K |
| 85.92.129.231 | 3565 | 214K |
| 62.221.254.34 | 3143 | 189K |
| 219.128.0.0/12 | 3119 | 156K |
| 61.128.0.0/10 | 2933 | 148K |
| 220.160.0.0/11 | 2709 | 136K |
| 66.49.190.112 | 2392 | 143K |
| 170.206.225.64 | 2309 | 111K |
Interestingly, this week sees far fewer individual IP addresses in the top 10 and more (large) netblocks. The counts are also up, so I suspect that a lot of zombies in those netblocks were trying to hammer on us.
Stats on SMTP connection time rejections:
25376 total
13070 dynamic IP
8509 bad or no reverse DNS
1853 class bl-cbl
525 class bl-sbl
345 class bl-spews
252 class bl-sdul
232 class bl-dsbl
228 class bl-njabl
63 class bl-ordb
24 class bl-opm
The SBL hits are way up, but I believe mostly because a few SBL listed spam sources decided to hammer on us this week (with the big winner being SBL24651 at almost a hundred attempts between two IP addresses). Unsurprisingly the SORBS DUL is up, since a lot of zombies are going to be dynamic IP addresses and hopefully listed there.
We saw successful SMTP connections from only 1227 different IP addresses, and actual mail delivery from only 189 different IP addresses, again the usual pathetic ratios. (Spam, spam, oh glorious spam. Please die now.)
Our volume of bad HELOs and people sending us bounces to nonexistent local users is down. (I'm not going to try to generate systematic numbers.)
2005-07-29
How spammers seem to be coping with greylisting
I have a machine (my Debian Woody machine) that has far less aggressive antispam defenses than anything else (as a result of an old and incapable mailer that is the Debian Woody default). As a result, I get to see an interesting view of some current spammer methods, more or less live and unfiltered.
One of the interesting things is that when email addresses on this machine get spammed, they usually get several copies of the same message, all from the same origin address and the same machine.
My current theory is that this is an anti-greylisting technique. Rather than implement actual retry logic in their spamware, the spammers just program it to send the same message repeatedly, a few minutes apart. If there is greylisting, the last copy might work; if there is no greylisting, who cares about the recipient getting a few more copies? It's not like it costs the spammer anything.
(Interestingly, that machine's reject log shows that refused connections happen in close succession. I don't have any current trapped spam to check the timestamps on spam that got through, so it may be that this is a technique that will only work on greylisting that has a very short waiting time.)
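To make the theory concrete, here is a minimal sketch of per-IP greylisting and of spamware that blindly resends on a fixed schedule. The `Greylist` class, the waiting times, and the five-minute send interval are all assumptions for illustration, not anything measured from the logs.

```python
class Greylist:
    """Per-IP greylist: temp-fail an IP until min_wait seconds after first contact."""

    def __init__(self, min_wait):
        self.min_wait = min_wait
        self.first_seen = {}  # ip -> time of the first connection attempt

    def check(self, ip, now):
        """Return True to accept, False to temporarily reject (an SMTP 4xx)."""
        first = self.first_seen.setdefault(ip, now)
        return now - first >= self.min_wait


def copies_delivered(greylist, ip, send_times):
    """Count how many blind repeat sends get accepted."""
    return sum(1 for t in send_times if greylist.check(ip, t))


# Hypothetical spamware: three copies, five minutes apart, no retry logic.
sends = [0, 300, 600]
short = copies_delivered(Greylist(min_wait=240), "192.0.2.1", sends)
long_ = copies_delivered(Greylist(min_wait=3600), "192.0.2.1", sends)
no_grey = copies_delivered(Greylist(min_wait=0), "192.0.2.1", sends)
```

With these made-up numbers the short waiting time lets the later copies through, the one-hour waiting time blocks all of them, and a machine with no greylisting gets every copy.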
I believe this machine is only getting spammed by one spammer group
or one spammer software, because almost all of the SMTP sessions that
deliver spam use the HELO name of 'localhost'. This HELO name is
vanishingly rare in the SMTP logs of my other machines.
There is probably an interesting yet depressing research paper to be written on the spammer ecology, covering things like what spamware gets used by who and with what address lists. For example, the recent spam storm seems to have used an email address list that was hugely heavy on very old addresses and, since my Debian machine was untouched by it, may not have been using any relatively recent ones.
2005-07-27
Spam Storm, July 26th 2005
There's a spam storm blowing strong this week and it's irritating me, because it's pretty much all coming from compromised zombie machines. Again. Zombies are clearly the number one general spam problem, and it's probably only going to get worse as more and more of the world gets more and more broadband.
The biggest indicator I have of the storm is simple: since 6am on Sunday, we've had 233,000 SMTP connections. As mentioned recently, we normally see on the order of 120,000 connections in a week; in less than three days, we're already at twice the weekly volume.
We use a simple IP-based greylisting technique, which the zombies
appear to be powering through by retrying over and over. Unlike the
first, very simple spam zombies, these seem to parse SMTP replies
enough to abort if they don't get any RCPT TO: commands accepted,
which is a nice change. (There was a time when reading our SMTP server
command logs would let one see entire spam messages, along with a
blizzard of 'syntax error' replies.)
The other big indication of spam storms is that our logs light up with a lot of essentially the same rejection. In this case, there seems to be one spam gang in two different forms.
#1: 'Foundation Men On Line'
The first spam gang sends multipart/alternative messages with a garbage plaintext part and a HTML part that's mostly a giant table designed to break up certain giveaway words. They use a consistent hostname pattern for their web sites, of '<Word>.<domain>.net' (they probably use .com domains too); the <Word> always has an initial capital. Because of the table structure, they have six zillion links in the message, all using the same domain with different words. They also seem to use Subject: lines that start with 'Re[<N>]:' or 'Re<N>:', which is pretty distinctive.
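The two patterns are regular enough to express as regular expressions. This is only a sketch: it assumes `<N>` is a decimal number and that `<Word>` is one capital letter followed by lowercase letters, and the sample hostname prefix `Apple` is made up for the example.

```python
import re

# Subject: lines starting with 'Re[<N>]:' or 'Re<N>:' (N assumed to be digits).
SUBJECT_RE = re.compile(r"^Re(\[\d+\]|\d+):")

# '<Word>.<domain>.net' (or .com), where <Word> has an initial capital.
# DNS itself ignores case, so this only matches the name as written in the message.
HOST_RE = re.compile(r"^[A-Z][a-z]+\.[a-z0-9-]+\.(?:net|com)$")


def looks_like_gang_subject(subject):
    return bool(SUBJECT_RE.match(subject))


def looks_like_gang_host(host):
    return bool(HOST_RE.match(host))


print(looks_like_gang_subject("Re[4]: hello"))   # matches the bracketed form
print(looks_like_gang_subject("Re: hello"))      # an ordinary reply does not match
print(looks_like_gang_host("Apple.aliener.net"))
```

An ordinary 'Re:' subject is deliberately not matched, since only the numbered forms are distinctive.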
Domains I've seen them use include aliener.net, trapeziums.net, oiling.net, subsidises.net, and homespuns.net. They seem to be pushing male potency drugs and hosting out of China Network Communications Group Hainan (specifically IP address 221.11.133.66). They seem to have started their spamming as far back as July 19th, but only spun up to full speed on us recently.
Their domains seem to be registered to 'Foundation Men On Line', nameservice from balladries.com, nameservice out of 211.147.228.0/24 and 221.11.133.0/24 (both in China, of course). All their domains are registered through Yesnic in China (the choice of discriminating spammers).
Update: the latest domain, 'taxables.net', is now registered under the name 'Harry Gourley' and the email address whois77000@yahoo.com. Unlike the previous ones, which claimed an address in the Netherlands, this domain claims to be registered to someone in Georgia (not the US state, the one in Europe). It was registered July 26th 2005, so they're probably cycling through domains rapidly.
Update 2: as of July 29th, the spammer has switched to the domain 'hundertsoft.com', at IP address 81.177.13.233 (SBL listed) in Russia. Nameservice has switched to 'moviedvddownload.info' (with the nameservers apparently located in the same network block) and the registry information has switched to 'James Halicho', claiming to be in Sunnyvale California.
#2: joboffer-colorphotomix.com
This is a form letter soliciting 'job applications' to the email
address manager@joboffer-colorphotomix.com. At first I thought this
was a separate spam, but this domain has identical registration
details to the spammed domains above, including the claimed owner, so
now it looks like another tentacle.
All of the spams I've received start with the following, in plain text and distinctly justified (never do this in plain ASCII, but that's another rant):
Our company deals with the software development, creation of human-engineered interface web-sites and modern design. We work with the clients from Canada, United Kingdom, Deutschland and the USA.
It goes on to offer part time work as a 'financial manager', which should be setting off your alarm bells. It's quite possible that the offer is 'genuine', but will enmesh any respondents in the primary spamming work of 'Foundation Men On Line'. (And whatever happens, they're likely to harvest the email addresses they get and spam them madly.)
Typical subject lines seem to be things like 'Amazing job offer <Name>'. The <Name> seems completely unrelated to who it gets sent to.
The domain is hosted by informtelecom.ru. I suspect that they are not about to stop hosting it any time soon, unfortunately. (I shall hope for a prompt SBL listing to encourage them otherwise, since these spammers seem to be rather virulent.)
2005-07-26
What ASNs are most actively spamming us
In this context, 'ASN' stands for 'Autonomous System Number'; broadly speaking, this tells us who is responsible for a particular IP address (or, technically speaking, who is ultimately responsible for getting IP packets to it).
There are a number of other ways to tell who owns an IP address (querying whois.arin.net and then other registrars, for example), but there are two attractions of ASNs for this purpose:
- there are comprehensive IP to ASN databases that are easily queried by relatively simple programs. All of the other IP ownership lookup things are much harder to use.
- since an IP address's ASN determines how packets get to it, it's necessary to get it right. By contrast, nothing usually breaks if a registry's IP ownership information is out of date or outright wrong.
Chris's Nth law of information sources is 'if it doesn't have to be accurate for things to keep working, sooner or later it won't be'. (There is a well-known application of this to comments in source code.)
Instead of trying to run the numbers by frequency of attempted connection, I've looked here at how many different IP addresses from each ASN have been rejected at connection time by us over the past 28 and some change days. This is a good indication of how widespread of a problem a particular ASN is to us.
| # of different IPs | ASN | (owner) |
| 2831 | AS4766 | Korea Telecom |
| 1580 | AS9318 | Hanaro Telecom (Korea) |
| 1323 | AS4837 | CNCGROUP China169 Backbone |
| 951 | AS6478 | AT&T WorldNet Services |
| 777 | AS4134 | CHINANET-BACKBONE |
| 775 | AS19262 | Verizon Internet Services |
| 706 | AS33287 | Comcast Cable Communications, Inc. |
| 650 | AS22909 | Comcast Cable Communications, Inc. |
| 595 | AS6830 | UPC Distribution Services (Europe) |
| 512 | AS7738 | Telecomunicacoes da Bahia S.A. (Brazil) |
| 512 | AS7018 | AT&T WorldNet Services |
| 499 | AS9277 | THRUNET (Korea) |
| 488 | AS17676 | Softbank BB Corp. (Japan) |
| 481 | AS3786 | DACOM Corporation (Korea) |
| 480 | AS20115 | Charter Communications |
| 479 | AS22047 | VTR BANDA ANCHA S.A. (Chile) |
| 474 | AS12322 | Proxad ISP (France) |
| 428 | AS5617 | TPNET Polish Telecom |
| 415 | AS10318 | CABLEVISION S.A. (Argentina) |
| 411 | AS9304 | Hutchison Global Communications (Hong Kong) |
Some organizations have multiple ASNs for various reasons, as you can see with Comcast and AT&T Worldnet.
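The tally behind a table like the one above is simple to sketch: collect the set of rejected IP addresses per ASN and count each set. The `ip_to_asn` lookup here is a stand-in for whatever IP to ASN database gets queried, and the addresses and AS numbers in the toy data come from the ranges reserved for documentation, not from our logs.

```python
from collections import defaultdict


def distinct_ips_per_asn(rejections, ip_to_asn):
    """Count different rejected IPs per ASN, largest first.

    rejections: iterable of IP strings, one per rejected connection.
    ip_to_asn: callable mapping an IP string to an ASN string
               (hypothetical; a real version would query an IP-to-ASN database).
    """
    seen = defaultdict(set)
    for ip in rejections:
        seen[ip_to_asn(ip)].add(ip)  # repeat connections from one IP count once
    counts = {asn: len(ips) for asn, ips in seen.items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data only.
toy_db = {"192.0.2.1": "AS64500", "192.0.2.2": "AS64500", "198.51.100.7": "AS64501"}
log = ["192.0.2.1", "192.0.2.1", "192.0.2.2", "198.51.100.7", "192.0.2.1"]
table = distinct_ips_per_asn(log, toy_db.get)
```

Counting sets instead of raw connections is what keeps one very persistent zombie from dominating the result.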
Korea is our largest problem source, followed closely by China. UPC is the 'chello.*' people, eg chello.nl, chello.at, and so on, who are a Europe-wide plague of zombies.
Part of this is entirely predictable; because we expect little legitimate email from the Far East (and to a lesser extent Europe), I am far more willing to be aggressive when blocking those areas, and it is not surprising that they score high in the list. (Significant swaths of China don't even get as far as connect-time rejection, as they're blocked by kernel IP filters.)
I suppose the most solid conclusion I can take away from this is that our problems come from all over. Just in the top-20 list alone we've hit most of the world's general areas with decent network infrastructure.
2005-07-24
Spam summary for July 23rd 2005
It looks like the hope from last week that spammers had stopped forging University of Toronto subdomains as the origin of their spam was in fact just a hope. 'Nonexistent local user' rejections are back up like clockwork. Oh well; it would have been nice.
IP level rejections:
| Host/Mask | Packets | Bytes |
| 212.216.176.0/24 | 6663 | 339K |
| 213.4.149.11 | 6659 | 303K |
| 151.189.20.157 | 3918 | 188K |
| 83.103.57.17 | 3190 | 162K |
| 61.128.0.0/10 | 2648 | 132K |
| 193.111.201.127 | 2490 | 127K |
| 221.216.0.0/13 | 2252 | 109K |
| 194.30.33.37 | 2185 | 111K |
| 219.128.0.0/12 | 2165 | 106K |
| 216.7.201.43 | 2014 | 96672 |
| 194.250.136.10 | 1877 | 90096 |
| 68.63.102.114 | 1853 | 88944 |
| 66.235.196.26 | 1750 | 105K |
| 220.245.160.88 | 1686 | 80768 |
| 218.0.0.0/11 | 1529 | 75816 |
| 217.52.32.185 | 1502 | 72096 |
| 65.214.61.100 | 1455 | 69840 |
| 193.41.153.65 | 1422 | 68256 |
| 216.109.197.126 | 1320 | 65136 |
| 220.160.0.0/11 | 1294 | 63600 |
Finally, 24.156.64.52 has dropped entirely out of the list. A number of other apparent dynamic/DHCP/cable modem sources are on it, though; I'm not surprised. Zombie spam is the big problem of these days.
Connection-time rejections:
24003 total
8375 rejected due to bad/missing reverse DNS information
1236 class bl-cbl
698 class bl-ordb
509 class bl-dsbl
335 class bl-spews
330 class bl-sbl
162 class bl-sdul
158 class bl-njabl
10 class bl-opm
Surprisingly, rejections have plummeted overall, although they're broadly like last week's. We had about 186,000 SMTP connections from at least 35,000 different IP addresses, which is somewhat up on our usual connections volume (I usually expect about 120,000 over the course of a week).
We rejected 9,200 IP addresses at connect time, let 1,330 machines get as far as the SMTP banner, and actually accepted email from only 197 different IP addresses. This is about the depressing ratio mismatch I expected from previous weeks.
At this point I'm running out of interesting statistics to take more looks at, so I'll probably flip away from weekly spam stats posts in favour of just generating the data and archiving it for long-term local analysis. (I suppose I could do a breakdown of connection time rejections by source ASN. (But if I do that, I should probably explain 'ASN' first.))
The necessary evolution of mail servers
In a comment on my Legend of Debian post, Chris Wage wrote in part:
Most of the servers I run are: webservers, mailservers, CVS servers, etc. These are things for which well-established stable software has existed for years. I don't need bleeding-edge software to do them. I need stable representatives of that software that are supported by security updates but don't otherwise change.
I have to disagree with this in the case of mail servers.
Unless you actively enjoy getting spammed, your mail server software needs to be upgraded on a regular basis, because spammers evolve their techniques all of the time. One of my major issues with Debian Woody is that by the end of its life, its default mailserver (Exim 3) is clearly not adequate to the job of stopping spam.
This means that if you do spam filtering at all, you're going to need regular mail server software upgrades in some form. (This also means that you're going to need to evaluate upgrades if your operating system vendor doesn't deliver regular ones.)
Virus authors also evolve their tricks all the time, so people doing virus filtering need to think about this issue too.
2005-07-17
How many places actually send us email?
A few weeks ago I discovered that only 220 different IP addresses sent us actual email over the course of a week. This naturally raises the question: was this just a slow week, or is this typical? The answer turns out to be 'maybe'.
On the system I usually run my stats on, I only have logs going back about 28 days; looking at the entire time period, there was email from 443 different IP addresses. Not surprisingly, the distribution of how much email comes from where is very uneven, with almost all of the email we get coming from a few mailing list hosts and the campus-wide email system.
On another system I have logs going back almost a year. Over that time, we got email from only 1,427 different IP addresses (only 95,000 email messages, though). On this system, the big source of email turns out to be Yahoo's webmail, and again things have a very sharp dropoff.
While this has practical uses for our specific situation, the more I think about it the less I think it really generalizes very well. Most of the people here use the central campus-wide email system and at most have their email forwarded from there to our systems; only a relatively few are still using our systems as their primary email system.
The usual quick rejection stats for 2005-07-16
2005-07-09
The minimum antispam features of a modern SMTP server
I have a machine running Debian's 'Woody' release. Debian Woody is several years old and uses Exim 3 as its (normal) mailer. Exim 3 unfortunately doesn't have many anti-spam features.
Exim 3's lack of antispam features combined with spammers vigorously finding some email addresses on the machine has been giving me lots of opportunities to a) experience semi-normal Internet spam life (eugh, says I), b) grind my teeth a lot, and c) contemplate what anti-spam features I now consider necessary in any SMTP server I actually want to run.
My current list is:
- reject bad MAIL FROM addresses.
- reject mail to nonexistent users during the SMTP transaction.
- reject selected HELO/EHLO names.
- be able to be run under an inetd-style frontend, so I can use a separate frontend of my own choosing. Failing this, DNS blocklist support.
- support some sort of greylisting, even if that's just per-IP-address.
- run external filters on email messages, with full envelope information available as well; filters can at least reject messages, which will cause a SMTP-time rejection.
Everyone can do #1. Everyone should do #2, because otherwise your
SMTP server is going to spam innocent third parties when spammers
forge their address into MAIL FROM and send spam to nonexistent
users on your machines. (Yes, I'm looking at you, QMail; please wake
up and join the 21st century.)
A majority of the spam my Debian machine is being hit with uses
clearly invalid HELO names: localhost, 127.0.0.1, the machine's
hostname, the machine's IP address (without '[' and ']' around
it), and so on. Filtering this sort of crud out is now clearly
essential.
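As a rough sketch of what such a HELO check could look like (the function name and sample hostnames are made up), the following rejects 'localhost', our own hostname, and any bare IP address. Treating every bare IP as bad is an assumption that goes a bit beyond the list above; it is justified by SMTP requiring address literals to be written in square brackets.

```python
import ipaddress


def is_bare_ip(name):
    """True if name is an IP address written without surrounding brackets."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def bad_helo(helo, my_hostname):
    """Return True if the HELO/EHLO name is one of the clearly bogus forms."""
    name = helo.strip().lower()
    # Nobody connecting from outside is legitimately 'localhost' or us.
    if name in ("localhost", my_hostname.lower()):
        return True
    # Covers 127.0.0.1 and our own IP address given without '[' and ']'.
    return is_bare_ip(name)


for helo in ("localhost", "127.0.0.1", "[192.0.2.10]", "mail.example.net"):
    print(helo, bad_helo(helo, "mx.example.org"))
```

A properly bracketed address literal such as '[192.0.2.10]' is let through here; whether to accept those is a separate policy question.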
I want the ability to run my own frontend inetd-like server because no SMTP server is going to build in the kind of powerful connection-time filtering I want. (And I can't ask them to; it's a lot of specialized code.) This can be faked by a message filter that has the envelope information available, if I really have to.
So far I would rather implement DNS blocklists in the frontend than in the SMTP server; my feeling is that I can make them more flexible there. (I may recant this view at some point; there are certainly some tricks a really flexible SMTP server could play.)
Evidence from my Debian machine suggests that almost all spam comes from open proxies (as if I didn't know that already). Greylisting, even of a very basic sort, is the most powerful mechanism to trim this down. Basic per-IP-address greylisting can be implemented in a frontend, but I'd rather have more powerful things.
Support for #6 enables a number of advanced tricks, including SMTP time rejection of viruses.
Of these, it appears that Exim 3 will do #1, #2, and check DNS blocklists. Exim 4 seems like it can do everything except perhaps #4 (but it can check DNS blocklists itself). Surprisingly few SMTP servers seem to have really good support for #6, and of course I rather suspect #4 will make MTA authors laugh derisively (except for QMail; pity that's ruled out for other reasons).
2005-07-03
Checking for dead DNSBls
Another Saturday, another spam entry. Today I decided to look at our logs to see if some of the DNSBls we check had either gone away or weren't giving us any hits. Since our software checks DNSBls in sequence instead of in parallel, removing useless DNSBls both reduces query volume and speeds things up slightly.
(In general, it's wise to do this periodically unless you're current
on news.admin.net-abuse.email and other information sources, and
already know about any changed or decommissioned DNSBls.)
This time around, I didn't find any. The closest to not being used is
opm.blitzed.org, with dnsbl.njabl.org and relays.ordb.org as
runners-up, but I decided not to remove any of them.
I also considered shuffling the order of the checks, but decided
against it for policy reasons. I would rather reject a SMTP connection
for a clear neutral reason like 'open proxy' than for something
more contentious, like 'listed in Spews', even if being listed in
Spews is several times more likely than being listed in
opm.blitzed.org. So opm.blitzed.org stays before Spews in our
checks.
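A minimal sketch of checking DNSBls in sequence, where the first list that has the address becomes the rejection reason, might look like this. It assumes IPv4 only, the zone names are placeholders and not the real lists, and a fake resolver stands in for DNS so that the example does not depend on any list still existing.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def first_listing(ip, zones, resolve=socket.gethostbyname):
    """Check zones in order; return the first zone listing ip, or None."""
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
        except OSError:  # lookup failure (socket.gaierror) means 'not listed'
            continue
        return zone  # stop at the first hit; later zones are never queried
    return None


def fake_resolve(name):
    """Stand-in for DNS: pretend 192.0.2.1 is in both placeholder zones."""
    listed = {"1.2.0.192.opm.example", "1.2.0.192.spews.example"}
    if name in listed:
        return "127.0.0.2"
    raise socket.gaierror("not listed")


# Order is policy: the neutral 'open proxy' list goes before the contentious one.
zones = ["opm.example", "spews.example"]
reason = first_listing("192.0.2.1", zones, resolve=fake_resolve)
```

Because the loop stops at the first hit, the order of `zones` decides which reason a doubly listed address is rejected for, and every zone that never produces hits is a wasted query on each clean connection.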
This isn't a complete loss; although I wasn't able to remove anything and this blog entry is a bit boring as a result, at least I checked.