2007-02-25
How CSLab currently does email anti-spam stuff
The Computer Science department is strongly against rejecting email just because it might be spam (at least by default); enough people would rather sort through spam than risk rejecting legitimate email. People are willing to have known viruses removed from their email (although not executables in general).
(For clarity: the weekly spam summaries I do are not for CSLab's mail system.)
I once summarized CSLab's general rule as 'thou shalt not reject email just because it smells bad'. We can reject email that has narrow technical failings such as nonexistent origin address domains, and do things that don't cause any problems with legitimate mailers but get spammers to give up. We can't reject on stuff that isn't a clear technical failing, and we can't do anything that causes problems for legitimate mailers.
All external email goes through a frontend machine running Exim 4. This machine does the following spam-related things:
- it waits a few seconds before spitting out the initial greeting banner and the response to EHLO/HELO; this is an attempt to persuade spam clients that they are being tarpitted so that they give up. Connections from IP addresses listed in zen.spamhaus.org are delayed longer.
  (This is not as good as the real OpenBSD spamd, which trickles out replies one character at a time; Exim just sits on the whole line for N seconds and then blasts it out. I got the general idea from Bob Beck's spamd presentation.)
- the MAIL FROM domain has to exist (if it's one of our domains, the full address has to be valid).
- the RCPT TO address has to be to us and valid. The frontend machine has a list of valid local usernames (including aliases and mailing lists and so on), so it can immediately reject email to nonexistent local users.
- at RCPT TO time, addresses that have opted into it immediately reject email from senders in zen.spamhaus.org, and greylist most everyone else (using greylistd, which is a general daemon for doing this). At the moment we have no convenient way for users to opt into this, so it is mostly protecting system aliases.
- if the sender is in zen.spamhaus.org, we add a message header about it.
- the message is run through Sophos PureMessage, which removes known viruses and, if the message has a high enough spam score, adds a note about it to the start of the Subject: header.
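As a rough illustration of the zen.spamhaus.org checks above: a DNSBL is queried by reversing the octets of the IPv4 address and looking the result up as a hostname under the list's zone. The sketch below is Python rather than our actual Exim configuration, and the function names and delay values are made up for the example.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the lookup returns an address at all (needs network access).

    A real check should also inspect which address comes back; this
    simplified version only distinguishes 'some answer' from 'no answer'.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

def banner_delay(listed: bool, normal: float = 5.0, listed_delay: float = 20.0) -> float:
    """Seconds to sit on the greeting banner; both values are hypothetical."""
    return listed_delay if listed else normal
```

Something like `banner_delay(is_listed(client_ip))` would then pick the longer wait for listed clients.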
After all this the email message is delivered to our central email
machine for actual processing and delivery and so on. We don't do
anything special with messages tagged as spam; each person gets to
decide for themselves how they want to handle such emails, whether
that is to filter them on the server with procmail or leave it
up to their IMAP client's filtering or do nothing at all.
For an organization that doesn't want to reject email outright, I think that this sort of tagging is a big win; it makes things visible and it makes it easy for all sorts of clients to filter things. You need a reliable spam filter that doesn't need training, though.
We use Sophos PureMessage because the university has a site-wide license for it, so it doesn't cost us anything, and the central campus email system uses it and likes it. In my experience it does a good but not perfect job at recognizing spam, and I've only gotten a few reports of false positives. (And Sophos maintains the spam and virus filtering rules instead of us.)
Things we don't do (that sometimes surprise people):
- reject HELOs that claim to be from us. This is merely a bad smell, not a narrow technical defect.
- general greylisting, because there are legitimate mailers that are known to have problems with it.
Exim does reject some badly formed HELOs by default, and we have left
that on; I consider that to be a narrow technical defect issue. We also
reject email to IP address domain literals, which I believe is another
Exim default.
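For readers who haven't run into greylisting: the usual scheme temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts a retry once some minimum delay has passed. The sketch below is a toy in-memory version in Python, not how greylistd is implemented; the class name and the delay value are assumptions for illustration.

```python
class Greylist:
    """Toy greylister keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay  # hypothetical retry delay, in seconds
        self.first_seen = {}        # triplet -> time of the first attempt

    def check(self, ip: str, sender: str, rcpt: str, now: float) -> str:
        """Return 'defer' (an SMTP 4xx) or 'accept' for this attempt."""
        key = (ip, sender.lower(), rcpt.lower())
        # Remember when we first saw this triplet; later calls reuse it.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "defer"
        return "accept"
```

A legitimate mailer that never retries, or that retries from a different IP address each time, never gets past the 'defer' answer in a scheme like this, which is one way such mailers can run into trouble.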
We are not currently doing nolisting, but we may in the
future; there are defensible technical reasons for having a lower
preference MX pointing to our internal central email machine, and
its SMTP port isn't reachable from the outside world any more.
Weekly spam summary on February 24th, 2007
This week, we:
- got 15,188 messages from 253 different IP addresses.
- handled 21,573 sessions from 1,281 different IP addresses.
- received 238,853 connections from at least 71,848 different IP addresses.
- hit a highwater of 10 connections being checked at once.
Connection and session volume is down a bit from last week. Day to day volume fluctuated up and down through the week:
| Day | Connections | different IPs |
| Sunday | 29,706 | +11,012 |
| Monday | 40,386 | +12,084 |
| Tuesday | 41,718 | +12,719 |
| Wednesday | 34,748 | +10,352 |
| Thursday | 36,413 | +9,568 |
| Friday | 32,318 | +9,189 |
| Saturday | 23,564 | +6,924 |
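The '+' in the different IPs column can be read as the number of not-previously-seen IP addresses each day, so both columns should add up to the weekly totals given above. A quick Python check, using only the figures from this table:

```python
# (connections, newly seen IPs) per day, copied from the table above
days = {
    "Sunday": (29706, 11012),
    "Monday": (40386, 12084),
    "Tuesday": (41718, 12719),
    "Wednesday": (34748, 10352),
    "Thursday": (36413, 9568),
    "Friday": (32318, 9189),
    "Saturday": (23564, 6924),
}

total_connections = sum(conns for conns, _ in days.values())
total_new_ips = sum(new for _, new in days.values())

print(total_connections, total_new_ips)  # 238853 71848
```

Both sums match the '238,853 connections from at least 71,848 different IP addresses' line.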
Kernel level packet filtering top ten:
| Host/Mask | Packets | Bytes |
| 205.152.59.0/24 | 27609 | 1252K |
| 207.145.125.204 | 25029 | 1272K |
| 206.223.168.238 | 15375 | 843K |
| 213.29.7.0/24 | 8533 | 512K |
| 211.136.0.0/14 | 7240 | 386K |
| 67.95.56.42 | 6865 | 319K |
| 203.89.173.58 | 6836 | 301K |
| 204.202.15.102 | 6800 | 336K |
| 81.201.105.157 | 5045 | 242K |
| 204.202.23.184 | 4987 | 246K |
This is up substantially from last week. The big news this week is that I blocked 205.152.59.0/24 very early on in the week; this is Bellsouth's outgoing mail servers. We no longer accept email from Bellsouth because they have gotten into the free webmail business, and as a result are now active participants in the advance fee fraud spam business. (Many US ISPs have apparently gone this direction, for reasons I don't understand.)
- 207.145.125.204, 67.95.56.42, 204.202.15.102, and 204.202.23.184 all kept trying to send email with an origin address that had already tripped our spamtraps, mostly for what looks like phish spam (certain sorts of origin addresses are dead giveaways).
- 206.223.168.238 is in the CBL.
- 203.89.173.58 kept trying with a bad HELO.
- 81.201.105.157 is in the NJABL.
All that makes this a highly atypical week; for example, we don't have a single top-10 IP address that we've seen before. On the good news front, 208.99.198.64/27 continued not sending us so much as a single connection attempt over the week, and has thus dropped off my radar for future reports.
Connection time rejection stats:
69674 total
43536 dynamic IP
17981 bad or no reverse DNS
6394 class bl-cbl
295 class bl-njabl
250 class bl-sdul
220 class bl-pbl
159 acceleratebiz.com
147 class bl-sbl
144 class bl-dsbl
33 inetekk.com
15 cuttingedgemedia.com
Overall volume is about the same as last week. The SBL breakdown is slightly interesting:
| 59 | SBL51080 | phish spam source |
| 17 | SBL49074 | hijacked server that's spamming (13 Dec 2006) |
| 11 | SBL49046 | advance fee fraud spam source (13 Dec 2006) |
| 10 | SBL50375 | a /25 ROKSO listing for Eric Reinertsen (29 Jan 2007) |
| 10 | SBL49248 | saigonnet.vn webmail, listed as an advance fee fraud spam source (18 Dec 2006) |
Of these, SBL49046 and SBL50375 appeared in my summary last week, at about the same volume.
Three of the top 30 most rejected IP addresses were rejected 100
times or more this week: 193.4.194.142 (216 times, bad reverse DNS),
64.166.14.222 (168 times, dynamic IP), and 81.201.105.157 (153
times, on the NJABL). Eight of the top 30 are currently in the
CBL, eight are currently in bl.spamcop.net, 10 are in the PBL, a grand total of 17 are in the combined
zen.spamhaus.org zone, and one is in
the SBL: 69.15.58.106, SBL51080.
This week Hotmail managed:
- 4 messages accepted, two of them probably legitimate.
- no messages rejected because they came from non-Hotmail email addresses.
- 57 messages sent to our spamtraps.
- 10 messages refused because their sender addresses had already hit our spamtraps.
- 5 messages refused due to their origin IP address (3 from the Cote d'Ivoire, one from Nigeria, and one in the CBL).
And the final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 877 | 101 | 979 | 155 |
| Bad bounces | 16 | 12 | 9 | 8 |
The winner of the bad HELO contest this week was 72.165.125.122,
with 125 rejections until it got blocked; the next highest source
only managed 61. It's sad to see the bad bounce numbers start rising
again, but they're still low, and this week they seem to have come
from all over, including a darpa.mil machine and something in the
Arab Emirates that has been forging its HELO name and so won't be
talking to us any more.
Bad bounces were sent to 13 different usernames this week, mostly to
real ex-users and plausible usernames. There was one alphabetical
jumble, and E07 and 3E4B also put in appearances. The most popular
bad bounce targets (admittedly at 3 and 2 hits respectively) were both
ex-users.
2007-02-24
Thesis: any server push technology inevitably breeds spam
Consider various server push technologies, where things come to you instead of you having to seek them out: email, instant messaging, voice over IP phone service, and even text messaging on cell phones. All of them have spam problems (generally growing).
This is not a coincidence. Any server push technology will get overrun by spammers, because server push inherently gives them access to people and is thus very, very attractive. As a consumer of server push technology, your only recourse from the onslaught is to hide, to block, to filter; you can't actually get away.
(The push technology provider can't keep all the spammers out, if only because sooner or later some of them are in its own marketing department.)
Client pull technology is much more resilient. The spammers have to be attractive to get you to visit even once, then genuinely interesting to keep you around, and you can easily get away. Thus it is a feature, not a problem, that things like syndication feeds do not have a server push option.
(And indeed much of the spammer activity in client pull technology like the web is about being attractive, for example getting a high Google search rank for some valuable keywords.)
2007-02-18
Weekly spam summary on February 17th, 2007
This week, we:
- got 15,925 messages from 244 different IP addresses.
- handled 23,465 sessions from 1,341 different IP addresses.
- received 244,268 connections from at least 75,016 different IP addresses.
- hit a highwater of 16 connections being checked at once.
This is about the same as last week. The per day figures show some significant fluctuations:
| Day | Connections | different IPs |
| Sunday | 36,660 | +13,133 |
| Monday | 37,139 | +12,216 |
| Tuesday | 43,156 | +12,833 |
| Wednesday | 36,296 | +11,682 |
| Thursday | 31,349 | +8,987 |
| Friday | 32,322 | +8,878 |
| Saturday | 27,346 | +7,287 |
Kernel level packet filtering top ten:
| Host/Mask | Packets | Bytes |
| 213.29.7.0/24 | 14878 | 892K |
| 64.166.14.222 | 14215 | 682K |
| 65.99.209.156 | 12430 | 682K |
| 213.4.149.12 | 9316 | 484K |
| 68.153.217.220 | 6508 | 312K |
| 71.89.4.212 | 4907 | 235K |
| 70.246.90.150 | 4413 | 212K |
| 66.15.119.165 | 4186 | 196K |
| 216.229.180.243 | 3695 | 177K |
| 66.42.167.154 | 3136 | 150K |
This is definitely down from last week, which is welcome, and for the first time in a while 213.4.149.12 (terra.es) is not at the top of the list.
- 64.166.14.222, 213.4.149.12, 68.153.217.220, and 66.15.119.165 all return from last week.
- 65.99.209.156 kept trying to send us spam that had already tripped our spamtraps.
- 71.89.4.212 is a charter.com DHCP machine of some sort.
- 70.246.90.150 kept trying with a bad HELO.
- 216.229.180.243 kept trying to send what looks like phish spam with MAIL FROMs that had already hit our spamtraps.
- 66.42.167.154 is in the SORBS DUL.
To my surprise, 208.99.198.64/27 totally disappeared; in contrast to their performance last week, this week we saw not so much as one packet from them. I would like to think that this is because they got disconnected, but I'm not that optimistic.
Connection time rejection stats:
71169 total
44825 dynamic IP
17384 bad or no reverse DNS
6398 class bl-cbl
1004 class bl-sbl
203 class bl-pbl
201 class bl-njabl
183 class bl-sdul
177 class bl-dsbl
81 cuttingedgemedia.com
Almost all of the SBL hits came from 69.42.169.0/24 (914 hits), listed as SBL50892 (spam source and landing pages, listed February 6th) and SBL50451 (colocentral.com spammer hosting, an escalation listing, also listed February 6th). They've shown up before, back in late January, when they were even more active.
(The next highest SBL listing only has 17 rejections; it is SBL49046, a free webmail place listed for (what else) advance fee fraud spamming. After that is SBL50375 (13 rejections, a Rokso-listed place), and SBL50928 (12 rejections, a hijacked server).)
Two out of the top 30 most rejected IP addresses were rejected 100
times or more this week; 64.166.14.222 (631 times) and 60.248.160.38
(109 times). Only 7 out of the top 30 most rejected IP addresses are
currently in the CBL, none are currently in bl.spamcop.net, and 12
are in the Spamhaus PBL. One
is currently in the SBL: 201.158.98.10 (50 rejections) is in SBL48034, a /21 listing of
'Suavemente LLC', listed February 5th.
This week's Hotmail score is:
- 1 message accepted, almost certainly a legitimate one.
- 3 messages rejected because they came from non-Hotmail email addresses, all from 'service_banc@msn.com'.
- 34 messages sent to our spamtraps.
- 1 message refused because its sender address had already hit our spamtraps.
- 1 message refused due to its origin IP address being from SAIX aka telkom.co.za.
And the final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 979 | 155 | 995 | 154 |
| Bad bounces | 9 | 8 | 12 | 8 |
I am amazed; apparently last week's low bad bounce count was not just a one-time anomaly. Bad bounces were sent to only 7 different usernames this week, and interestingly all seven of them are accounts that used to exist here. Three bounces went to a relatively current domain name, two bounces went to a somewhat out of date domain name, and four went to an outdated hostname that is a strong spam and spam bounce signature these days.
2007-02-10
Weekly spam summary on February 10th, 2007
This week, we:
- got 15,405 messages from 262 different IP addresses.
- handled 23,822 sessions from 1,467 different IP addresses.
- received 258,033 connections from at least 76,977 different IP addresses.
- hit a highwater of 7 connections being checked at once.
The overall volume is about the same as last week; technically it's up a bit, but I figure it's within the normal fluctuation levels by now.
| Day | Connections | different IPs |
| Sunday | 37,528 | +13,308 |
| Monday | 44,276 | +12,563 |
| Tuesday | 40,718 | +10,913 |
| Wednesday | 30,813 | +9,073 |
| Thursday | 38,067 | +11,262 |
| Friday | 36,639 | +10,185 |
| Saturday | 29,992 | +9,673 |
It's interesting that the connection count doesn't seem to be completely tied to the number of new IP addresses; the highs and lows don't match up, although there's a general correlation.
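One way to put a number on that impression is a plain Pearson correlation over the seven days in the table, plus a look at which days hold the extremes. This is only a sketch in Python using the figures above; seven data points are far too few to read much into the result.

```python
from math import sqrt

days = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]
connections = [37528, 44276, 40718, 30813, 38067, 36639, 29992]
new_ips = [13308, 12563, 10913, 9073, 11262, 10185, 9673]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(connections, new_ips)

# Which days hold the highs and lows in each column
busiest_conn = days[connections.index(max(connections))]
busiest_ips = days[new_ips.index(max(new_ips))]
quietest_conn = days[connections.index(min(connections))]
quietest_ips = days[new_ips.index(min(new_ips))]

print(round(r, 2), busiest_conn, busiest_ips, quietest_conn, quietest_ips)
```

The coefficient comes out positive but well short of 1, and the busiest and quietest days differ between the two columns (Monday versus Sunday for the highs, Saturday versus Wednesday for the lows).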
Kernel level packet filtering top ten:
| Host/Mask | Packets | Bytes |
| 208.99.198.64/27 | 44955 | 2696K |
| 213.29.7.0/24 | 29284 | 1756K |
| 213.4.149.12 | 18732 | 974K |
| 64.166.14.222 | 12807 | 615K |
| 193.70.192.0/24 | 8622 | 389K |
| 66.15.119.165 | 6667 | 312K |
| 68.149.160.108 | 6370 | 298K |
| 206.100.222.95 | 5001 | 240K |
| 68.153.217.220 | 4846 | 232K |
| 69.15.68.98 | 4681 | 219K |
Yow. Things are significantly up over last week, and we have a serious winner.
- 208.99.198.64/27 is totallyfreeld.net. They used to be SBL-listed,
but for some reason they got taken out, and apparently they wasted
no time in opening up the floodgates.
- 213.4.149.12 (terra.es), 64.166.14.222 (PacBell DSL), 66.15.119.165 (on the SORBS DUL), and 206.100.222.95 (bad HELOs) all return from last week.
- 68.149.160.108 tried too many bad HELOs.
- 68.153.217.220 is a Bellsouth ADSL IP that we consider dynamic.
- 69.15.68.98 also had too many bad HELOs and returns from early January.
It's been quite a while since we had so many returning IPs, but the real standout is clearly 208.99.198.64/27 by a mile, beating even centrum.cz's 213.29.7.0/24 (itself well up over last week). Given that they somehow got out of the SBL, I am now very glad that I put in our own kernel-level blocks (and I have now made sure that they are listed in pretty much every level of block that we have, just in case).
Connection time rejection stats:
73757 total
45224 dynamic IP
21356 bad or no reverse DNS
5533 class bl-cbl
221 class bl-sdul
211 class bl-dsbl
207 class bl-pbl
101 class bl-njabl
95 class bl-sbl
Things are distinctly up compared to last week, despite the not markedly higher overall connection count. As usual, everything except the CBL is relatively useless, although I suspect that the PBL and the SORBS DUL would jump significantly if we didn't already have our own blocks for those.
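To make the 'everything except the CBL' remark concrete, here is a small Python calculation of what share of this week's connection-time rejections each DNSBL class accounts for. It uses only the numbers listed above; the dictionary keys are just the class names from the stats.

```python
total = 73757  # 'total' line from the stats above

by_reason = {
    "dynamic IP": 45224,
    "bad or no reverse DNS": 21356,
    "bl-cbl": 5533,
    "bl-sdul": 221,
    "bl-dsbl": 211,
    "bl-pbl": 207,
    "bl-njabl": 101,
    "bl-sbl": 95,
}

def share(count: int) -> float:
    """Percentage of all connection-time rejections, to one decimal place."""
    return round(100 * count / total, 1)

cbl_share = share(by_reason["bl-cbl"])
other_dnsbl_share = share(
    sum(n for name, n in by_reason.items()
        if name.startswith("bl-") and name != "bl-cbl")
)

print(cbl_share, other_dnsbl_share)  # 7.5 1.1
```

So the CBL accounts for about 7.5% of rejections on its own, while the other five DNSBL classes together come to about 1.1%.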
The two leading SBL listings were SBL50738, an advance fee fraud spam listing from this month (12 rejections) and SBL50181, a compromised Brazilian web server abused by advance fee fraud spammers since November (10 rejections, and we've seen it before).
Three of the top 30 most rejected IP addresses were rejected 100 times
or more this week: 210.47.42.5 (259 times, bad DNS), 82.38.128.6 (143
times, dynamic IP), and 64.166.14.222 (127 times, 'dynamic' IP). 16 of
the top 30 are currently in the CBL and 18 are currently in
bl.spamcop.net.
This week Hotmail managed:
- no messages accepted.
- no messages rejected because they came from non-Hotmail email addresses.
- 48 messages sent to our spamtraps.
- 2 messages refused because their sender addresses had already hit our spamtraps.
- 6 messages refused due to their origin IP address (3 from the Cote d'Ivoire, two from Gilat Satcom, and one in SBL50431).
And the final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 995 | 154 | 982 | 113 |
| Bad bounces | 12 | 8 | 105 | 88 |
Apparently some sort of miracle happened this week and the spammers all stopped forging us. Alternately, my software is broken.
Bad bounces were sent to only 11 different bad usernames this week;
'E7D6' got two hits and everyone else got one. Bounces went to three
hex bad usernames (E7D6, E07, and 3E4B), four actual ex-users,
two things that could be valid usernames, and two random alphabetical
jumbles. Bounces came from machines in Germany and Russia, among other
places.
Colour me pleasantly happy and certainly hoping that this keeps up. But I'm not going to hold my breath.
A temptation with challenge/response anti-spam systems
Every time I see a mail from a C/R system, I get more and more tempted to teach our mail filtering infrastructure about the most common ones, so that it can automatically acknowledge the challenges, discard the messages, and not bother the users with them at all.
Will this acknowledge a lot of spam, and thus dump it on the people operating those C/R systems? Sure, but that's not our problem. And I'd clearly be doing our users a service, especially if C/R systems get widespread.
(This is another example of how C/R systems try to work by offloading your spam problem onto precisely the wrong people. The only way they can 'work' at all is if most of the mail addresses you challenge don't even exist; otherwise you are reaching either spammers or pissed off people, neither of whom has your interests in mind.)
As a special bonus prize, I could even hack our system to do this even
for local addresses that don't actually exist, since it's perfectly
possible to automatically acknowledge the challenge and 5xx the DATA
command at the end of the SMTP conversation. I'd have to make sure that
this only happened for single-recipient email, but that describes all
of the C/R email I'd want to do this to.
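The single-recipient condition matters because a 5xx at the end of DATA rejects the message for every recipient at once. A hypothetical sketch of the decision, in Python rather than in our actual mail system; the function name, the `is_known_challenge` flag, and the return values are all invented for illustration.

```python
from typing import Callable, List

def challenge_action(
    recipients: List[str],
    is_known_challenge: bool,
    user_exists: Callable[[str], bool],
) -> str:
    """Decide what to do once the whole message has been received.

    Returns one of:
      'normal'          - not a recognised single-recipient challenge;
                          whatever handling we would usually do applies
      'ack-and-discard' - acknowledge the challenge, accept the message,
                          and quietly drop it instead of delivering it
      'ack-and-reject'  - acknowledge the challenge, then 5xx the DATA
    """
    if not is_known_challenge or len(recipients) != 1:
        return "normal"
    if user_exists(recipients[0]):
        return "ack-and-discard"
    return "ack-and-reject"
```

For example, `challenge_action(["nosuchuser"], True, lambda u: False)` gives 'ack-and-reject', while the same challenge sent to two recipients falls through to 'normal'.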
(Ob-attribution-darnit: I've had this thought for a while, but the impetus to actually write this entry was provided by reading about a related temptation with C/R systems here.)
2007-02-03
Weekly spam summary on February 3rd, 2007
This week, we:
- got 15,790 messages from 280 different IP addresses.
- handled 23,657 sessions from 1,340 different IP addresses.
- received 248,408 connections from at least 73,118 different IP addresses.
- hit a highwater of 17 connections being checked at once.
Volume is up again from last week, although the number of different IPs is down slightly.
| Day | Connections | different IPs |
| Sunday | 28,871 | +11,587 |
| Monday | 30,772 | +10,424 |
| Tuesday | 39,487 | +10,941 |
| Wednesday | 38,430 | +10,523 |
| Thursday | 36,188 | +9,602 |
| Friday | 37,864 | +10,746 |
| Saturday | 36,796 | +9,295 |
This is somewhat more even than last week, but that's about all I can say for it.
Kernel level packet filtering top ten:
| Host/Mask | Packets | Bytes |
| 193.70.192.0/24 | 18193 | 820K |
| 213.4.149.12 | 17817 | 926K |
| 213.29.7.0/24 | 17387 | 1043K |
| 193.95.28.40 | 14077 | 653K |
| 64.166.14.222 | 10431 | 501K |
| 203.143.22.50 | 7058 | 423K |
| 24.39.78.164 | 6715 | 322K |
| 206.100.222.95 | 6082 | 292K |
| 66.15.116.230 | 5391 | 259K |
| 66.15.119.165 | 4741 | 222K |
Things are definitely up compared to last week.
- 213.4.149.12 and 66.15.119.165 return from last week.
- 193.95.28.40 kept attempting to send us stuff that had already tripped spamtraps.
- 64.166.14.222 returns from early January, still blocked for being a PacBell DSL line.
- 203.143.22.50 is a Sri Lankan IP address with no reverse DNS.
- 24.39.78.164 and 206.100.222.95 both tried too often with bad HELOs.
- 66.15.116.230 is on the NJABL.
Connection time rejection stats:
64250 total
39581 dynamic IP
17883 bad or no reverse DNS
5133 class bl-cbl
333 class bl-dsbl
166 class bl-njabl
139 class bl-pbl
123 class bl-sbl
116 class bl-sdul
21 verticalresponse.com
13 cuttingedgemedia.com
Four of the top 30 most rejected IPs were rejected 100 times or
more this week: 81.51.108.120 (349 times), 64.166.14.222 (199 times),
68.91.134.69 (118 times), and 211.180.132.9 (100 times). The first three
were rejected as dynamic IPs, the fourth for having bad reverse DNS. Ten
of the top 30 are currently in the CBL and a whopping 21 are currently
listed in bl.spamcop.net.
This week's Hotmail scores are:
- 5 messages accepted.
- 1 message rejected because it came from a non-Hotmail email address.
- 36 messages sent to our spamtraps.
- 2 messages refused because their sender addresses had already hit our spamtraps.
- 8 messages refused due to their origin IP address (3 in the SBL, 2 from the Cote d'Ivoire, 1 in the CBL, 1 from Nigeria, and one from SAIX).
Somehow, I don't think we're losing anything by not accepting an email message this week from one 'netaleloto_awrd_006@hotmail.it'.
The SBL listings are SBL50384, from January 2007, SBL46422, from September 2006, and SBL32972, from November 2005, when it was spamming through Hotmail. I have no words.
And the final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 982 | 113 | 1171 | 134 |
| Bad bounces | 105 | 88 | 229 | 130 |
Germany and Russia seem to be the leading sources of bad bounces this week, with the usual contributions from various other places. Unlike last week, there's no particularly big single source; like last week, the most common bad usernames continue to be alphabetical jumbles, with a certain amount of more plausible ones mixed in. Bad bounces were sent to 96 different bad usernames this week.