2007-02-25
How CSLab currently does email anti-spam stuff
The Computer Science department is strongly against rejecting email just because it might be spam (at least by default); enough people would rather sort through spam than risk rejecting legitimate email. People are willing to have known viruses removed from their email (although not executables in general).
(For clarity: the weekly spam summaries I do are not for CSLab's mail system.)
I once summarized CSLab's general rule as 'thou shalt not reject email just because it smells bad'. We can reject email that has narrow technical failings, such as nonexistent origin address domains, and we can do things that don't cause any problems for legitimate mailers but get spammers to give up. We can't reject on anything that isn't a clear technical failing, and we can't do anything that causes problems for legitimate mailers.
All external email goes through a frontend machine running Exim 4. This machine does the following spam-related things:
- it waits a few seconds before spitting out the initial greeting banner and the response to EHLO/HELO; this is an attempt to persuade spam clients that they are being tarpitted so that they give up. Connections from IP addresses listed in zen.spamhaus.org are delayed longer.

  (This is not as good as the real OpenBSD spamd, which trickles out replies one character at a time; Exim just sits on the whole line for N seconds and then blasts it out. I got the general idea from Bob Beck's spamd presentation.)
- the MAIL FROM domain has to exist (if it's one of our domains, the full address has to be valid).
- the RCPT TO address has to be to us and valid. The frontend machine has a list of valid local usernames (including aliases and mailing lists and so on), so it can immediately reject email to nonexistent local users.
- at RCPT TO time, addresses that have opted into it immediately reject email from senders in zen.spamhaus.org, and greylist most everyone else (using greylistd, which is a general daemon for doing this). At the moment we have no convenient way for users to opt into this, so it is mostly protecting system aliases.
- if the sender is in zen.spamhaus.org, we add a message header about it.
- the message is run through Sophos PureMessage, which removes known viruses and, if the message has a high enough spam score, adds a note about it to the start of the Subject: header.
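As a rough illustration of the first step, here is a minimal Python sketch of how a DNS blocklist such as zen.spamhaus.org is queried and how a longer greeting delay could be chosen for listed addresses. This is not our Exim configuration; the function names and the delay values are made up for the example, since the actual delays are not given here.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name that is looked up to test an IPv4 address.

    DNS blocklists are queried by reversing the octets of the address
    and appending the blocklist zone.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Hypothetical delays in seconds; the real values are an assumption.
BASE_DELAY = 5
LISTED_DELAY = 20

def greeting_delay(listed_in_zen: bool) -> int:
    """Pick how long to sit on the banner and the EHLO/HELO reply."""
    return LISTED_DELAY if listed_in_zen else BASE_DELAY

# 192.0.2.99 is a documentation-only address used as a stand-in.
print(dnsbl_query_name("192.0.2.99"))
```

Resolving the resulting name is what tells you whether the address is listed: an A record in 127.0.0.0/8 means it is, and a 'no such name' answer means it is not.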
After all this the email message is delivered to our central email
machine for actual processing and delivery and so on. We don't do
anything special with messages tagged as spam; each person gets to
decide for themselves how they want to handle such emails, whether
that is to filter them on the server with procmail or leave it
up to their IMAP client's filtering or do nothing at all.
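To show why Subject: tagging is easy for almost anything to filter on, here is a small Python sketch of the kind of check a delivery filter or mail client could make. The tag text "[SPAM]" and the folder names are hypothetical; the actual note that PureMessage adds is not quoted here.

```python
from email import message_from_string

# Hypothetical tag; the real text added to the Subject: header may differ.
SPAM_TAG = "[SPAM]"

def is_tagged_spam(raw_message: str, tag: str = SPAM_TAG) -> bool:
    """Return True if the Subject: header starts with the spam tag."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    return subject.lstrip().startswith(tag)

def choose_folder(raw_message: str) -> str:
    """Pick a delivery folder; each person could just as well do nothing."""
    return "spam" if is_tagged_spam(raw_message) else "inbox"

tagged = "From: a@example.org\nSubject: [SPAM] cheap pills\n\nbody\n"
clean = "From: b@example.org\nSubject: meeting notes\n\nbody\n"
print(choose_folder(tagged), choose_folder(clean))
```

The same one-line test is what a procmail recipe or an IMAP client rule would amount to, which is the appeal of tagging in a header that everything already understands.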
For an organization that doesn't want to reject email outright, I think that this sort of tagging is a big win; it makes things visible and it makes it easy for all sorts of clients to filter things. You need a reliable spam filter that doesn't need training, though.
We use Sophos PureMessage because the university has a site-wide license for it, so it doesn't cost us anything, and the central campus email system uses it and likes it. In my experience it does a good but not perfect job at recognizing spam, and I've only gotten a few reports of false positives. (And Sophos maintains the spam and virus filtering rules instead of us.)
Things we don't do (that sometimes surprise people):
- reject HELOs that claim to be from us. This is merely a bad smell, not a narrow technical defect.
- general greylisting, because there are legitimate mailers that are known to have problems with it.
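For readers who haven't run into greylisting, the following Python sketch shows the classic triplet approach and one way it can trouble legitimate mailers. It is an illustration only, not how greylistd is implemented or configured here; the class name and the ten-minute delay are assumptions.

```python
# Hypothetical retry delay in seconds; real deployments vary.
GREYLIST_DELAY = 600

class Greylist:
    """Defer the first delivery attempt for each (IP, sender, recipient)."""

    def __init__(self, delay: int = GREYLIST_DELAY):
        self.delay = delay
        self.first_seen = {}  # triplet -> time of first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        key = (client_ip, sender, recipient)
        # Remember when this triplet was first seen (or fetch that time).
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "defer"  # would be a 4xx temporary failure in SMTP

gl = Greylist()
# A well-behaved mailer retries from the same IP and eventually gets in.
print(gl.check("192.0.2.1", "a@example.org", "b@example.com", now=0))
print(gl.check("192.0.2.1", "a@example.org", "b@example.com", now=900))
# A mailer that retries from a different pool address starts over.
print(gl.check("192.0.2.2", "a@example.org", "b@example.com", now=900))
```

The last case is the sort of thing that makes general greylisting risky: a large sender that retries from a different outgoing machine each time looks like a brand new triplet on every attempt.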
Exim does reject some badly formed HELOs by default, and we have left
that on; I consider that to be a narrow technical defect issue. We also
reject email to IP address domain literals, which I believe is another
Exim default.
We are not currently doing nolisting, but we may in the
future; there are defensible technical reasons for having a lower
preference MX pointing to our internal central email machine, and
its SMTP port isn't reachable from the outside world any more.
Weekly spam summary on February 24th, 2007
This week, we:
- got 15,188 messages from 253 different IP addresses.
- handled 21,573 sessions from 1,281 different IP addresses.
- received 238,853 connections from at least 71,848 different IP addresses.
- hit a highwater of 10 connections being checked at once.
Connection and session volume is down a bit from last week. Day to day volume fluctuated up and down through the week:
| Day | Connections | different IPs |
| Sunday | 29,706 | +11,012 |
| Monday | 40,386 | +12,084 |
| Tuesday | 41,718 | +12,719 |
| Wednesday | 34,748 | +10,352 |
| Thursday | 36,413 | +9,568 |
| Friday | 32,318 | +9,189 |
| Saturday | 23,564 | +6,924 |
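As a quick cross-check, the daily figures in the table add up to the weekly totals quoted earlier. The Python sketch below just redoes that arithmetic; reading the '+' column as addresses not seen earlier in the week is an interpretation of the table, not something it states outright.

```python
# (connections, newly seen IP addresses) per day, copied from the table.
daily = {
    "Sunday": (29706, 11012),
    "Monday": (40386, 12084),
    "Tuesday": (41718, 12719),
    "Wednesday": (34748, 10352),
    "Thursday": (36413, 9568),
    "Friday": (32318, 9189),
    "Saturday": (23564, 6924),
}

total_connections = sum(conns for conns, _ in daily.values())
total_new_ips = sum(new_ips for _, new_ips in daily.values())

# Busiest day by connection count.
busiest_day = max(daily, key=lambda day: daily[day][0])

print(total_connections, total_new_ips, busiest_day)
```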
Kernel level packet filtering top ten:
Host/Mask           Packets   Bytes
205.152.59.0/24       27609   1252K
207.145.125.204       25029   1272K
206.223.168.238       15375    843K
213.29.7.0/24          8533    512K
211.136.0.0/14         7240    386K
67.95.56.42            6865    319K
203.89.173.58          6836    301K
204.202.15.102         6800    336K
81.201.105.157         5045    242K
204.202.23.184         4987    246K
This is up substantially from last week. The big news this week is that I blocked 205.152.59.0/24 very early on in the week; this is Bellsouth's outgoing mail servers. We no longer accept email from Bellsouth because they have gotten into the free webmail business, and as a result are now active participants in the advance fee fraud spam business. (Many US ISPs have apparently gone this direction, for reasons I don't understand.)
- 207.145.125.204, 67.95.56.42, 204.202.15.102, and 204.202.23.184 all kept trying to send email with an origin address that had already tripped our spamtraps, mostly for what looks like phish spam (certain sorts of origin addresses are dead giveaways).
- 206.223.168.238 is in the CBL.
- 203.89.173.58 kept trying with a bad HELO.
- 81.201.105.157 is in the NJABL.
All that makes this a highly atypical week; for example, we don't have a single top-10 IP address that we've seen before. On the good news front, 208.99.198.64/27 continued not sending us so much as a single connection attempt over the week, and has thus dropped off my radar for future reports.
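The Host/Mask entries in the packet filtering table mix single addresses with CIDR blocks. As a sketch of what matching against such a list means, here is a Python version using the standard ipaddress module. It is only an illustration: the real filtering is done in the kernel, and the helper name is made up.

```python
import ipaddress

# A few Host/Mask entries from the table above; a bare address
# is treated as a /32 network by ip_network().
BLOCKED = [
    "205.152.59.0/24",
    "207.145.125.204",
    "213.29.7.0/24",
    "211.136.0.0/14",
]
NETWORKS = [ipaddress.ip_network(entry) for entry in BLOCKED]

def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked entry."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)

# A /14 is much larger than a /24, which is worth remembering
# when reading per-entry packet counts.
for net in NETWORKS:
    print(net, net.num_addresses)
```

For instance, 211.136.0.0/14 runs from 211.136.0.0 through 211.139.255.255, so `is_blocked("211.139.1.1")` is true while `is_blocked("211.140.0.1")` is not.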
Connection time rejection stats:
69674 total
43536 dynamic IP
17981 bad or no reverse DNS
6394 class bl-cbl
295 class bl-njabl
250 class bl-sdul
220 class bl-pbl
159 acceleratebiz.com
147 class bl-sbl
144 class bl-dsbl
33 inetekk.com
15 cuttingedgemedia.com
Overall volume is about the same as last week. The SBL breakdown is slightly interesting:
| 59 | SBL51080 | phish spam source |
| 17 | SBL49074 | hijacked server that's spamming (13 Dec 2006) |
| 11 | SBL49046 | advance fee fraud spam source (13 Dec 2006) |
| 10 | SBL50375 | a /25 ROKSO listing for Eric Reinertsen (29 Jan 2007) |
| 10 | SBL49248 | saigonnet.vn webmail, listed as an advance fee fraud spam source (18 Dec 2006) |
Of these, SBL49046 and SBL50375 appeared in my summary last week, at about the same volume.
Three of the top 30 most rejected IP addresses were rejected 100
times or more this week: 193.4.194.142 (216 times, bad reverse DNS),
64.166.14.222 (168 times, dynamic IP), and 81.201.105.157 (153
times, on the NJABL). Eight of the top 30 are currently in the
CBL, eight are currently in bl.spamcop.net, 10 are in the PBL, a grand total of 17 are in the combined
zen.spamhaus.org zone, and one is in
the SBL: 69.15.58.106, SBL51080.
This week Hotmail managed:
- 4 messages accepted, two of them probably legitimate.
- no messages rejected because they came from non-Hotmail email addresses.
- 57 messages sent to our spamtraps.
- 10 messages refused because their sender addresses had already hit our spamtraps.
- 5 messages refused due to their origin IP address (3 from the Cote d'Ivoire, one from Nigeria, and one in the CBL).
And the final numbers:
| what | # this week | (distinct IPs) | # last week | (distinct IPs) |
| Bad HELOs | 877 | 101 | 979 | 155 |
| Bad bounces | 16 | 12 | 9 | 8 |
The winner of the bad HELO contest this week was 72.165.125.122,
with 125 rejections until it got blocked; the next highest source
only managed 61. It's sad to see the bad bounce numbers start rising
again, but they're still low, and this week they seem to have come
from all over, including a darpa.mil machine and something in the
Arab Emirates that has been forging its HELO name and so won't be
talking to us any more.
Bad bounces were sent to 13 different usernames this week, mostly to
real ex-users and plausible usernames. There was one alphabetical
jumble, and E07 and 3E4B also put in appearances. The most popular
bad bounce targets (admittedly at 3 and 2 hits respectively) were both
ex-users.