Wandering Thoughts archives

2007-02-25

How CSLab currently does email anti-spam stuff

The Computer Science department is strongly against rejecting email just because it might be spam (at least by default); enough people would rather sort through spam than risk rejecting legitimate email. People are willing to have known viruses removed from their email (although not executables in general).

(For clarity: the weekly spam summaries I do are not for CSLab's mail system.)

I once summarized CSLab's general rule as 'thou shalt not reject email just because it smells bad'. We can reject email that has narrow technical failings such as nonexistent origin address domains, and do things that don't cause any problems with legitimate mailers but get spammers to give up. We can't reject on stuff that isn't a clear technical failing, and we can't do anything that causes problems for legitimate mailers.

All external email goes through a frontend machine running Exim 4. This machine does the following spam-related things:

  1. it waits a few seconds before spitting out the initial greeting banner and the response to EHLO/HELO; this is an attempt to persuade spam clients that they are being tarpitted so that they give up. Connections from IP addresses listed in zen.spamhaus.org are delayed longer.

    (This is not as good as the real OpenBSD spamd, which trickles out replies one character at a time; Exim just sits on the whole line for N seconds and then blasts it out. I got the general idea from Bob Beck's spamd presentation.)

  2. the MAIL FROM domain has to exist (if it's one of our domains, the full address has to be valid).
  3. the RCPT TO address has to be to us and valid. The frontend machine has a list of valid local usernames (including aliases and mailing lists and so on), so it can immediately reject email to nonexistent local users.
  4. at RCPT TO time, addresses that have opted into it immediately reject email from senders in zen.spamhaus.org, and greylist most everyone else (using greylistd, which is a general daemon for doing this). At the moment we have no convenient way for users to opt into this, so it is mostly protecting system aliases.

  5. if the sender is in zen.spamhaus.org, we add a message header about it.
  6. the message is run through Sophos PureMessage, which removes known viruses and, if the message has a high enough spam score, adds a note about it to the start of the Subject: header.
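
The zen.spamhaus.org checks in steps 1, 4, and 5 all rest on the same kind of DNS lookup. As a rough sketch of the idea (this is Python for illustration, not our Exim configuration): a DNS blocklist is queried by reversing the octets of the IPv4 address and putting them in front of the zone name, and if that name resolves, the address is listed. The function names and the delay values below are assumptions; the real delays are only 'a few seconds' and 'longer'.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "zen.spamhaus.org",
              resolve=socket.gethostbyname) -> bool:
    """Return True if the DNSBL zone has an A record for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # NXDOMAIN (or any other resolution failure) is treated as "not listed".
        return False
    return True


def greeting_delay(listed: bool, base: float = 5.0, extra: float = 15.0) -> float:
    """Seconds to sit on the banner; both numbers are made up for this sketch."""
    return base + extra if listed else base
```

A real check would also want to look at which 127.0.0.x address comes back, since that is how a combined zone says which underlying list the address is on.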

After all this the email message is delivered to our central email machine for actual processing and delivery and so on. We don't do anything special with messages tagged as spam; each person gets to decide for themselves how they want to handle such emails, whether that is to filter them on the server with procmail or leave it up to their IMAP client's filtering or do nothing at all.

For an organization that doesn't want to reject email outright, I think that this sort of tagging is a big win; it makes things visible and it makes it easy for all sorts of clients to filter things. You need a reliable spam filter that doesn't need training, though.
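
To make 'easy for all sorts of clients to filter' concrete, here is a minimal sketch of Subject: tagging and the matching client-side test, using Python's standard email package. The tag text, the threshold, and the function names are all assumptions for illustration; this is not what PureMessage actually does internally.

```python
from email.message import EmailMessage

SPAM_TAG = "[SPAM]"   # hypothetical marker; the real tag text will differ


def tag_subject(msg: EmailMessage, score: float, threshold: float = 50.0) -> None:
    """Prefix the Subject: header with SPAM_TAG if the score is high enough."""
    if score < threshold:
        return
    subject = str(msg.get("Subject", ""))
    if subject.startswith(SPAM_TAG):
        return                      # already tagged; don't tag twice
    del msg["Subject"]              # no error if the header is absent
    msg["Subject"] = f"{SPAM_TAG} {subject}".strip()


def looks_tagged(msg: EmailMessage) -> bool:
    """The one-line test a server-side or IMAP client filter rule would make."""
    return str(msg.get("Subject", "")).startswith(SPAM_TAG)
```

The point of the design is that the filtering side needs nothing but a prefix match on one header, which even very simple mail clients can do.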

We use Sophos PureMessage because the university has a site-wide license for it, so it doesn't cost us anything, and the central campus email system uses it and likes it. In my experience it does a good but not perfect job at recognizing spam, and I've only gotten a few reports of false positives. (And Sophos maintains the spam and virus filtering rules instead of us.)

Things we don't do (that sometimes surprise people):

  • reject HELOs that claim to be from us. This is merely a bad smell, not a narrow technical defect.
  • general greylisting, because there are legitimate mailers that are known to have problems with it.
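
For anyone who hasn't run into greylisting: the basic idea is to temporarily refuse the first delivery attempt from a not yet seen (client IP, sender, recipient) triplet and then accept a retry that comes after some minimum delay. The following is a toy sketch of that idea only; it is not how greylistd is implemented, and the class name and the 300 second window are assumptions.

```python
import time
from typing import Optional


class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay   # hypothetical retry window, in seconds
        self.first_seen = {}         # triplet -> time of the first attempt

    def check(self, ip: str, sender: str, rcpt: str,
              now: Optional[float] = None) -> str:
        """Return 'defer' (answer with a 4xx) or 'accept' for this attempt."""
        now = time.time() if now is None else now
        triplet = (ip, sender.lower(), rcpt.lower())
        # Record the first attempt; later attempts keep the original timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return "defer"
        return "accept"
```

Because the key includes the client IP, a sender that retries from a different machine each time looks like a new triplet on every attempt, which is one way a legitimate mailer can have trouble with this.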

Exim does reject some badly formed HELOs by default, and we have left that on; I consider that to be a narrow technical defect issue. We also reject email to IP address domain literals, which I believe is another Exim default. We are not currently doing nolisting, but we may in the future; there are defensible technical reasons for having a lower preference MX pointing to our internal central email machine, and its SMTP port isn't reachable from the outside world any more.

CSLabSpamFiltering written at 16:28:51

Weekly spam summary on February 24th, 2007

This week, we:

  • got 15,188 messages from 253 different IP addresses.
  • handled 21,573 sessions from 1,281 different IP addresses.
  • received 238,853 connections from at least 71,848 different IP addresses.
  • hit a highwater of 10 connections being checked at once.

Connection and session volume is down a bit from last week. Day to day volume fluctuated up and down through the week:

Day         Connections   different IPs
Sunday           29,706         +11,012
Monday           40,386         +12,084
Tuesday          41,718         +12,719
Wednesday        34,748         +10,352
Thursday         36,413          +9,568
Friday           32,318          +9,189
Saturday         23,564          +6,924

Kernel level packet filtering top ten:

Host/Mask           Packets   Bytes
205.152.59.0/24       27609   1252K
207.145.125.204       25029   1272K
206.223.168.238       15375    843K
213.29.7.0/24          8533    512K
211.136.0.0/14         7240    386K
67.95.56.42            6865    319K
203.89.173.58          6836    301K
204.202.15.102         6800    336K
81.201.105.157         5045    242K
204.202.23.184         4987    246K

This is up substantially from last week. The big news this week is that I blocked 205.152.59.0/24 very early on in the week; this is Bellsouth's outgoing mail servers. We no longer accept email from Bellsouth because they have gotten into the free webmail business, and as a result are now active participants in the advance fee fraud spam business. (Many US ISPs have apparently gone this direction, for reasons I don't understand.)

  • 207.145.125.204, 67.95.56.42, 204.202.15.102, and 204.202.23.184 all kept trying to send email with an origin address that had already tripped our spamtraps, mostly for what looks like phish spam (certain sorts of origin addresses are dead giveaways).
  • 206.223.168.238 is in the CBL.
  • 203.89.173.58 kept trying with a bad HELO.
  • 81.201.105.157 is in the NJABL.

All that makes this a highly atypical week; for example, we don't have a single top-10 IP address that we've seen before. On the good news front, 208.99.198.64/27 continued not sending us so much as a single connection attempt over the week, and has thus dropped off my radar for future reports.

Connection time rejection stats:

  69674 total
  43536 dynamic IP
  17981 bad or no reverse DNS
   6394 class bl-cbl
    295 class bl-njabl
    250 class bl-sdul
    220 class bl-pbl
    159 acceleratebiz.com
    147 class bl-sbl
    144 class bl-dsbl
     33 inetekk.com
     15 cuttingedgemedia.com

Overall volume is about the same as last week. The SBL breakdown is slightly interesting:

59 SBL51080 phish spam source
17 SBL49074 hijacked server that's spamming (13 Dec 2006)
11 SBL49046 advance fee fraud spam source (13 Dec 2006)
10 SBL50375 a /25 ROKSO listing for Eric Reinertsen (29 Jan 2007)
10 SBL49248 saigonnet.vn webmail, listed as an advance fee fraud spam source (18 Dec 2006)

Of these, SBL49046 and SBL50375 appeared in my summary last week, at about the same volume.

Three of the top 30 most rejected IP addresses were rejected 100 times or more this week: 193.4.194.142 (216 times, bad reverse DNS), 64.166.14.222 (168 times, dynamic IP), and 81.201.105.157 (153 times, on the NJABL). Eight of the top 30 are currently in the CBL, eight are currently in bl.spamcop.net, 10 are in the PBL, a grand total of 17 are in the combined zen.spamhaus.org zone, and one is in the SBL: 69.15.58.106, SBL51080.

This week Hotmail managed:

  • 4 messages accepted, two of them probably legitimate.
  • no messages rejected because they came from non-Hotmail email addresses.
  • 57 messages sent to our spamtraps.
  • 10 messages refused because their sender addresses had already hit our spamtraps.
  • 5 messages refused due to their origin IP address (3 from the Cote d'Ivoire, one from Nigeria, and one in the CBL).

And the final numbers:

what        # this week   (distinct IPs)   # last week   (distinct IPs)
Bad HELOs           877              101           979              155
Bad bounces          16               12             9                8

The winner of the bad HELO contest this week was 72.165.125.122, with 125 rejections until it got blocked; the next highest source only managed 61. It's sad to see the bad bounce numbers start rising again, but they're still low, and this week they seem to have come from all over, including a darpa.mil machine and something in the Arab Emirates that has been forging its HELO name and so won't be talking to us any more.

Bad bounces were sent to 13 different usernames this week, mostly to real ex-users and plausible usernames. There was one alphabetical jumble, and E07 and 3E4B also put in appearances. The most popular bad bounce targets (admittedly at 3 and 2 hits respectively) were both ex-users.

SpamSummary-2007-02-24 written at 01:11:28

2007-02-24

Thesis: any server push technology inevitably breeds spam

Consider various server push technologies, where things come to you instead of you having to seek them out: email, instant messaging, voice over IP phone service, and even text messaging on cell phones. All of them have spam problems (generally growing).

This is not a coincidence. Any server push technology will get overrun by spammers, because server push inherently gives them access to people and is thus very, very attractive. As a consumer of server push technology, your only recourse from the onslaught is to hide, to block, to filter; you can't actually get away.

(The push technology provider can't keep all the spammers out, if only because sooner or later some of them are in its own marketing department.)

Client pull technology is much more resilient. The spammers have to be attractive to get you to visit even once, then genuinely interesting to keep you around, and you can easily get away. Thus it is a feature, not a problem, that things like syndication feeds do not have a server push option.

(And indeed much of the spammer activity in client pull technology like the web is about being attractive, for example getting a high Google search rank for some valuable keywords.)

PushBreedsSpam written at 21:51:04

2007-02-18

Weekly spam summary on February 17th, 2007

This week, we:

  • got 15,925 messages from 244 different IP addresses.
  • handled 23,465 sessions from 1,341 different IP addresses.
  • received 244,268 connections from at least 75,016 different IP addresses.
  • hit a highwater of 16 connections being checked at once.

This is about the same as last week. The per day figures show some significant fluctuations:

Day         Connections   different IPs
Sunday           36,660         +13,133
Monday           37,139         +12,216
Tuesday          43,156         +12,833
Wednesday        36,296         +11,682
Thursday         31,349          +8,987
Friday           32,322          +8,878
Saturday         27,346          +7,287

Kernel level packet filtering top ten:

Host/Mask           Packets   Bytes
213.29.7.0/24         14878    892K
64.166.14.222         14215    682K
65.99.209.156         12430    682K
213.4.149.12           9316    484K
68.153.217.220         6508    312K
71.89.4.212            4907    235K
70.246.90.150          4413    212K
66.15.119.165          4186    196K
216.229.180.243        3695    177K
66.42.167.154          3136    150K

This is definitely down from last week, which is welcome, and for the first time in a while 213.4.149.12 (terra.es) is not at the top of the list.

  • 64.166.14.222, 213.4.149.12, 68.153.217.220, and 66.15.119.165 all return from last week.
  • 65.99.209.156 kept trying to send us spam that had already tripped our spamtraps.
  • 71.89.4.212 is a charter.com DHCP machine of some sort.
  • 70.246.90.150 kept trying with a bad HELO.
  • 216.229.180.243 kept trying to send what looks like phish spam with MAIL FROMs that had already hit our spamtraps.
  • 66.42.167.154 is in the SORBS DUL.

To my surprise, 208.99.198.64/27 totally disappeared; in contrast to their performance last week, this week we saw not so much as one packet from them. I would like to think that this is because they got disconnected, but I'm not that optimistic.

Connection time rejection stats:

  71169 total
  44825 dynamic IP
  17384 bad or no reverse DNS
   6398 class bl-cbl
   1004 class bl-sbl
    203 class bl-pbl
    201 class bl-njabl
    183 class bl-sdul
    177 class bl-dsbl
     81 cuttingedgemedia.com

Almost all of the SBL hits came from 69.42.169.0/24 (914 hits), listed as SBL50892 (spam source and landing pages, listed February 6th) and SBL50451 (colocentral.com spammer hosting, an escalation listing, also listed February 6th). They've shown up before, back in late January, when they were even more active.

(The next highest SBL listing only has 17 rejections; it is SBL49046, a free webmail place listed for (what else) advance fee fraud spamming. After that is SBL50375 (13 rejections, a Rokso-listed place), and SBL50928 (12 rejections, a hijacked server).)

Two out of the top 30 most rejected IP addresses were rejected 100 times or more this week: 64.166.14.222 (631 times) and 60.248.160.38 (109 times). Only 7 out of the top 30 most rejected IP addresses are currently in the CBL, none are currently in bl.spamcop.net, and 12 are in the Spamhaus PBL. One is currently in the SBL: 201.158.98.10 (50 rejections) is in SBL48034, a /21 listing of 'Suavemente LLC', listed February 5th.

This week's Hotmail score is:

  • 1 message accepted, almost certainly a legitimate one.
  • 3 messages rejected because they came from non-Hotmail email addresses, all from 'service_banc@msn.com'.
  • 34 messages sent to our spamtraps.
  • 1 message refused because its sender address had already hit our spamtraps.
  • 1 message refused due to its origin IP address being from SAIX aka telkom.co.za.

And the final numbers:

what        # this week   (distinct IPs)   # last week   (distinct IPs)
Bad HELOs           979              155           995              154
Bad bounces           9                8            12                8

I am amazed; apparently last week's low bad bounce count was not just a one-time anomaly. Bad bounces were sent to only 7 different usernames this week, and interestingly all seven of them are accounts that used to exist here. Three bounces went to a relatively current domain name, two bounces went to a somewhat out of date domain name, and four went to an outdated hostname that is a strong spam and spam bounce signature these days.

SpamSummary-2007-02-17 written at 01:38:56

2007-02-10

Weekly spam summary on February 10th, 2007

This week, we:

  • got 15,405 messages from 262 different IP addresses.
  • handled 23,822 sessions from 1,467 different IP addresses.
  • received 258,033 connections from at least 76,977 different IP addresses.
  • hit a highwater of 7 connections being checked at once.

The overall volume is about the same as last week; technically it's up a bit, but I figure it's within the normal fluctuation levels by now.

Day         Connections   different IPs
Sunday           37,528         +13,308
Monday           44,276         +12,563
Tuesday          40,718         +10,913
Wednesday        30,813          +9,073
Thursday         38,067         +11,262
Friday           36,639         +10,185
Saturday         29,992          +9,673

It's interesting that the connection count doesn't seem to be completely tied to the number of new IP addresses; the highs and lows don't match up, although there's a general correlation.
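
One way to put a number on 'a general correlation' is the Pearson correlation coefficient of the two columns in the table above. This is just a sketch of that arithmetic, with the figures copied from the table; the helper name is mine, and seven data points are far too few to read much into the result.

```python
from math import sqrt

# Per-day figures for this week, Sunday through Saturday, from the table above.
connections = [37528, 44276, 40718, 30813, 38067, 36639, 29992]
new_ips = [13308, 12563, 10913, 9073, 11262, 10185, 9673]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(f"r = {pearson(connections, new_ips):.2f}")
```

A value close to 1 would mean the two columns move in lockstep, and a value near 0 would mean they are unrelated; as a sanity check, the two lists sum to the weekly totals given at the start of this entry.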

Kernel level packet filtering top ten:

Host/Mask           Packets   Bytes
208.99.198.64/27      44955   2696K
213.29.7.0/24         29284   1756K
213.4.149.12          18732    974K
64.166.14.222         12807    615K
193.70.192.0/24        8622    389K
66.15.119.165          6667    312K
68.149.160.108         6370    298K
206.100.222.95         5001    240K
68.153.217.220         4846    232K
69.15.68.98            4681    219K

Yow. Things are significantly up over last week, and we have a serious winner.

  • 208.99.198.64/27 is totallyfreeld.net. They used to be SBL-listed, but for some reason they got taken out, and apparently they wasted no time in opening up the floodgates.

  • 213.4.149.12 (terra.es), 64.166.14.222 (PacBell DSL), 66.15.119.165 (on the SORBS DUL), and 206.100.222.95 (bad HELOs) all return from last week.
  • 68.149.160.108 tried too many bad HELOs.
  • 68.153.217.220 is a Bellsouth ADSL IP that we consider dynamic.
  • 69.15.68.98 also had too many bad HELOs and returns from early January.

It's been quite a while since we had so many returning IPs, but the real standout is clearly 208.99.198.64/27 by a mile, beating even centrum.cz's 213.29.7.0/24 (itself well up over last week). Given that they somehow got out of the SBL, I am now very glad that I put in our own kernel-level blocks (and I have now made sure that they are listed in pretty much every level of block that we have, just in case).

Connection time rejection stats:

  73757 total
  45224 dynamic IP
  21356 bad or no reverse DNS
   5533 class bl-cbl
    221 class bl-sdul
    211 class bl-dsbl
    207 class bl-pbl
    101 class bl-njabl
     95 class bl-sbl

Things are distinctly up compared to last week, despite the not markedly higher overall connection count. As usual, everything except the CBL is relatively useless, although I suspect that the PBL and the SORBS DUL would jump significantly if we didn't already have our own blocks for those.
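
To put rough numbers on 'relatively useless', here is the share arithmetic for this week's connection time rejections as a small script. The counts are copied from the table above; the helper name and the grouping of everything except the CBL into 'other DNSBLs' are my choices for this sketch.

```python
# Connection time rejection counts for this week, copied from the table above.
total = 73757
rejections = {
    "dynamic IP": 45224,
    "bad or no reverse DNS": 21356,
    "bl-cbl": 5533,
    "bl-sdul": 221,
    "bl-dsbl": 211,
    "bl-pbl": 207,
    "bl-njabl": 101,
    "bl-sbl": 95,
}


def share(count: int, total: int) -> float:
    """Percentage of all connection time rejections."""
    return 100.0 * count / total


# Every DNSBL class except the CBL, lumped together.
other_dnsbls = sum(n for name, n in rejections.items()
                   if name.startswith("bl-") and name != "bl-cbl")

for name in ("dynamic IP", "bad or no reverse DNS", "bl-cbl"):
    print(f"{name:24s} {share(rejections[name], total):5.1f}%")
print(f"{'other DNSBLs combined':24s} {share(other_dnsbls, total):5.1f}%")
```

On these figures the CBL is responsible for roughly 7.5% of rejections and the other five DNSBL classes together for only a little over 1%, with our own dynamic IP and reverse DNS checks making up most of the rest.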

The two leading SBL listings were SBL50738, an advance fee fraud spam listing from this month (12 rejections) and SBL50181, a compromised Brazilian web server abused by advance fee fraud spammers since November (10 rejections, and we've seen it before).

Three of the top 30 most rejected IP addresses were rejected 100 times or more this week: 210.47.42.5 (259 times, bad DNS), 82.38.128.6 (143 times, dynamic IP), and 64.166.14.222 (127 times, 'dynamic' IP). 16 of the top 30 are currently in the CBL and 18 are currently in bl.spamcop.net.

This week Hotmail managed:

  • no messages accepted.
  • no messages rejected because they came from non-Hotmail email addresses.
  • 48 messages sent to our spamtraps.
  • 2 messages refused because their sender addresses had already hit our spamtraps.
  • 6 messages refused due to their origin IP address (3 from the Cote d'Ivoire, two from Gilat Satcom, and one in SBL50431).

And the final numbers:

what        # this week   (distinct IPs)   # last week   (distinct IPs)
Bad HELOs           995              154           982              113
Bad bounces          12                8           105               88

Apparently some sort of miracle happened this week and the spammers all stopped forging us. Alternately, my software is broken.

Bad bounces were sent to only 11 different bad usernames this week; 'E7D6' got two hits and everyone else got one. Bounces went to three hex bad usernames (E7D6, E07, and 3E4B), four actual ex-users, two things that could be valid usernames, and two random alphabetical jumbles. Bounces came from machines in Germany and Russia, among other places.

Colour me pleasantly happy and certainly hoping that this keeps up. But I'm not going to hold my breath.

SpamSummary-2007-02-10 written at 23:54:39

A temptation with challenge/response anti-spam systems

Every time I see a mail from a C/R system, I get more and more tempted to teach our mail filtering infrastructure about the most common ones, so that it can automatically acknowledge the challenges, discard the messages, and not bother the users with them at all.

Will this acknowledge a lot of spam, and thus dump it on the people operating those C/R systems? Sure, but that's not our problem. And I'd clearly be doing our users a service, especially if C/R systems get widespread.

(This is another example of how C/R systems try to work by offloading your spam problem on precisely the wrong people. The only way they can 'work' at all is if most of the mail addresses you challenge don't even exist; otherwise you are reaching either spammers or pissed off people, neither of which have your interests in mind.)

As a special bonus prize, I could even hack our system to do this even for local addresses that don't actually exist, since it's perfectly possible to automatically acknowledge the challenge and 5xx the DATA command at the end of the SMTP conversation. I'd have to make sure that this only happened for single-recipient email, but that describes all of the C/R email I'd want to do this to.

(Ob-attribution-darnit: I've had this thought for a while, but the impetus to actually write this entry was provided by reading about a related temptation with C/R systems here.)

CRTemptation written at 21:41:12

2007-02-03

Weekly spam summary on February 3rd, 2007

This week, we:

  • got 15,790 messages from 280 different IP addresses.
  • handled 23,657 sessions from 1,340 different IP addresses.
  • received 248,408 connections from at least 73,118 different IP addresses.
  • hit a highwater of 17 connections being checked at once.

Volume is up again from last week, although the number of different IPs is down slightly.

Day         Connections   different IPs
Sunday           28,871         +11,587
Monday           30,772         +10,424
Tuesday          39,487         +10,941
Wednesday        38,430         +10,523
Thursday         36,188          +9,602
Friday           37,864         +10,746
Saturday         36,796          +9,295

This is somewhat more even than last week, but that's about all I can say for it.

Kernel level packet filtering top ten:

Host/Mask           Packets   Bytes
193.70.192.0/24       18193    820K
213.4.149.12          17817    926K
213.29.7.0/24         17387   1043K
193.95.28.40          14077    653K
64.166.14.222         10431    501K
203.143.22.50          7058    423K
24.39.78.164           6715    322K
206.100.222.95         6082    292K
66.15.116.230          5391    259K
66.15.119.165          4741    222K

Things are definitely up compared to last week.

  • 213.4.149.12 and 66.15.119.165 return from last week.
  • 193.95.28.40 kept attempting to send us stuff that had already tripped spamtraps.
  • 64.166.14.222 returns from early January, still blocked for being a PacBell DSL line.
  • 203.143.22.50 is a Sri Lankan IP address with no reverse DNS.
  • 24.39.78.164 and 206.100.222.95 both tried too often with bad HELOs.
  • 66.15.116.230 is on the NJABL.

Connection time rejection stats:

  64250 total
  39581 dynamic IP
  17883 bad or no reverse DNS
   5133 class bl-cbl
    333 class bl-dsbl
    166 class bl-njabl
    139 class bl-pbl
    123 class bl-sbl
    116 class bl-sdul
     21 verticalresponse.com
     13 cuttingedgemedia.com

Four of the top 30 most rejected IPs were rejected 100 times or more this week: 81.51.108.120 (349 times), 64.166.14.222 (199 times), 68.91.134.69 (118 times), and 211.180.132.9 (100 times). The first three were rejected as dynamic IPs, the fourth for having bad reverse DNS. Ten of the top 30 are currently in the CBL and a whopping 21 are currently listed in bl.spamcop.net.

This week's Hotmail scores are:

  • 5 messages accepted.
  • 1 message rejected because it came from a non-Hotmail email address.
  • 36 messages sent to our spamtraps.
  • 2 messages refused because their sender addresses had already hit our spamtraps.
  • 8 messages refused due to their origin IP address (3 in the SBL, 2 from the Cote d'Ivoire, 1 in the CBL, 1 from Nigeria, and one from SAIX).

Somehow, I don't think we're losing anything by not accepting an email message this week from one 'netaleloto_awrd_006@hotmail.it'.

The SBL listings are SBL50384, from January 2007, SBL46422, from September 2006, and SBL32972, from November 2005, when it was spamming through Hotmail. I have no words.

And the final numbers:

what        # this week   (distinct IPs)   # last week   (distinct IPs)
Bad HELOs           982              113          1171              134
Bad bounces         105               88           229              130

Germany and Russia seem to be the leading sources of bad bounces this week, with the usual contributions from various other places. Unlike last week, there's no particularly big single source; like last week, the most common bad usernames continue to be alphabetical jumbles, with a certain amount of more plausible ones mixed in. Bad bounces were sent to 96 different bad usernames this week.

SpamSummary-2007-02-03 written at 23:44:45



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.