How the IPs sending us malware break down (January 2018 edition)
I recently wrote about how a misbehaving SMTP sender fooled me about some malware volume because it kept retrying the same message over and over despite getting permanent SMTP rejections. This made me interested in getting some numbers on how our malware rejections break down in terms of how many repeats, how many sources, and so on. All of the following figures are for roughly the past four and a half weeks.
The starting figure is that over this time we've done 2,729 rejections for malware and viruses. About 175 of these have the same sending IP, sender address, destination address, and detected malware as other entries, making it very likely that they're resends. The most active resent message was rejected 97 times (that was the one with an ISO); after that we had one rejected 10 times (with 'CXmail/OleDl-AG'), one 8 times (with 'CXmail/OleDl-AL'), four that were rejected 3 times, and a whole fifty-five that were rejected twice.
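A minimal sketch of how this sort of resend counting could be done, assuming the rejection log has already been parsed into records with hypothetical `ip`, `sender`, `recipient`, and `malware` fields. The sample records and addresses are made up for illustration, not taken from our logs.

```python
from collections import Counter

def count_resends(rejections):
    """Return {(ip, sender, recipient, malware): count} for repeated tuples."""
    counts = Counter(
        (r["ip"], r["sender"], r["recipient"], r["malware"]) for r in rejections
    )
    # A tuple seen only once is a single rejection, not a resend.
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample data: one message rejected three times, one rejected once.
sample = [
    {"ip": "192.0.2.10", "sender": "a@example.com",
     "recipient": "u1@example.org", "malware": "CXmail/IsoDl-A"},
    {"ip": "192.0.2.10", "sender": "a@example.com",
     "recipient": "u1@example.org", "malware": "CXmail/IsoDl-A"},
    {"ip": "192.0.2.10", "sender": "a@example.com",
     "recipient": "u1@example.org", "malware": "CXmail/IsoDl-A"},
    {"ip": "192.0.2.20", "sender": "b@example.com",
     "recipient": "u2@example.org", "malware": "CXmail/OleDl-V"},
]

repeats = count_resends(sample)
# Rejections beyond the first one for each repeated message.
extra_rejections = sum(n - 1 for n in repeats.values())
print(repeats, extra_rejections)
```

Subtracting the extra rejections from the raw total gives a rough count of distinct messages, subject to the caveat that identical tuples over a long time range may be genuinely separate messages.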
The resends came from 15 different IPs, including two other mail servers at the university; since these mail servers work properly, the 'resent' messages were actually more or less duplicated messages. It's possible that they originally came from different source IPs. Overall it seems that bad SMTP servers that resend in the face of permanent SMTP rejections are pretty uncommon.
(Since I'm blindly looking at messages across a very wide time range, it's possible that a number of the other 'resends' are really duplicate messages created by long-lived malware with insufficient variety in its sender addresses. Over four weeks, it's certainly possible that such malware would revolve around to targeting some of our addresses a second time.)
These 2,729 rejections came from only 124 different IP addresses (including a number of other mail systems at the university), with much of the volume coming from some very active sources:
 649 18.104.22.168
 443 22.214.171.124
 227 126.96.36.199
 210 188.8.131.52
 196 184.108.40.206
  97 220.127.116.11
 [...]
  41 18.104.22.168
22.214.171.124/24 is SBL387172 and 126.96.36.199/24 is SBL387171, both listed as 'suspected snowshoe spam' ranges. A few other less active IPs are in the CSS, SBL388761, and SBL383008. Somewhat to my surprise, only 24 of the IPs are currently in the XBL, although many of the earlier senders may have aged out.
(The true volume of malware from these SBL listed IPs is likely to be clearly higher than this, since some of their email will have been rejected at the RCPT TO phase.)
Only eight of the IPs sent us more than one type of malware, and a number of them are other mail systems that are forwarding email to some of our users and thus are aggregating a number of different real sources together. The 188.8.131.52/24 block sent us 'Mal/Phish-A' and 'Troj/Phish-BPZ'; the 184.108.40.206/24 block sent us 'Mal/Phish-A'.
(Since these were detected as malware, they were almost certainly HTML files as attachments, which is the pattern we've seen.)
However, many of the active sources tried to send email to quite a lot of different addresses here, as shown by a top five count:
 140 220.127.116.11
  99 18.104.22.168
  74 22.214.171.124
  64 126.96.36.199
  61 188.8.131.52
This is basically the pattern I would expect from spam sending operations, which is what these are. Aggregated together, 184.108.40.206/24 tried to send to 203 different addresses and 220.127.116.11/24 tried to send to 110. A lot of the destination addresses were targeted repeatedly in both cases.
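Aggregating by /24 like this is straightforward with Python's standard `ipaddress` module. The following is only a sketch; the function name and the sample (IP, destination) pairs are hypothetical.

```python
import ipaddress
from collections import defaultdict

def distinct_targets_by_block(pairs, prefix_len=24):
    """Map each network block to the number of distinct destination addresses.

    pairs is an iterable of (source_ip, destination_address) tuples.
    """
    targets = defaultdict(set)
    for ip, rcpt in pairs:
        # strict=False lets us pass a host address and get its enclosing network.
        block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        targets[str(block)].add(rcpt.lower())
    return {block: len(rcpts) for block, rcpts in targets.items()}

# Hypothetical sample: two IPs in one /24, one IP in another.
pairs = [
    ("192.0.2.17", "u1@example.org"),
    ("192.0.2.99", "u2@example.org"),
    ("192.0.2.99", "U1@example.org"),   # same address, different case
    ("198.51.100.5", "u3@example.org"),
]
by_block = distinct_targets_by_block(pairs)
print(by_block)
```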
With one exception, the most popular sender addresses were random GMail addresses like 'email@example.com'. The exception is 'firstname.lastname@example.org', which was used for 27 messages from 21 different IPs, all trying to deliver a 'CXmail/OleDl-V' malware payload. Overall there were 1,610 different sender addresses, but 1,559 of them were GMail addresses.
(I was going to say that none of these would pass GMail's DMARC policy, but apparently Google blinked on their plans for a strict one. Right now GMail still publishes a 'p=none' DMARC policy that doesn't ask people to reject email that fails to pass DKIM and/or SPF tests.)
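As an illustration of what a 'p=none' policy looks like in practice, here is a small sketch that pulls the policy tag out of the text of a DMARC TXT record. It does not do the DNS lookup, and the sample record is a made-up example rather than GMail's actual published record.

```python
def dmarc_policy(txt_record):
    """Return the 'p=' policy from a DMARC record string, or None if absent."""
    tags = {}
    for part in txt_record.split(";"):
        if "=" in part:
            name, _, value = part.partition("=")
            tags[name.strip().lower()] = value.strip()
    # A DMARC record has to identify itself with v=DMARC1.
    if tags.get("v") != "DMARC1":
        return None
    return tags.get("p", "").lower() or None

# Hypothetical record text, as it might be found at _dmarc.<domain>.
example = "v=DMARC1; p=none; rua=mailto:reports@example.com"
print(dmarc_policy(example))
```

A policy of 'none' asks receivers only to report, while 'quarantine' and 'reject' ask them to act on mail that fails.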
A misbehaving SMTP sender can fool me about malware volume
One of the things that I do is casually watch our external mail gateway's logs of mail rejections due to viruses and malware. When unusual things pop up, I then tend to look at what attachment file types were associated with them, in case there's something new or just something we want to block explicitly. For the past couple of weeks I've been seeing a reasonable number of rejections for what Sophos PureMessage is calling 'CXmail/IsoDl-A' and our attachment type logger was recording as:
attachment application/x-iso9660-image; MIME file ext: .iso; tar no files?!
(The 'tar no files?!' portion means that this was identified and accepted as a tar file by Python's tarfile module but that it contained no files. I don't know why this is happening.)
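The behaviour can be reproduced in a small sketch. One possible explanation, which is a guess and not something the logs confirm, is that an ISO 9660 image normally begins with a reserved area that is often all zero bytes, while a tar archive ends with zero-filled blocks, so the start of an ISO can look like the end of an empty tar archive. The function names below are hypothetical and the all-zero buffer stands in for the start of such a file.

```python
import io
import tarfile

def tar_extensions(data):
    """Return sorted file extensions inside tar data, or None if it isn't a tar."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tf:
            names = [m.name for m in tf.getmembers() if m.isfile()]
    except tarfile.TarError:
        return None
    exts = set()
    for name in names:
        base = name.rsplit("/", 1)[-1]
        exts.add("." + base.rsplit(".", 1)[-1].lower() if "." in base else "(none)")
    return sorted(exts)

def describe(data):
    """Produce a log fragment in roughly the style shown above."""
    exts = tar_extensions(data)
    if exts is None:
        return "not a tar"
    if not exts:
        return "tar no files?!"
    return "tar exts: " + " ".join(exts)

# 10240 zero bytes is what an empty tar archive looks like on disk.
zeros = bytes(10240)
print(describe(zeros))
print(describe(b"hello"))
```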
That we were seeing a reasonable number of these seemed interesting and maybe a bit alarming. Distributing malware in ISO images is a bit eyebrow raising, but who knows, and of course I'd like for our tools to be able to see inside them (or at least figure out why they have an empty tarfile). So today I started some work on improving the situation and looked at our logs in more detail. When I did that, something jumped out at me: almost all of them were from the same IP address. Further, when I scanned some additional logs, everything I looked at had the same MAIL FROM and was all to the same email address.
In other words, what we have here is another case of a sending system that thinks permanent failures are only temporary. The IP address in question (18.104.22.168, lweb1.slnet.com.au) responds on port 25 with a Postfix banner, so they may have unwisely set the Postfix soft_bounce option, or perhaps their web panel did it for them.
(Since the message headers Exim logged talk about 'Roundcube Webmail', one may guess that these people are running an insecure and more or less unmonitored webmail environment.)
Each of the re-delivery attempts of this particular message got logged separately, of course, since they were separate SMTP transactions and separate rejections. This inflated the log volume, making it look like there were a lot more ISO attachments and CXmail/IsoDl-A activity than there actually is.
(This IP is not the only source of messages with CXmail/IsoDl-A, but there aren't very many others and they do go away when we give them SMTP rejections.)
One of the lessons I draw from this is that perhaps I should write some scripts to generate systematic summary reports from our logs. If I did that, I could invest the effort to have the script look for this sort of resending and account for it, so that I'd get a better picture of the real activity level of various sorts of malware and so on.
PS: TLS certificates are an interesting new vector for determining who is using a particular IP address (or at least determining one set of people). It's handy that they're increasingly prevalent. Of course this may not be the responsible party, to the extent that there is any party that considers themselves 'responsible' for any particular behavior of the machine.
Attachment types that we see in email from Zen-listed IP addresses
As part of yesterday's entry, I broke down what percent of various sorts of attachments we received came from IPs listed in zen.spamhaus.org. Today I'm going to basically invert the question and ask instead what sort of attachment types we get sent from Zen-listed IPs. As before, I'm going to be using the past nine weeks and a bit of logs, because our weekly log rotation makes it easy to do that.
(Because our attachment type logging comes after our RCPT TO time rejections, this is all based on email to people who don't reject all email from Zen-listed IPs.)
First up we have a collection of attachment types without MIME file names (and thus without MIME file extensions). For these I have to rely on the declared MIME types and sniffed file type information, and they break down like this:
 102 [Word XML]
  13 application/msword
  11 image/jpeg
  10 message/rfc822
  10 application/xml [either Word or Excel]
   6 [Excel XML]
   1 image/gif
Possibly this means that I should recurse inside message/rfc822 MIME parts. Some of these were file attachments; others I believe were the sole component of the email message.
Of attachments with MIME file names, the type breakdown is:
 1032 .doc
  623 .docx
  576 .html
  545 .htm
  308 .pdf
  253 .zip
  248 .xlsx
  170 .xls
  109 .jar
   90 .jpg
   83 .aspx
   61 .7z
   48 .ace
   26 .r11
   22 .xz
   20 .gz
   19 .tar
   15 .r00 .png
   14 .gif
   11 .rar
   10 .pdf.gz
    6 .iso .arj
    4 .pdf.z
    3 .txt .jpeg .chm
    1 .rtf .r01 .ppsx .lzh .bat
On the one hand, this is a broad assortment with a long tail. On the other hand, there's some very popular attachment types, especially Microsoft Word documents. I suspect that if I ground through our logs to cross-correlate them, I'd discover that a lot of these were seen as malware. Based on past discoveries, the .htm are likely phish spam, perhaps with some malware
(All of the .rars that we could successfully examine had in them and got rejected on that basis.)
Those .zip archives break down as containing:
 114 .exe
  86 .zip
  26 .jar
  25 .vbs
   1 ".lnk .txt"
   1 .com
We rejected all the ZIP archives with
The inner .zip files are:
  84 .doc
   1 .scr
   1 .js
It turns out that we rejected all of these. The files got rejected by a generic 'in-zip' case, which is perfectly happy to match nested zips as well as plain zips. The doubly nested .doc files have been rejected for some time.
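To make the idea of an 'in-zip' check that also matches nested zips concrete, here is a rough sketch using Python's standard `zipfile` module. It is not our actual Exim rule set. The blocked extension set, the doubly nested `.doc` rule as written, the size cap, and all function names are assumptions for illustration.

```python
import io
import zipfile

BLOCKED_IN_ZIP = {".exe", ".scr", ".js"}   # hypothetical rule set
MAX_INNER_SIZE = 10 * 1024 * 1024          # hypothetical cap on nested zips we unpack

def ext_of(name):
    base = name.rsplit("/", 1)[-1]
    return "." + base.rsplit(".", 1)[-1].lower() if "." in base else ""

def zip_extensions(data, depth=1, max_depth=3):
    """Return a list of (depth, extension) for files in a zip, recursing into zips."""
    found = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            ext = ext_of(info.filename)
            found.append((depth, ext))
            nested_ok = depth < max_depth and info.file_size <= MAX_INNER_SIZE
            if ext == ".zip" and nested_ok:
                try:
                    found.extend(zip_extensions(zf.read(info), depth + 1, max_depth))
                except zipfile.BadZipFile:
                    pass  # claimed to be a zip but isn't; nothing more to see
    return found

def should_reject(data):
    for depth, ext in zip_extensions(data):
        if ext in BLOCKED_IN_ZIP:
            return True           # matches at any nesting depth
        if ext == ".doc" and depth >= 2:
            return True           # the doubly nested .doc case
    return False

def make_zip(files):
    """Build an in-memory zip from a {name: bytes} mapping (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()

inner = make_zip({"invoice.doc": b"not really a Word file"})
outer = make_zip({"inner.zip": inner})
print(zip_extensions(outer), should_reject(outer))
```

The depth and size limits are there because unpacking attacker-supplied archives without limits is an easy way to run out of memory.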
(It turns out that the few nested ZIPs in yesterday's entry that weren't from Zen-listed IPs
must have all been
PS: That I keep having to check what we're actually rejecting suggests that our attachment type rejection rules are now sufficiently complicated that I should actually write them down, instead of leaving them sort of implicitly documented in the Exim configuration and then trying to remember them (it turns out that I got at least one case wrong in yesterday's entry). Possibly this will cause us to regularize some of them. Probably we won't drop any.
What file types we see inside ZIP archives with only a single file in email
Earlier this year, I wrote about how email attachments of a single .zip inside another zip were suspicious, and then did a file type breakdown for them. Malware is ever-mutating so nested ZIP archives have gone out of style by now, but instead we're seeing a not insignificant amount of attachments that are ZIP archives with only a single file in them. Today I want to do a breakdown of those file types that we've seen over the past nine weeks or so.
So here are the raw numbers; for some types I've added the percent that were received from IPs listed in zen.spamhaus.org at the time:
 434 .exe (26%)
  91 .zip (94%)
  86 .jar (30%)
  43 .vbs (58%)
  13 .bat (0%)
   6 .com (17%, ie 1)
   3 .wsf .pdf
   2 .scr .py .js
   1 .xls .rkt .rar .pot .jse .hta .eml .docx .csv
(Overall, about 36% of these messages were from IPs listed in zen.spamhaus.org at the time.)
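For reference, checking whether an IP is 'listed in zen.spamhaus.org' is a DNS lookup on a specially built name. The sketch below only builds that name, following the usual DNSBL convention of reversed octets for IPv4 and reversed nibbles for IPv6. The function name is mine, and the actual DNS query is deliberately left out.

```python
import ipaddress

def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name to look up for an IP address in a DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        labels = reversed(str(addr).split("."))
    else:
        # IPv6 uses one label per hex digit of the fully expanded address.
        labels = reversed(addr.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone

# 127.0.0.2 is the conventional DNSBL test address.
v4_name = dnsbl_query_name("127.0.0.2")
v6_name = dnsbl_query_name("2001:db8::1")
print(v4_name)
print(v6_name)
# If an A lookup of the name returns an address in 127.0.0.0/8, the IP is
# listed; NXDOMAIN means it is not listed.
```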
Some of these we reject immediately these days, such as the .wsf cases. Others we probably should, like .bat (which we already reject as top-level attachments).
The single nested .zip cases break down like this:
  89 inner zip exts: .doc
   1 inner zip exts: .scr
   1 inner zip exts: .js
It's somewhat interesting to me that in all cases, there's only a single file inside the inner zip. Because of past events, we also reject the doubly nested files. We'll also reject the doubly-nested .js attachment (because .js inside a ZIP archive, even a nested one), but not the
Unfortunately, what stands out in this list is the nested files. Partly this is because these days Sophos PureMessage is identifying all of them as malware, for example CXmail/JarZip-A (which we saw in an epic case) and also
Not a single one is making it through all of our anti-spam and anti-virus filtering to reach our users as presumed legitimate
(It's possible that this identification by Sophos is generic and means very little more than 'a single .jar file inside a .zip with some vague additional threat markers'. This doesn't matter in practice, since the net effect is the same.)
PS: As you might suspect, this entry came about because I noticed .zips were being rejected as malware and then decided to go look at the numbers to see if we should be rejecting them immediately. My current answer is that we probably should be, along with some other rejections, although there are arguments against this reaction.
Sometimes the right thing to do about a spate of spam is nothing (probably)
We have a program to capture information about what sort of email attachments our users get. As part of its operation it tries to peer inside various types of archive files, because you can find suspicious things there (and, of course, outright bad things, some of them surprising). This program is written in Python, which means that its ability to peer inside types of archive files is limited to what I can find in convenient Python packages. One of the archive formats that it can't look inside right now is 7z.
Stretching through August, September, and October we received a drizzle of email messages with .7z attachments that our commercial anti-spam system labeled as various sorts of 7z-based malware (a typical identification was CXmail/7ZDl-B).
I rather suspect that if our program had the ability to peer into 7z archives it would have found one of the file types that we now block, such as .exe files. While the drizzle was coming in, it was frustrating to sit there with no visibility inside these 7z archives. I was definitely tempted by the idea of using one somewhat complicated option to add 7z support to the
I didn't, though. I sat on my hands. And now the drizzle of these 7z-attachment malware emails has gone away (we haven't seen any for weeks now). I'm pretty sure that I made the right decision when I decided on inaction. There are a number of reasons for this, but the narrow tactical one is simply that this format of malware appears to have been temporary (and our commercial anti-spam system was doing okay against it). Waiting it out meant that I spent no time and effort on building an essentially permanent feature to deal with a temporary issue.
(I admit that I was also concerned about security risks in libarchive, since parsing file formats is a risk area.)
It's become my view that however tempting it is to jump to doing something every time I see spam, it's a bad practice in the long run. At a minimum I should make sure that whatever I'm planning to do will have a long useful lifetime, which requires that the spam it's targeting also have a long lifetime. Adding features in response to temporary spammer and malware behaviors is not a win.
(The other thought is that although some things are technically nifty, maybe there is a better way, such as just rejecting all .7z files. This would need some study to see how prevalent probably legitimate .7z files are, but that's why we're gathering the attachment information in the first place.)
PS: Yes, this malware could come back too; that's why this is only probably the right decision.
Delays on SMTP replies discourage apparent SMTP AUTH scanners
One of the things that my sinkhole SMTP server can do is trickle out all of its SMTP replies at a rate of one character every tenth of a second. This is a feature that I blatantly stole from OpenBSD's spamd when I read about it years ago, mostly because it seemed like a good idea. The effects of doing this have been interesting, because there's a real split in the results.
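The core of this sort of trickling is very simple. Here is a minimal sketch of the idea, not the actual code of my sinkhole server or of spamd. The `write` callback, the banner text, and the injectable `sleep` parameter (there so the example can run without really waiting) are all assumptions.

```python
import time

def trickle(write, text, delay=0.1, sleep=time.sleep):
    """Send text one character at a time, pausing after each character."""
    for ch in text:
        write(ch)
        sleep(delay)

# Demonstration with stand-ins: collect the output in a list and record the
# requested pauses instead of actually sleeping.
sent = []
pauses = []
banner = "220 sinkhole.example ESMTP\r\n"   # hypothetical greeting banner
trickle(sent.append, banner, sleep=pauses.append)

total_delay = sum(pauses)
print(len(sent), "characters,", round(total_delay, 1), "seconds of delay")
```

With a real connection, `write` would be something like `lambda c: conn.sendall(c.encode("ascii"))` on a connected socket. Even a short banner takes a few seconds to arrive this way, which is a long time for an impatient scanner and nothing to a real mailer.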
At the moment I have my sinkhole server listening on two IP addresses. One IP address is the MX target for some long-standing host names that collect a bunch of spam. The other IP address used to be the MX target for one such host, but that ended several months ago; it's run a SMTP server for years, though. As far as I've seen, all spam email attempts come to the first IP address and those senders are generally completely indifferent to the slow output they're getting. I'm not really surprised by this; real mail software is going to be willing to cope with relatively slow responses because there are all sorts of reasons they can happen, including overloaded destination systems. Even if you're writing your own software, you have to go out of your way to have timeouts so fast that they'd trigger on my responses.
(There was a day when some spam software working through compromised end machines and open proxies would respond to slow 'tarpit' responses by disconnecting, but I'm not convinced anything does that any more. However I haven't attempted to do any real study of this, partly because it's hard to sort out noise from signal in my current setup.)
Things are very different on the second IP, the one that isn't an MX target of anything currently but which used to run a SMTP server. There, there is a huge number of IP addresses that connect and then disconnect before my greeting banner even finishes trickling out. When I've turned my slow replies tarpit off to see what these connections looked like, they've all turned out to behave like SMTP AUTH scanners. Some of them will EHLO (often with suggestive names) and then disconnect when they don't see AUTH advertised in my EHLO greeting; others will go through the and then start blindly spitting AUTH LOGIN requests at me. Not infrequently they'll do this repeatedly, or with a lot of concurrent connections, or both.
(To my surprise I don't think any of them ever tried which I do advertise and which SMTP authentication is often gated
We have a very simple connection time delay in our external mail gateway and now that I check, we are seeing a certain amount of Exim messages about 'unexpected disconnection while reading SMTP command from ...'. However, I'm not sure how far these connections have gotten before then. They do generally seem to have hostnames typical of end-user IPs, just like the connections I see to my second IP, so it's possible that these are attempts at SMTP AUTH probing.
(We probably don't care enough to bother adding logging that would tell us. It's not exactly actionable information; everyone with exposed IPs and services on the Internet is getting probed all the time.)
We've now seen malware in a tar archive
Our anti-spam system recently logged the following information about an incoming message:
1e92K6-0007yD-37 attachment application/octet-stream; MIME file ext: .tar; tar exts: .exe; email is in-dnsbl
rejected 1e92K6-0007yD-37 from email@example.com to <redacted>: identified virus: Mal/DrodTar-A
Sophos's information on Mal/DrodTar-A plus some Internet searches suggest that this .exe had attributes of a relatively generic Windows
We also logged the message headers, and they make it clear that this wasn't a case of someone wrapping up a malware sample in a tar file in order to mail it to one of our people for research; it was an honest to goodness piece of Windows malware trying to propagate itself in a tar
Tell-tale headers include:
From: "Account Manager"<firstname.lastname@example.org>
Subject: Purchase Order
(Our users do get emailed .tar and .tar.gz files, but they have actual contents and they're hopefully not showing up in email that looks like that.)
It turns out that this is not the first .exe-in-.tar attachment we've seen in the past several months; back in May and June we saw a few that were identified (at the time) as Mal/FareitVB-M. More recently we saw a couple that sadly weren't identified as malware, so they sailed right through the mail system (I suspect that they were malware; a single .exe file in a .tar is unusual, and most of our .tar attachments are actually
On further inspection we've also seen a number of other plain attachments that seem to be malware, based on what they contain. In addition to .exes, we've logged single some of which have also been identified as Mal/FareitVB-M. Probably this means we should extend our rejection of bad things in ZIP archives to also cover bad things in tar archives.
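Extending such a check to tar archives is not much code with Python's `tarfile` module. The following is a sketch under assumptions: the set of bad extensions and the function names are hypothetical and are not our real rejection rules.

```python
import io
import tarfile

BAD_EXTS = (".exe", ".scr", ".js")   # hypothetical list of rejected file types

def tar_has_bad_file(data, bad_exts=BAD_EXTS):
    """Return True if the tar (optionally compressed) data holds a bad file type."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tf:
            for member in tf:
                # Only regular files count; skip directories, links, devices.
                if member.isfile() and member.name.lower().endswith(bad_exts):
                    return True
    except tarfile.TarError:
        return False   # not something tarfile can read at all
    return False

def make_tar(name, payload):
    """Build an in-memory tar archive holding a single file (test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

bad_tar = make_tar("Purchase Order.exe", b"MZ fake payload")
good_tar = make_tar("results/data.csv", b"a,b\n1,2\n")
print(tar_has_bad_file(bad_tar), tar_has_bad_file(good_tar))
```

Because `tarfile.open()` in its default mode also handles gzip, bzip2, and xz compression transparently, the same check covers .tar.gz attachments.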
(All this goes to show that things can be hiding under innocent looking rocks.)
I'm a little bit surprised that Windows malware distributes itself as tar archives, because I would have thought that not many Windows machines can actually extract them without having to go find additional software. However, I may be wrong about this; some searches suggest that common Windows archive handling programs (such as 7-zip) are sufficiently polymorphic that they'll also unpack tar archives for you. Perhaps the malware authors have discovered that malware packed up in tar archives gets through defenses slightly more readily than malware in ZIP archives.
(Sadly, this is certainly the case here, where we'd have immediately rejected these attachments if they'd been ZIP archives instead of tarballs.)
I guess I'm a little bit sad and disappointed that tar archives are now being exploited by malware, in a 'is nothing sacred?' kind of way.
(Where malware and spam in general is concerned, the answer has always been 'of course not'. But I still like to think of Unix things as existing in a separate world, one not contaminated by the grubby realities of the modern malware-in-email environment.)
I've now seen something doing SMTP probing of IPv6 addresses
One of the machines that I run my sinkhole SMTP server on has an IPv6 address. This address is present in DNS, but wasn't directly visible as the target of an MX record or anything else that would lead it to clearly being associated with email. To my surprise, yesterday a machine connected to my sinkhole SMTP server on this relatively obscure IPv6 address.
(This machine is the MX target of an old hostname that spammers and other people have latched on to, but the MX target didn't have an IPv6 address, just an IPv4 one.)
The source IPv6 is 2607:ff10:c5:509a::1 in cari.net, and an Internet search found an interesting report about it, which seems vaguely sloppy given how easy it usually is to change IPv6 addresses. The actual activity I saw appears to have been TLS probes; on its first two connections, it STARTTLS'd with different ciphers and then abandoned the connection after TLS had started. EHLOs were used too, first 'k7wyLkmlLdInG.com' and then
(The first connection used ECDHE-RSA-AES256-GCM-SHA384, a TLS v1.2 cipher; the second used the much older ECDHE-RSA-AES256-SHA, originally from SSLv3.)
Looking at my logs, I've seen similar TLS probes with similar (especially 'openssl.client.net') from a cari.net IPv4 address, 22.214.171.124. This has a PTR record of 'burger.census.shodan.io', although the IP address listed for that name doesn't match. If this is a Shodan source point, SMTP TLS scanning isn't particularly surprising in general (although it didn't work very well against my sinkhole SMTP server). It does surprise me that people are clearly trying IPv6 addresses for this, presumably by crawling DNS to find IPv6 addresses and then probing all ports on them just to see.
(Checking my logs, I see that my SSH daemon refused to talk to 2607:ff10:c5:509a::1 at around the same time, so this is probably port scanning and probing in general and may well be Shodan. Shodan once exploited NTP to find active IPv6 addresses, and may be back to this sort of tricks.)
Going back further in my SMTP logs, I see that 126.96.36.199 aka 'census3.shodan.io' also did this sort of probing. So perhaps Shodan has turned its unblinking eye on my corner of the network world in general, and the IPv6 probes are just a manifestation of this. Sadly that makes them less interesting and means that I've yet to actually encounter a spammer trying to use IPv6. Maybe someday.
Spam issues need to be considered from the start
A number of Google's issues from the spammer I talked about yesterday come down to issues of product design, where Google's design decisions opened them up to being used by a spammer. I considered these issues mistakes, because they fundamentally enable spammers, but I suspect that Google would say that they are not, and any spam problems they cause should get cleaned up by Google's overall anti-spam and systems that watch for warning signs and take action. Well, we've already seen how that one works out, but there's a larger problem here; this is simply the wrong approach.
In a strong way, anti-spam considerations in product design are like (computer) security. We know that you don't get genuinely secure products by just building away as normal and then bringing in a security team to spray some magic security stuff over the product when it's almost done; this spray-coated security approach has been tried repeatedly and it fails pretty much every time. The way you get genuinely secure products is considering security from the very start of the product, when it is still being designed, and then continuing to pay attention to security (among other things) all through building the product, at every step along the way. See, for example, Microsoft's Security Development Lifecycle, which is typical of the modern approach to building secure software.
(That you need to take a holistic approach to security is not really surprising; you also need to take a holistic approach to things like performance. If no one cares about performance until the very end, you can easily wind up digging yourself into a deep performance hole that is very painful and time-consuming to get out of, if it's even feasible to do so.)
Similarly, you don't get products that can stand up to spammers by designing and building your products without thinking about spam, and then coming along at the end to spray-coat some scanning and monitoring magic on top and add an abuse@... address (or web form). If you want products that will not attract spammers like ants to honey, you need to be worrying about how your products could be abused right from the start of their design. By now the basics of this are not particularly difficult, because we have lots of painful experience with spammers (eg).
Google is objectively running a spammer mailing list service
If you are a mailing list service provider, there are a number of things that you need to do, things that fall under not so much best practices as self defense. My little list is:
- You shouldn't allow random people you don't know and haven't carefully authenticated to set up mailing lists that you'll send out for them.
- If you do let such people set up mailing lists, you should require that all email addresses added to them be explicitly confirmed with the usual two step 'do you really want to subscribe to this' process.
- If you actually allow random people you don't know to add random email addresses to their mailing lists, you absolutely must keep a very close eye on the volume of rejections to such mailing lists. A significant rate of rejections is an extremely dangerous warning sign.
Google, of course, does none of these, perhaps because doing any of these would require more people or reduce 'user engagement', also known as the number of theoretical eyeballs that ads can be shown to. The result is predictable:
2017-10-04 08:19 H=mail-io0-f199.google.com [188.8.131.52] [...] F=<emails1+[...]@offpay.party> rejected [...]
2017-10-04 08:26 H=mail-ua0-f200.google.com [184.108.40.206] [...] F=<emails5+[...]@offpay.party> rejected [...]
2017-10-04 08:31 H=mail-vk0-f71.google.com [220.127.116.11] [...] F=<emails7+[...]@offpay.party> rejected [...]
2017-10-04 08:31 H=mail-pf0-f198.google.com [18.104.22.168] [...] F=<emails7+[...]@offpay.party> rejected [...]
2017-10-04 08:32 H=mail-qk0-f198.google.com [22.214.171.124] [...] F=<emails8+[...]@offpay.party> rejected [...]
2017-10-04 08:39 H=mail-qk0-f199.google.com [126.96.36.199] [...] F=<emails9+[...]@offpay.party> rejected [...]
2017-10-04 08:40 H=mail-it0-f70.google.com [188.8.131.52] [...] F=<emails9+[...]@offpay.party> rejected [...]
2017-10-04 08:40 H=mail-io0-f200.google.com [184.108.40.206] [...] F=<emails11+[...]@offpay.party> rejected [...]
2017-10-04 08:40 H=mail-io0-f197.google.com [220.127.116.11] [...] F=<emails11+[...]@offpay.party> rejected [...]
2017-10-04 08:41 H=mail-ua0-f197.google.com [18.104.22.168] [...] F=<emails11+[...]@offpay.party> rejected [...]
2017-10-04 11:57 H=mail-vk0-f69.google.com [22.214.171.124] [...] F=<emails15+[...]@offpay.party> rejected [...]
2017-10-04 12:06 H=mail-pg0-f71.google.com [126.96.36.199] [...] F=<emails16+[...]@offpay.party> rejected [...]
2017-10-04 12:09 H=mail-qt0-f200.google.com [188.8.131.52] [...] F=<emails18+[...]@offpay.party> rejected [...]
That's just from today; we have more from yesterday, October 2nd, and October 1st. They're a mixture of RCPT TO rejections (generally due to 'no such user') and post-DATA rejections from our commercial anti-spam system laughing very loudly at the idea of accepting the email. Many other copies made it through, not because they weren't seen as spam but because they were sent to users who hadn't opted into our SMTP time spam rejection.
Google has deliberately chosen to mix all of its outgoing email into one big collection of mail servers that third parties like us can't easily tell apart. For Google, this has the useful effect of forcing recipients to choke down much of Google's spam because of GMail, instead of letting people block it selectively. In this case, we have some trapped mail headers that suggest that this is something to do with Google Groups, which is of course something that we've seen before, with bonus failures. That was about six years ago and apparently Google still doesn't care.
(Individual people at Google may care, and they may be very passionate about caring, but clearly the corporate entity that is Google doesn't care. If it did care, this would not happen. At a minimum, there would be absolutely no way to add email addresses to any form of mailing list without positive confirmation from said addresses. Instead, well, it's been six years and this stuff is still happening.)
PS: My unhappy reactions yesterday on Twitter may have produced some results, which is better than nothing, but it should go without saying that that's not exactly a good solution to the general issue. Spammers are like ants; getting rid of one is simply dealing with the symptoms, not the problems.
If spam false positives are inevitable, should we handle them better?
Right now, our mail system basically punts on handling false positives (non-spam detected as spam). The only thing that users can do about false positives is turn off SMTP time rejection (if they have it turned on) and then fish the mis-classified message out of their filters or our archive of all recent email they've gotten. If the message has already been rejected, the only thing that can be done is to get the sender to re-send it. And there's no way for users to see what messages have been rejected, so they can tell if some important email message has fallen victim to a false positive; instead we periodically get requests to check our logs.
My impression is that our mail system's behavior is not atypically bad, and instead that plenty of other mail systems behave in much the same way. It's pretty straightforward to see why, too; it would take significantly more work to engineer anything more than this, especially if you reject at SMTP time (and I think you want to, because that way at least people find out that their email hasn't gone through because of a false positive). But probably we should do better here, if only because this is a pain point for our users (it's one of the things that gets them to talk to us about our spam filtering).
(This is also probably required if we accept the idea that 'spam levels' may be a copout.)
In general, a mail system can do things with potential false positives from two sides. It can give the local receiving user some tools to investigate situations, answering the question of 'did X try to send me email and what happened to it', and perhaps also to retrieve such mis-classified email. Retrieving email rejected at SMTP time requires your mailer to save a copy of such email (at least for a while), which means you need to defer rejections to DATA time. This opens up a complicated tangle of worms for messages sent to multiple recipients (although they go away again if you mandate that you only have one 'spam level' and everyone gets it).
Your mailer can also give senders some tools they can use to cause false positive messages to get accepted anyway. You probably don't want to offer these tools to all senders; sure, most spammers aren't paying attention to you, but some spam (such as advance fee fraud attempts) does come from real human beings doing things to compromised mail systems by hand, and they might well take advantage of your generosity. However, if someone's regular correspondent has some email classified as spam, it's probably safe (and worthwhile) to offer them these tools. The odds are probably good that it's an accident as opposed to a compromised account with a human being at the other end to take advantage of you.
(There are a wide variety of options for how to let people push messages through. You could do 'mail it to this special one-time address', or 'include this magic header or Subject marker', or just 'visit this URL of ours to request that your email get delivered anyway'. And I'm sure there's more. I have no idea which option would work best, and SMTP-time rejection makes things complicated because it's hard to give people much information.)
None of these are particularly easy to put together with off the shelf components, though, which for many places is probably going to make the entire issue moot. And maybe things should be left as they are for the straightforward reason that a low level of false positives just doesn't justify much sysadmin effort to improve the situation, especially if it requires a complicated bespoke custom solution.
(This is one of the entries where it turns out that I don't have any firm conclusions, just some rambling thoughts I want to write down.)