Sometimes the right thing to do about a spate of spam is nothing (probably)
We have a program
to capture information about what sort of email attachments our
users get. As part of its operation it
tries to peer inside various types of archive files, because you
can find suspicious things there (and,
of course, outright bad things, some
of them surprising). This program is written
in Python, which means that its ability to peer inside types of
archive files is limited to what I can find in convenient Python
packages. One of the archive formats that it can't look inside right
now is 7z.
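(As a sketch of what this peering looks like, here's roughly how you list archive members with the Python standard library; zipfile and tarfile cover the common formats, but there's no stdlib equivalent for 7z, which is the root of the problem. This is an illustration, not our actual program.)

```python
import io
import tarfile
import zipfile

def member_names(data: bytes, ext: str):
    """Return the file names inside a ZIP or tar attachment.

    The standard library handles .zip and .tar variants; there is no
    stdlib module for 7z, which is why .7z attachments are opaque to us.
    """
    buf = io.BytesIO(data)
    if ext == ".zip":
        with zipfile.ZipFile(buf) as zf:
            return zf.namelist()
    elif ext in (".tar", ".tar.gz", ".tgz"):
        with tarfile.open(fileobj=buf) as tf:
            return tf.getnames()
    raise ValueError("unsupported archive type: " + ext)
```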
Stretching through August, September, and October we received a
drizzle of email messages with .7z attachments that our commercial
anti-spam system labeled as various sorts of 7z-based malware (a
typical identification was CXmail/7ZDl-B).
I rather suspect that if our program had the ability to peer into
7z archives it would have found one of the file types that we now
block, such as
.exe files. While
the drizzle was coming in, it was frustrating to sit there with no
visibility inside these 7z archives. I was definitely tempted by
the idea of using one somewhat complicated option to add 7z support to the
program.
I didn't, though. I sat on my hands. And now the drizzle of these 7z-attachment malware emails has gone away (we haven't seen any for weeks now). I'm pretty sure that I made the right decision when I decided on inaction. There are a number of reasons for this, but the narrow tactical one is simply that this format of malware appears to have been temporary (and our commercial anti-spam system was doing okay against it). Waiting it out meant that I spent no time and effort on building an essentially permanent feature to deal with a temporary issue.
(I admit that I was also concerned about security risks in libarchive, since parsing file formats is a risk area.)
It's become my view that however tempting it is to jump to doing something every time I see spam, it's a bad practice in the long run. At a minimum I should make sure that whatever I'm planning to do will have a long useful lifetime, which requires the spam it's targeting to also have a long lifetime. Adding features in response to temporary spammer and malware behaviors is not a win.
(The other thought is that although some things are technically nifty, maybe there is a better way, such as just rejecting all .7z files. This would need some study to see how prevalent probably-legitimate .7z files are, but that's why we're gathering the attachment information in the first place.)
PS: Yes, this malware could come back too; that's why this is only probably the right decision.
Delays on SMTP replies apparently discourage SMTP AUTH scanners
One of the things that my sinkhole SMTP server can do is trickle out all of its SMTP replies at a rate of one character every tenth of a second. This is a feature that I blatantly stole from OpenBSD's spamd when I read about it years ago, mostly because it seemed like a good idea. The effects of doing this have been interesting, because there's a real split in the results.
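A minimal sketch of the trickle, assuming a plain blocking socket (my real server is more involved than this):

```python
import socket
import time

def slow_reply(conn: socket.socket, line: str, delay: float = 0.1) -> None:
    """Trickle out an SMTP reply one character at a time.

    Impatient scanners often give up before the banner finishes, while
    real mail software waits it out. The default delay matches the one
    character every tenth of a second described above.
    """
    for ch in line + "\r\n":
        conn.sendall(ch.encode("ascii"))
        time.sleep(delay)

# e.g. slow_reply(conn, "220 mail.example.com ESMTP")
```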
At the moment I have my sinkhole server listening on two IP addresses. One IP address is the MX target for some long-standing host names that collect a bunch of spam. The other IP address used to be the MX target for one such host, but that ended several months ago; it's run an SMTP server for years, though. As far as I've seen, all spam email attempts come to the first IP address and those senders are generally completely indifferent to the slow output they're getting. I'm not really surprised by this; real mail software is going to be willing to cope with relatively slow responses because there are all sorts of reasons they can happen, including overloaded destination systems. Even if you're writing your own software, you have to go out of your way to have timeouts so fast that they'd trigger on my responses.
(There was a day when some spam software working through compromised end machines and open proxies would respond to slow 'tarpit' responses by disconnecting, but I'm not convinced anything does that any more. However I haven't attempted to do any real study of this, partly because it's hard to sort out noise from signal in my current setup.)
Things are very different on the second IP, the one that isn't an
MX target of anything currently but which used to run an SMTP server.
There, there is a huge number of IP addresses that connect and then
disconnect before my greeting banner even finishes trickling out.
When I've turned my slow replies tarpit off to see what these
connections looked like, they've all turned out to behave like SMTP
AUTH scanners. Some of them will
EHLO (often with suggestive
names) and then disconnect when they don't see AUTH
advertised in my
EHLO greeting; others will go through the EHLO exchange
and then start blindly spitting
AUTH LOGIN requests at me. Not
infrequently they'll do this repeatedly, or with a lot of concurrent
connections, or both.
(To my surprise I don't think any of them ever tried STARTTLS,
which I do advertise and which SMTP authentication is often gated behind.)
We have a very simple connection time delay in our external mail gateway and now that I check, we are seeing a certain number of Exim messages about 'unexpected disconnection while reading SMTP command from ...'. However, I'm not sure how far these connections have gotten before then. They do generally seem to have hostnames typical of end-user IPs, just like the connections I see to my second IP, so it's possible that these are attempts at SMTP AUTH probing.
(We probably don't care enough to bother adding logging that would tell us. It's not exactly actionable information; everyone with exposed IPs and services on the Internet is getting probed all the time.)
We've now seen malware in a tar archive
Our anti-spam system recently logged the following information about an incoming message:
1e92K6-0007yD-37 attachment application/octet-stream; MIME file ext: .tar; tar exts: .exe; email is in-dnsbl
rejected 1e92K6-0007yD-37 from email@example.com to <redacted>: identified virus: Mal/DrodTar-A
Sophos's information on Mal/DrodTar-A
plus some Internet searches
suggest that this
.exe had attributes of a relatively generic piece of Windows malware.
We also logged the message headers, and they make it clear that this
wasn't a case of someone wrapping up a malware sample in a tar file in
order to mail it to one of our people for research; it was an honest to
goodness piece of Windows malware trying to propagate itself in a tar
archive.
Tell-tale headers include:
From: "Account Manager"<firstname.lastname@example.org>
Subject: Purchase Order
(Our users do get emailed .tar and .tar.gz files, but they have actual contents and they're hopefully not showing up in email that looks like that.)
It turns out that this is not the first .exe-in-.tar attachment we've
seen in the past several months; back in May and June we saw a few that
were identified (at the time) as Mal/FareitVB-M.
More recently we saw a couple that sadly weren't identified as malware,
so they sailed right through the mail system (I suspect that they were
malware; a single
.exe file in a
.tar is unusual, and most of our
.tar attachments are actually legitimate).
On further inspection we've also seen a number of other plain
.tar attachments that seem to be malware, based on what they contain.
In addition to
.exes, we've logged other suspicious single-file tar attachments,
some of which have also been identified as Mal/FareitVB-M. Probably
this means we should extend our rejection of bad things in ZIP archives
to also cover bad things in tar archives.
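Extending the check is straightforward in principle; here's a hedged sketch of what 'bad things in tar archives' detection could look like, with an illustrative extension list rather than our real one:

```python
import io
import tarfile

# Extensions we'd treat as bad inside an archive; this list is
# illustrative, not our actual rejection list.
BAD_EXTS = {".exe", ".scr", ".js"}

def tar_has_bad_member(data: bytes) -> bool:
    """Return True if a tar attachment contains a file with a bad extension."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tf:
            return any(
                name.lower().endswith(tuple(BAD_EXTS)) for name in tf.getnames()
            )
    except tarfile.TarError:
        # An unreadable tar is suspicious, but it's not provably bad.
        return False
```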
(All this goes to show that things can be hiding under innocent looking rocks.)
I'm a little bit surprised that Windows malware distributes itself as tar archives, because I would have thought that not many Windows machines can actually extract them without having to go find additional software. However, I may be wrong about this; some searches suggest that common Windows archive handling programs (such as 7-zip) handle enough formats that they'll also unpack tar archives for you. Perhaps the malware authors have discovered that malware packed up in tar archives gets through defenses slightly more readily than malware in ZIP archives.
(Sadly, this is certainly the case here, where we'd have immediately rejected these attachments if they'd been ZIP archives instead of tarballs.)
I guess I'm a little bit sad and disappointed that tar archives are now being exploited by malware, in a 'is nothing sacred?' kind of way.
(Where malware and spam in general is concerned, the answer has always been 'of course not'. But I still like to think of Unix things as existing in a separate world, one not contaminated by the grubby realities of the modern malware-in-email environment.)
I've now seen something doing SMTP probing of IPv6 addresses
One of the machines that I run my sinkhole SMTP server on has an IPv6 address. This address is present in DNS, but wasn't directly visible as the target of an MX record or anything else that would lead it to clearly being associated with email. To my surprise, yesterday a machine connected to my sinkhole SMTP server on this relatively obscure IPv6 address.
(This machine is the MX target of an old hostname that spammers and other people have latched on to, but the MX target didn't have an IPv6 address, just an IPv4 one.)
The source IPv6 is 2607:ff10:c5:509a::1 in cari.net, and an Internet
search found an interesting report about it; continuing to use an
already-reported address seems vaguely sloppy given how easy it usually
is to change IPv6 addresses. The actual activity I saw appears to have been TLS
probes; on its first two connections, it
STARTTLS'd with different
ciphers and then abandoned the connection after TLS had started.
EHLOs were used too, first 'k7wyLkmlLdInG.com' and then 'openssl.client.net'.
(The first connection used ECDHE-RSA-AES256-GCM-SHA384, a TLS v1.2 cipher; the second used the much older ECDHE-RSA-AES256-SHA, originally from SSLv3.)
Looking at my logs, I've seen similar TLS probes with similar EHLO names
(especially 'openssl.client.net') from a cari.net IPv4 address,
220.127.116.11. This has a PTR record of 'burger.census.shodan.io',
although the IP address listed for that name doesn't match. If this
is a Shodan source point, SMTP TLS scanning isn't particularly
surprising in general (although it didn't work very well against
my sinkhole SMTP server). It does surprise me that people are clearly
trying IPv6 addresses for this, presumably by crawling DNS to find IPv6
addresses and then probing all ports on them just to see.
(Checking my logs, I see that my SSH daemon refused to talk to 2607:ff10:c5:509a::1 at around the same time, so this is probably port scanning and probing in general and may well be Shodan. Shodan once exploited NTP to find active IPv6 addresses, and may be back to this sort of trick.)
Going back further in my SMTP logs, I see that 18.104.22.168 aka 'census3.shodan.io' also did this sort of probing. So perhaps Shodan has turned its unblinking eye on my corner of the network world in general, and the IPv6 probes are just a manifestation of this. Sadly that makes them less interesting and means that I've yet to actually encounter a spammer trying to use IPv6. Maybe someday.
Spam issues need to be considered from the start
A number of Google's issues from the spammer I talked about yesterday come down to issues of product design, where Google's design decisions opened them up to being used by a spammer. I consider these issues to be mistakes, because they fundamentally enable spammers, but I suspect that Google would say that they are not, and that any spam problems they cause should get cleaned up by Google's overall anti-spam systems, which watch for warning signs and take action. Well, we've already seen how that one works out, but there's a larger problem here; this is simply the wrong approach.
In a strong way, anti-spam considerations in product design are like (computer) security. We know that you don't get genuinely secure products by just building away as normal and then bringing in a security team to spray some magic security stuff over the product when it's almost done; this spray-coated security approach has been tried repeatedly and it fails pretty much every time. The way you get genuinely secure products is considering security from the very start of the product, when it is still being designed, and then continuing to pay attention to security (among other things) all through building the product, at every step along the way. See, for example, Microsoft's Security Development Lifecycle, which is typical of the modern approach to building secure software.
(That you need to take a holistic approach to security is not really surprising; you also need to take a holistic approach to things like performance. If no one cares about performance until the very end, you can easily wind up digging yourself into a deep performance hole that is very painful and time-consuming to get out of, if it's even feasible to do so.)
Similarly, you don't get products that can stand up to spammers by
designing and building your products without thinking about spam,
and then coming along at the end to spray-coat some scanning and
monitoring magic on top and add an
abuse@... address (or web
form). If you want products that will not attract spammers like
ants to honey, you need to be worrying about how your products could
be abused right from the start of their design. By now the basics
of this are not particularly difficult, because we have lots of
painful experience with spammers (eg).
Google is objectively running a spammer mailing list service
If you are a mailing list service provider, there are a number of things that you need to do, things that fall under not so much best practices as self defense. My little list is:
- You shouldn't allow random people you don't know and haven't carefully authenticated to set up mailing lists that you'll send out for them.
- If you do let such people set up mailing lists, you should require that all email addresses added to them be explicitly confirmed with the usual two step 'do you really want to subscribe to this' process.
- If you actually allow random people you don't know to add random email addresses to their mailing lists, you absolutely must keep a very close eye on the volume of rejections to such mailing lists. A significant rate of rejections is an extremely dangerous warning sign.
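The second practice, confirmed (double) opt-in, can be sketched with an HMAC-based confirmation token; the function names and token format here are illustrative, not any particular mailing list system's:

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # server-side key; illustrative

def confirmation_token(list_id: str, address: str) -> str:
    """Derive the token mailed out in step one of double opt-in.

    The subscription only becomes active if the address's owner sends
    back (or clicks) the matching token, proving they actually asked.
    """
    msg = (list_id + "\0" + address).encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]

def confirm(list_id: str, address: str, token: str) -> bool:
    """Step two: verify the token before activating the subscription."""
    return hmac.compare_digest(token, confirmation_token(list_id, address))
```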
Google, of course, does none of these, perhaps because doing any of these would require more people or reduce 'user engagement', also known as the number of theoretical eyeballs that ads can be shown to. The result is predictable:
2017-10-04 08:19 H=mail-io0-f199.google.com [22.214.171.124] [...] F=<emails1+[...]@offpay.party> rejected [...]
2017-10-04 08:26 H=mail-ua0-f200.google.com [126.96.36.199] [...] F=<emails5+[...]@offpay.party> rejected [...]
2017-10-04 08:31 H=mail-vk0-f71.google.com [188.8.131.52] [...] F=<emails7+[...]@offpay.party> rejected [...]
2017-10-04 08:31 H=mail-pf0-f198.google.com [184.108.40.206] [...] F=<emails7+[...]@offpay.party> rejected [...]
2017-10-04 08:32 H=mail-qk0-f198.google.com [220.127.116.11] [...] F=<emails8+[...]@offpay.party> rejected [...]
2017-10-04 08:39 H=mail-qk0-f199.google.com [18.104.22.168] [...] F=<emails9+[...]@offpay.party> rejected [...]
2017-10-04 08:40 H=mail-it0-f70.google.com [22.214.171.124] [...] F=<emails9+[...]@offpay.party> rejected [...]
2017-10-04 08:40 H=mail-io0-f200.google.com [126.96.36.199] [...] F=<emails11+[...]@offpay.party> rejected [...]
2017-10-04 08:40 H=mail-io0-f197.google.com [188.8.131.52] [...] F=<emails11+[...]@offpay.party> rejected [...]
2017-10-04 08:41 H=mail-ua0-f197.google.com [184.108.40.206] [...] F=<emails11+[...]@offpay.party> rejected [...]
2017-10-04 11:57 H=mail-vk0-f69.google.com [220.127.116.11] [...] F=<emails15+[...]@offpay.party> rejected [...]
2017-10-04 12:06 H=mail-pg0-f71.google.com [18.104.22.168] [...] F=<emails16+[...]@offpay.party> rejected [...]
2017-10-04 12:09 H=mail-qt0-f200.google.com [22.214.171.124] [...] F=<emails18+[...]@offpay.party> rejected [...]
That's just from today; we have more from yesterday, October 2nd,
and October 1st. They're a mixture of
RCPT TO rejections (generally
due to 'no such user') and post-
DATA rejections from our
commercial anti-spam system laughing very
loudly at the idea of accepting the email. Many other copies made
it through, not because they weren't seen as spam but because they
were sent to users who hadn't opted into our SMTP time spam rejection.
Google has deliberately chosen to mix all of its outgoing email into one big collection of mail servers that third parties like us can't easily tell apart. For Google, this has the useful effect of forcing recipients to choke down much of Google's spam because of GMail, instead of letting people block it selectively. In this case, we have some trapped mail headers that suggest that this is something to do with Google Groups, which is of course something that we've seen before, with bonus failures. That was about six years ago and apparently Google still doesn't care.
(Individual people at Google may care, and they may be very passionate about caring, but clearly the corporate entity that is Google doesn't care. If it did care, this would not happen. At a minimum, there would be absolutely no way to add email addresses to any form of mailing list without positive confirmation from said addresses. Instead, well, it's been six years and this stuff is still happening.)
PS: My unhappy reactions yesterday on Twitter may have produced some results, which is better than nothing, but it should go without saying that that's not exactly a good solution to the general issue. Spammers are like ants; getting rid of one is simply dealing with the symptoms, not the problems.
If spam false positives are inevitable, should we handle them better?
Right now, our mail system basically punts on handling false positives (non-spam detected as spam). The only thing that users can do about false positives is turn off SMTP time rejection (if they have it turned on) and then fish the mis-classified message out of their filters or our archive of all recent email they've gotten. If the message has already been rejected, the only thing that can be done is to get the sender to re-send it. And there's no way for users to see what messages have been rejected so that they could tell if some important email message has fallen victim to a false positive; instead we periodically get requests to check our logs.
My impression is that our mail system's behavior is not atypically bad, and instead that plenty of other mail systems behave in much the same way. It's pretty straightforward to see why, too; it would take significantly more work to engineer anything more than this, especially if you reject at SMTP time (and I think you want to, because that way at least people find out that their email hasn't gone through because of a false positive). But probably we should do better here, if only because this is a pain point for our users (it's one of the things that gets them to talk to us about our spam filtering).
(This is also probably required if we accept the idea that 'spam levels' may be a copout.)
In general, a mail system can do things with potential false positives
from two sides. It can give the local receiving user some tools to
investigate situations, answering the question of 'did X try to
send me email and what happened to it', and perhaps also to retrieve
such mis-classified email. Retrieving email rejected at SMTP time
requires your mailer to save a copy of such email (at least for a
while), which means you need to defer rejections until after you've
received the full message.
This opens up a complicated tangle of worms for messages sent to
multiple recipients (although they go away again if you mandate
that you only have one 'spam level' and
everyone gets it).
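The first side, answering 'did X try to send me email', really only needs the mailer to record its rejections somewhere a user-facing tool can query. A minimal sketch, with a made-up log path and field names:

```python
import json
import time

# Hypothetical per-user rejection log; the path and field names are
# illustrative, not anything our actual mail system does.
LOG = "/var/log/mail/rejections.jsonl"

def record_rejection(recipient, sender, reason, path=LOG):
    """Append one rejection event as a JSON line."""
    entry = {"when": time.time(), "to": recipient,
             "from": sender, "reason": reason}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def rejections_for(recipient, path=LOG):
    """Return the rejection entries addressed to one local user."""
    with open(path) as f:
        entries = [json.loads(line) for line in f]
    return [e for e in entries if e["to"] == recipient]
```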
Your mailer can also give senders some tools they can use to cause false positive messages to get accepted anyway. You probably don't want to offer these tools to all senders; sure, most spammers aren't paying attention to you, but some spam (such as advance fee fraud attempts) does come from real human beings doing things to compromised mail systems by hand, and they might well take advantage of your generosity. However, if someone's regular correspondent has some email classified as spam, it's probably safe (and worthwhile) to offer them these tools. The odds are probably good that it's an accident as opposed to a compromised account with a human being at the other end to take advantage of you.
(There are a wide variety of options for how to let people push
messages through. You could do 'mail it to this special one-time
address', or 'include this magic header or
Subject marker', or
just 'visit this URL of ours to request that your email get delivered
anyway'. And I'm sure there's more. I have no idea which option
would work best, and SMTP-time rejection makes things complicated
because it's hard to give people much information.)
None of these are particularly easy to put together with off-the-shelf components, though, which for many places is probably going to make the entire issue moot. And maybe things should be left as they are for the straightforward reason that a low level of false positives just doesn't justify much sysadmin effort to improve the situation, especially if it requires a complicated bespoke solution.
(This is one of the entries where it turns out that I don't have any firm conclusions, just some rambling thoughts I want to write down.)
The idea of 'spam levels' may be a copout
I recently wrote about using the spam scores from another mail system at the university. In a comment, Robert Sander suggested that the original email system should have just rejected the spam at SMTP time. There are a number of issues here, but one of the traditional reasons not to do this is to provide your users with varying levels of spam filtering (which is something we do). This is a perfectly traditional reason, but perhaps it's a copout answer (to be fair, an unexamined one).
The fundamental problem with the idea of spam levels is the usual problem with asking people questions, namely that most people aren't going to be able to make useful decisions because they don't have enough knowledge. With spam scoring levels this is even more of an issue than usual, because many of the answers are either unknowable or take statistical analysis to answer properly. For example, the only reason that I know something about the distribution of the spam scores in our incoming email is because I've gone well out of my way to do analysis on our logs (and I have access to those logs). If I were to ask a user to choose between rejecting on a '70%' spam rating and an '85%' spam rating, how on earth are they supposed to make a sensible choice? At a minimum they'd need to figure out the distribution of spam scores for their email, both legitimate and spam, to see if this is a useful or sensible choice.
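That log analysis doesn't have to be fancy; here's a sketch of bucketing spam scores pulled out of mail logs (the log format and 'score=' field are invented for illustration):

```python
import re
from collections import Counter

# Hypothetical log field; real mail logs will differ.
SCORE_RE = re.compile(r"score=(\d+)")

def score_histogram(lines, bucket=10):
    """Count messages per spam-score bucket (0-9, 10-19, ...).

    Seeing the actual distribution is the prerequisite for choosing a
    sensible rejection threshold, as argued above.
    """
    hist = Counter()
    for line in lines:
        m = SCORE_RE.search(line)
        if m:
            hist[int(m.group(1)) // bucket * bucket] += 1
    return dict(hist)
```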
In practice there's only one thing that users are going to do with a spam levels knob. They're going to make it more aggressive when they get annoyed with spam and then if they find out that they had false positives (rejecting real email that they want), they might reluctantly make it less aggressive again. Even this represents a failure on our part, since an ideal mail system shouldn't require this tuning in the first place for almost all users.
(The exceptions are users who get messages that look a lot like spam but aren't. These users will probably always need some way to let through more questionable things.)
So I think there's a serious argument that features like 'spam levels' are essentially a copout. They're our attempt to wash our hands of taking responsibility for the unsolvable problem of rejecting all spam without rejecting anything else. Sure, we can't solve the problem, but we owe it to our users to give it our best shot and then own up to the resulting imperfections as the best tradeoffs we can achieve. Making and justifying this sort of tradeoff is part of what system administration is about in general, after all.
(If we do this, we might also want to think seriously about how we can deal with false positives in an environment where the answer is not 'well, turn your spam filtering off'. Sadly, this probably involves more sophisticated filtering and scoring than is provided by the commercial anti-spam package we use.)
(Some years ago I wrote that filtering spam for users was part of our job, which sort of touches on the same ideas.)
People probably aren't going to tell you when your anti-spam systems are working
As part of yesterday's entry about how we're now using the spam scores generated by the university's central email system, I was going to say that our users seem happy with the results. However, I have to admit that this is not quite true. We don't explicitly know that they're happy; instead, what we know is that they've stopped reporting that they're getting too much spam and can we do something about that.
What I've come to expect is pretty straightforward, namely that users generally aren't going to give us feedback when our anti-spam systems are working well. And why should they? Really, a spam-free email system where all the email they want gets delivered with no false positives is just how things should be, and you don't generally tell people 'good job, the systems are working just how they're supposed to work'. Naturally, people are generally only going to tell you when something goes wrong, either what they think of as an excess of spam or when email they're expecting doesn't get through or gets bounced.
On the one hand, this can be a bit frustrating when (or if) we want to know if some theoretically clever trick we've added to the mail system is making people's email more pleasant. On the other hand, this means that no news is good news; if we're not getting complaints about spam or missing email, we're most likely (still) doing things right. If we change something and nobody says anything, at a minimum our change did no harm.
As a side note, it's probably not reliable to count on users to start complaining if (or when) the amount of spam they see goes up. By now, some number of users have been trained to expect a certain amount of spam in their inboxes, and they won't start complaining out loud until the spam getting through really gets excessive.
(They may click on 'this is spam' or 'mark as spam' buttons if those buttons are conveniently available to them in their mail environment, especially if the buttons appear to do something. If you have such buttons, monitoring how often users click them can likely give you an early warning indicator of increased spam getting through your filters. Or at least email that your users don't want.)
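If you do monitor those clicks, a sliding-window counter with an alert threshold is about all you need; this sketch uses made-up window and threshold values:

```python
import time
from collections import deque

class SpamClickMonitor:
    """Flag when 'this is spam' clicks in a window well exceed a baseline.

    The window size, baseline, and factor are illustrative; in practice
    you'd derive the baseline from your own historical click rates.
    """

    def __init__(self, baseline_per_window, window_secs=3600, factor=3.0):
        self.baseline = baseline_per_window
        self.window = window_secs
        self.factor = factor
        self.clicks = deque()

    def click(self, now=None):
        """Record one click and expire clicks older than the window."""
        now = time.time() if now is None else now
        self.clicks.append(now)
        while self.clicks and self.clicks[0] < now - self.window:
            self.clicks.popleft()

    def alarmed(self):
        """True if the current window's click count exceeds the threshold."""
        return len(self.clicks) > self.baseline * self.factor
```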
We've wound up using the spam scores from some other mail systems
Like many places, the university has a central email system and all staff and professors have an address there. One of the things you can do with your UTORMail email is forward all of it to an email account elsewhere, such as your account here. A decent number of our users both do this and make a reasonably significant use of their central email address, making it visible and active and thus periodically hit by spammers. When these central email addresses get hit with spam, the central email system forwards it on to us.
Starting a couple of months ago, the university's central email seems to have been targeted by a number of active spam campaigns. People who read their email in the central system were commenting on it and certainly some of our users were reporting it to us, because there was a problem; the commercial anti-spam package we use wasn't recognizing this spam as spam. In theory we could give our normal answer of 'it's a black box, we get what we get', but in practice this felt unsatisfying because the central email system was recognizing and tagging this email as spam before it passed it on to us.
Since the central email system actually uses the same commercial
anti-spam package that we do, my assumption was that our lower spam
score was because we weren't directly receiving the spam. Since it
had gone through the central email system, there was a layer or two
of Received: headers and other obscuring things (and the source
IPs were different for DNS blocklist checks and so on). It was
fairly obvious spam so we were scoring it relatively high, but after
the layer of forwarding it wasn't quite sufficiently obvious to get
scored as definitely spam.
This was kind of frustrating. The central email system was putting its
spam scoring information right there in the message headers of the
forwarded messages; we just weren't paying any attention to it. Well,
we could change that, so we did (which took a little bit of work in
Exim). Now, if we don't score something as spam but the central email
system does, we still mark it as spam in the
Subject: header, which
triggers various downstream processing. In practice, users who are doing
spam filtering at all will have this forwarded email go away just the
way regular spam does.
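The logic we ended up with can be sketched outside of Exim like this (the 'X-Central-Spam' header name and '[SPAM]' subject tag are stand-ins for the real ones):

```python
from email.message import EmailMessage

def apply_upstream_verdict(msg: EmailMessage, our_verdict_is_spam: bool) -> None:
    """Tag the Subject: if the central system scored the message as spam.

    We only defer to the upstream verdict when our own scoring didn't
    already call it spam; a 'not-spam' upstream verdict never overrides us.
    """
    central = msg.get("X-Central-Spam", "").lower()
    if not our_verdict_is_spam and central == "yes":
        subject = msg.get("Subject", "")
        if not subject.startswith("[SPAM]"):
            del msg["Subject"]
            msg["Subject"] = "[SPAM] " + subject
```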
(We considered doing something more sophisticated and selective in
Exim, but decided that there were simply too many places in our
overall mail setup that knew about the
Subject: tag, including
filtering done by users through
procmail and so on. Also, we
couldn't think of anything we'd want to do differently depending
on who had determined it was spam.)
Since we don't allow a 'not-spam' score from the central email system to override our own opinion, we don't make any attempt to limit this special handling to email that we definitely received from the central email system. This has the useful side effect that if you're forwarding your email through another system before it gets to us, we'll still pay attention to the central email system's spam scoring for you.