2014-11-23
I'm happier ignoring the world of spam and anti-spam
As I've mentioned a couple of times, I'm currently running a sinkhole SMTP server to collect spam samples. Doing this has let me learn or relearn a valuable lesson about anti-spam work.
My sinkhole SMTP server has several sorts of logging and monitoring,
including a log of SMTP commands, and of course I can run it or turn
it off as I feel like. When I first set it up, I configured it to be
auto-started on system reboot and I watched the SMTP command log a lot
of the time with 'tail -f'. The moment a new spam sample showed up I'd
go read it.
The problem with this wasn't the time it took. Instead the problem was simpler; actively monitoring my sinkhole SMTP server all the time made me think about spam a lot, and it turns out that having spam on my mind wasn't really a great experience. In theory, well, I told myself that watching all of the spam attempts was somewhere between interesting (to see their behavior) and amusing (when they failed in various ways). In practice it was quietly wearying. Not in any obvious way that I really noticed much; instead it was a quiet drag that got me a little bit down.
Fortunately I did notice it a bit, so at a couple of points I decided to just turn things off (once this was prompted by a persistent, unblockable run of uninteresting spam that was getting on my nerves). What I found is that I was happier when I wasn't keeping an eye on the sinkhole SMTP server all the time, or even checking in on it very much. Pretty much the less I looked at the sinkhole server, the better or at least more relaxed I felt.
So what I (re)learned from all of this is that not thinking very much about the cat and mouse game played between spammers and everyone else makes me happier. If I can ignore the existence of spammers entirely, that's surprisingly fine.
As a result of this my current approach with my sinkhole SMTP server is to ignore it as much as possible. Currently I'm mostly managing to only check new samples once every few days and not to do too much with them.
(I probably wouldn't have really learned this without my sinkhole SMTP server because it has the important property that I can vary the attention I pay to it without any bad consequences for my real mail. Even running it at all is completely optional, so sometimes I don't.)
2014-10-20
Some numbers on our inbound and outbound TLS usage in SMTP
As a result of POODLE, it's suddenly rather interesting to find out the volume of SSLv3 usage that you're seeing. Fortunately for us, Exim directly logs the SSL/TLS protocol version in a relatively easy-to-search format; it's recorded as the 'X=...' parameter for both inbound and outbound email. So here are some statistics, first from our external MX gateway for inbound messages and then from our other servers for external deliveries.
Over the past 90 days, we've received roughly 1.17 million external email messages. 389,000 of them were received with some version of SSL/TLS. Unfortunately our external mail gateway currently only supports up to TLS 1.0, so the only split I can report is that 130 of these messages were received using SSLv3 instead of TLS 1.0. 130 messages is few enough for me to examine the sources by hand; the only particularly interesting and eyebrow-raising ones were a couple of servers at a US university and a .nl ISP.
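For illustration, here is a minimal Python sketch of how counts like these could be pulled out of an Exim main log. It assumes the default line layout (date, time, message id, then a flag such as '<=' for an arrival or '=>' and '->' for deliveries) with no extra prefix fields added through log_selector, and it assumes an 'X=' value of the form protocol:cipher:bits as GnuTLS-based builds write it (OpenSSL-based builds use protocol names like 'TLSv1' instead). The function name is hypothetical and is not part of any Exim tooling.

```python
import re
from collections import Counter

# Assumed X= layout: protocol:cipher:bits, e.g. 'X=TLS1.0:RSA_AES_256_CBC_SHA1:256'.
X_FIELD = re.compile(r"\sX=([^:\s]+)")


def count_tls_versions(lines, markers=("<=",)):
    """Count Exim main-log lines carrying one of the given flags by TLS protocol.

    Arrivals are '<='; deliveries are '=>' (first address) and '->' (further
    addresses in the same delivery). Lines with no X= field count as 'none'.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout: date, time, message id, flag, ...
        if len(fields) < 4 or fields[3] not in markers:
            continue
        match = X_FIELD.search(line)
        counts[match.group(1) if match else "none"] += 1
    return counts
```

Feeding it an open log file (for example /var/log/exim4/mainlog on Debian or Ubuntu) gives inbound counts; passing markers=("=>", "->") counts per-address deliveries instead, which matches how the outbound numbers below are expressed.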
(I'm a little bit surprised that our Exim doesn't support higher TLS versions, to be honest. We're using Exim on Ubuntu 12.04, which I would have thought would support something more than just TLS 1.0.)
On our user mail submission machine, we've delivered to 167,000 remote addresses over the past 90 days. Almost all of them, 158,000, were done with SSL/TLS. Only three of them used SSLv3 and they were all to the same destination; everything else was TLS 1.0.
(It turns out that very few of our user-submitted messages were received with TLS, only 0.9%. This rather surprises me, but maybe many mail clients default to not using TLS for submission even if the submission server offers it. All of this small number of TLS submissions used TLS 1.0, as I'd hope.)
Given that our Exim version only supports TLS 1.0, these numbers are more boring than I was hoping they'd be when I started writing this entry. That's how it goes sometimes; the research process can be disappointing as well as educational.
(I did verify that our SMTP servers really only do support up to TLS 1.0 and it's not just that no one asked for a higher version than that.)
One set of numbers I'd like to get for our inbound email is how TLS usage correlates with spam score. Unfortunately our inbound mail setup makes it basically impossible to correlate the bits together, as spam scoring is done well after the point where TLS information is readily available.
Sidebar: these numbers don't quite mean what you might think
I've talked about inbound message deliveries and outbound destination
addresses here because that's what Exim logs information about, but
of course what is really encrypted is connections. One (encrypted)
connection may deliver multiple inbound messages and certainly may
be handed multiple RCPT TO addresses in the same conversation.
I've also made no attempt to aggregate this by source or destination,
so very popular sources or destinations (like, say, Gmail) will
influence these numbers quite a lot.
All of this means that numbers like these can't be taken as an indication of how many sources or destinations do TLS with us. All I can talk about is message flows.
(I can't even talk about how many outgoing messages are completely protected by TLS, because to do that I'd have to work out how many messages had no non-TLS deliveries. This is probably possible with Exim logs, but it's more work than I'm interested in doing right now. Clearly what I need is some sort of easy to use Exim log aggregator that will group all log messages for a given email message together and then let me do relatively sophisticated queries on the result.)
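As a rough sketch of the kind of aggregator wished for above, the following groups Exim main-log lines by message id and then asks which messages had every remote delivery done over TLS. It assumes classic 6-6-2 Exim message ids (newer Exim releases use longer ids), the default line layout with no extra log_selector prefix fields, and that remote deliveries are the delivery lines carrying an 'H=' host field. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Classic Exim message ids look like '1XfqBc-0001Ab-Cd' (6-6-2 base-62 chars).
MSG_ID = re.compile(r"^[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}$")
X_FIELD = re.compile(r"\sX=[^:\s]+")
DELIVERY_MARKERS = {"=>", "->"}


def group_by_message(lines):
    """Collect main-log lines per message id (third field, after date and time)."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and MSG_ID.match(fields[2]):
            groups[fields[2]].append(line)
    return groups


def remote_deliveries(msg_lines):
    """Delivery lines that went to a remote host (assumed: they carry H=)."""
    result = []
    for line in msg_lines:
        fields = line.split()
        if len(fields) > 3 and fields[3] in DELIVERY_MARKERS and " H=" in line:
            result.append(line)
    return result


def fully_tls_messages(groups):
    """Ids of messages with at least one remote delivery, every one over TLS."""
    result = set()
    for msg_id, msg_lines in groups.items():
        deliveries = remote_deliveries(msg_lines)
        if deliveries and all(X_FIELD.search(line) for line in deliveries):
            result.add(msg_id)
    return result
```

Grouping first and querying second is the design point here: once every line for a message sits together, questions like "did any address for this message go out in the clear?" become simple per-group predicates.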
2014-10-12
Phish spammers are apparently exploiting mailing list software
One of the interesting things I've observed recently through my sinkhole SMTP server is a small number of phish spams that have been sent to me by what is clearly mailing list software; the latest instance was sent by a Mailman installation, for example. Although I initially thought all three of the emails I've spotted came from one root cause, it turns out that there are several different things apparently going on.
In one case, the phish spammer clearly seems to have compromised a
legitimate machine with mailing list software and then used that
software to make themselves a phish spamming mailing list. It's
easy to see the attraction of this; it makes the phish spammer much
more efficient in that it takes them less time to send stuff to
more people. In an interesting twist, the Received headers of the
email I got say that the spammer initially sent it with the envelope
address of service@paypal.com.au (which matched their From:)
and then the mailing list software rewrote the envelope sender.
In the most clear-cut case, the phish spammer seems to have sent out their spam through a commercial site that advertises itself as (hosted) 'Bulk Email Marketing Software'. This suggests that the phish spammer was willing to spend some money on their spamming, or at least burned a stolen credit card (the website advertises fast signups, which means that requiring a credit card means basically nothing). I'm actually surprised that this doesn't happen more often, given that my impression is that the spam world is increasingly commercialized and phish spammers now often buy access to compromised machines instead of compromising the machines themselves. If you're going to spend money one way or another and you can safely just buy use of a commercial spam operation, well, why not?
(I say 'seems to' because the domain I got it from is not quite the same as the commercial site's main domain, although there are various indications tying it to them. If the phish spammer is trying to frame this commercial site, they went to an unusually large amount of work to do so.)
The third case is the most interesting to me. It uses a domain that was registered two days before it sent the phish spam and that domain was registered by an organization called 'InstantBulkSMTP'. The sending IP, 173.224.115.48, was apparently also assigned on the same day. The domain has now disappeared but the sending IP now has DNS that claims it is 'mta1.strakbody.com' and the website for that domain is the control panel for something called 'Interspire Email Marketer'. So my operating theory is that it's somewhat like the second case; a phish spammer found a company that sets up this sort of stuff and paid them some money (or gave them a bad credit card) for a customized service. The domain name they used was probably picked to be useful for the phish spam target.
(The domain was 'titolaricartasi.info' and the phish target was cartasi.it. Google Translate claims that 'titolari' translates to 'holders'.)
PS: All of this shows the hazards of looking closely at spam. Until I started writing this entry, I had thought that all three cases were the same and were like the first one, ie phish spammers exploiting compromised machines with mailing list managers. Then things turned out to be more complicated and my nice simple short blog entry disappeared in a puff of smoke.
2014-09-28
Learning a lesson about spam-related logging (again)
I recently mentioned some stats about how many clients do TLS with my sinkhole SMTP server. Today I was planning to present some broader stats from that server, showing how many clients made it to various points in the SMTP conversation with it. Then, unfortunately, I discovered that I'd shot myself in the foot as far as gathering this sort of stats was concerned.
(If I was running a pure 'accept everything' SMTP server these should be pretty boring stats. But as it happens I'm not; instead I'm doing various things to get rid of uninteresting or overly noisy connections and uninteresting spam.)
My sinkhole SMTP server currently has two logs: a SMTP command log
(with additional notations for events like connections, TLS
negotiations, and so on) and a message log (which logs one line per
message that made it all the way through to fully receiving the
DATA payload). What I would like to do is generate stats on things
like how many connections there were, how many of them made it as
far as an EHLO, how many made it to MAIL FROM, and so on.
When I started I thought that I could just grep my SMTP command
log and count how many hits I got on various things.
Well, no, not so fast. Take EHLO, for example; a proper client
that successfully negotiates TLS will issue two EHLO commands.
A related issue happens with, say, RCPT TO commands; because many
clients pipeline their input, they may well send a RCPT TO even
though their MAIL FROM failed. Bogus clients may equally send a
MAIL FROM even if their EHLO failed (or may try MAIL FROM
without even EHLOing first).
Given pure SMTP logs there are two ways to fix this. The first would
be to have a unique and distinct 'success' reply (or message) for
every different SMTP command; then I could search for the successful
replies instead of the commands being issued. The other would be
to use a process which reconstructs the state of each SMTP conversation
so it can tell successful commands from failed ones, suppress
post-TLS EHLOs, and so on. You could even have the latter option
spit out a data record for each conversation with all of this
per-conversation information.
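To make the second approach concrete, here is a hypothetical sketch, not a tool that exists for this server. It assumes an invented log format of one 'CONN KIND TEXT' entry per line, where KIND is 'event' (such as 'connect'), 'cmd' (a client command) or 'reply' (one line per complete server reply). Because SMTP replies come back in command order even when a client pipelines, a per-connection FIFO of pending commands is enough to match each reply to its command, so a RCPT TO sent after a failed MAIL FROM is not counted, and keeping stages in a set means the post-TLS EHLO is not double counted.

```python
from collections import defaultdict, deque

STAGES = ("connect", "EHLO", "MAIL", "RCPT", "DATA")


def conversation_stages(lines):
    """Map connection id -> set of stages whose command got a successful reply."""
    pending = defaultdict(deque)  # commands still waiting for their reply
    reached = defaultdict(set)
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue
        conn, kind, text = parts
        if kind == "event" and text == "connect":
            reached[conn].add("connect")
        elif kind == "cmd":
            words = text.split()
            verb = words[0].upper() if words else ""
            # Treat HELO like EHLO; 'MAIL FROM:<...>' becomes 'MAIL', and so on.
            pending[conn].append("EHLO" if verb == "HELO" else verb)
        elif kind == "reply" and pending[conn]:
            # Replies without a pending command (the greeting, the end-of-DATA
            # reply) are skipped; otherwise the oldest pending command is answered.
            verb = pending[conn].popleft()
            code = text[:3]
            ok = code == "354" if verb == "DATA" else code.startswith("2")
            if ok and verb in STAGES:
                reached[conn].add(verb)
    return reached


def stage_counts(reached):
    """How many conversations reached each stage (each counted once per conversation)."""
    counts = dict.fromkeys(STAGES, 0)
    for stages in reached.values():
        for stage in stages:
            counts[stage] += 1
    return counts
```

The same per-conversation sets could also be written out as one record per conversation, which is the data-record variant mentioned above.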
Unfortunately I do not have the former (successful SMTP reply
messages are both duplicated and varied) and creating something to
do the latter is too much work for now.
What this really points out to me is that I should have thought more about what sort of information I might want when designing my server's logging. In theory I knew from prior experience with Exim that raw logs could make it too complicated to generate interesting stats, but in practice it never occurred to me that I might be doing this to myself.
(How this applies to what gets logged by our real mail system is left as something for me to think about, but this time around I do want to think about it and see if I can improve Exim's logging or otherwise do anything interesting.)
2014-09-15
My collection of spam and the spread of SMTP TLS
One of the things that my sinkhole SMTP server does that's new on my workstation is that it supports TLS, unlike my old real mail server there (which dates from a very, very long time ago). This has given me the chance to see how much of my incoming spam is delivered with TLS, which in turn has sparked some thoughts about the spread of SMTP TLS.
The starting point is that a surprising amount of my incoming spam is actually delivered with TLS; right now about 30% of the successful deliveries have used TLS. This is somewhat more striking than it sounds for two reasons: first, the Go TLS code I'm relying on for TLS is incomplete (and thus not all TLS-capable sending MTAs can actually do TLS with it), and second, a certain amount of the TLS connection attempts fail because the sending MTA is offering an invalid client certificate.
(I also see a fair number of rejected delivery attempts in my SMTP command log that did negotiate TLS, but the stats there are somewhat tangled and I'm not going to try to summarize them.)
While there are some persistent spammers, most of the incoming email is your typical advance fee fraud and phish spam that's sent through various sorts of compromised places. Much of the TLS email I get is this boring sort of spam, somewhat to my surprise. My prejudice is that a fair amount of this spam comes from old and neglected machines, which are exactly the machines that I would expect are least likely to do TLS.
(Some amount of such spam comes from compromised accounts at places like universities, which can and do happen to even modern and well run MTAs. I'm not surprised when they use TLS.)
What this says to me is that support for initiating TLS is fairly widespread in MTAs, even relatively old MTAs, and fairly well used. This is good news (it's now clear that pervasive encryption of traffic on the Internet is a good thing, even casual opportunistic encryption). I suspect that it's happened because common MTAs have enabled client TLS by default and the reason they've been able to do that is that it basically takes no configuration and almost always works.
(It's clear that at least some client MTAs take note when STARTTLS
fails and don't try it again even if the server MTA offers it to
them, because I see exactly this pattern in my SMTP logs from some
clients.)
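How any particular MTA remembers such failures isn't something the logs reveal. As a purely hypothetical sketch (the class name, the retry interval, and keying on host name are all assumptions), a sending MTA could keep a per-host record of STARTTLS failures and skip STARTTLS for that host for a while, which would produce exactly this pattern on the receiving side:

```python
import time


class StartTLSMemory:
    """Hypothetical client-side memory of servers where STARTTLS failed."""

    def __init__(self, retry_after=24 * 3600, clock=time.monotonic):
        self.retry_after = retry_after  # seconds to avoid STARTTLS after a failure
        self.clock = clock
        self.failed_at = {}  # host -> time of the most recent STARTTLS failure

    def record_failure(self, host):
        self.failed_at[host] = self.clock()

    def record_success(self, host):
        self.failed_at.pop(host, None)

    def should_try(self, host, advertised):
        """Try STARTTLS only if the server offers it and it hasn't failed recently."""
        if not advertised:
            return False
        failed = self.failed_at.get(host)
        return failed is None or self.clock() - failed >= self.retry_after
```

Remembering the failure and falling back to plaintext is what keeps opportunistic TLS close to configuration-free: a broken server costs one failed handshake, not undeliverable mail.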
PS: you might wonder if persistent spammers use TLS when delivering their spam. I haven't done a systematic measurement for various reasons but on anecdotal spot checks it appears that my collection of them basically doesn't use TLS. This is probably unsurprising since TLS does take some extra work and CPU. I suspect that spammers may start switching if TLS becomes something that spam filtering systems use as a trust signal, just as some of them have started advertising DKIM signatures.
2014-08-24
My spam is (mostly) boring
I've mentioned a couple of times that I'm doing an experiment with a sinkhole SMTP server to handle email for some old addresses of mine that have become nothing but spam. When I started the experiment, what I think I expected to find was a bunch of industrial spam operations, places that had my addresses firmly anchored in spam lists and were sending their 'legitimate' email to them on a persistent basis, and maybe some interesting spammer behavior otherwise.
While there has been some of this and there are a few persistent and sometimes very aggressive mailing list places trying to send me spam, almost all of what I get now is surprisingly boring. Specifically, most of what I get is now advance fee fraud with a bit of phish spam mixed in.
(Admittedly I blocked the aggressive sending places once I identified them as persistent repeat senders. When I already have enough samples of their spam, I don't particularly need more.)
This 'boring' spam comes from all over and has at best vague patterns to it. It's clear that there's a lot of people doing it, a lot of hosts being abused as senders, a great variety of origin addresses being forged onto the email, and the contents vary a lot at a mechanical level. But at the level of learning interesting things about spammer behavior there's no real variation, which is why I call it boring. Advance fee fraud spam is advance fee fraud spam; I don't think I've spotted anyone doing anything particularly ingenious, but then I haven't been paying much attention.
All of this kind of makes my sinkhole SMTP server a failed experiment. If I'm not going to get interesting spam there's very little point in running it at all, so I'm probably going to shut it down entirely soon and let all the spammers just have their email time out.
(I sometimes toy with running it with absolutely no restrictions for a limited time, say a week, and seeing what I collect in that week and how things break down and so on. But I'm not sure I have the energy for that particular experiment.)
2014-08-05
Why LinkedIn's 'you must join to unsubscribe' is evil
Recently I got a '<X> would like to add you to their professional network' email message from LinkedIn (from what I'm certain is a spammer). I'm not a LinkedIn user, so in an excess of optimism I went to the 'unsubscribe' link in the email. And, well, let me quote my own Tweet summarizing things:
@thatcks: I see. To get LinkedIn to stop emailing me connection invitations, I have to actually join LinkedIn. That makes those emails clear spam.
Perhaps you think that this behavior on LinkedIn's part is relatively harmless and no big deal. After all, all I have to do is join, right?
There are two things that make this wrong and one thing that makes this actively evil. Let's cover the two things first. To start with, this is not actually an unsubscribe link. 'Unsubscribe' links that don't actually function are known by many names, including 'bait and switch'. They are never a friendly act; they demonstrate that the sender intends to throw obstacles in your way because they very much object to you unsubscribing and want to make it hard.
Beyond that, well, 'fool me once, shame on you; fool me twice, shame on me'. Why should I believe or trust that LinkedIn will let me actually (permanently) unsubscribe if I sign up? They've already lied once; I'm sure they can find a way to lie again, either now or in the future when it's convenient to them. As above, they've already demonstrated that they are not actually interested in letting people unsubscribe.
But all of that pales next to the actively evil bit: to sign up for LinkedIn, I must agree to their Terms of Service. It is absolutely guaranteed that LinkedIn's ToS contains objectionable things that no one in their right mind would agree to if they had a choice, because essentially all terms of service for large websites contain such terms. And it's all but certain that agreeing to their ToS is a binding legal agreement. Evil things in Terms of Service are usually excused with the rubric 'well, if you don't like them don't use the service, it's being offered for free'. Here I have no interest in using the service, I just want to unsubscribe. Effectively LinkedIn is giving me no choice; it is agree or suffer their continued spam.
Fundamentally what has happened here is that LinkedIn has turned unsubscribing from a right into a privilege, extended on LinkedIn's terms and at their whims. I do not have the 'right' to unsubscribe from LinkedIn's email, or they would have just done so with no fuss or muss. Instead I have only the privilege to ask to (maybe) be unsubscribed, under whatever terms LinkedIn feels free to dictate.
This is no genuine unsubscribe option. This is a sham, and I hope that recent Canadian legislation winds up seeing LinkedIn called on this.
(Yes, yes, as evil goes it is very small evil on the global scale of things.)
2014-07-20
The CBL has a real false positive problem
As I write this, a number of IP addresses in 128.100.1.0/24 are listed in the CBL, and various of them have been listed for some time. There is a problem with this: these CBL-listed IP addresses don't exist. I don't mean 'they aren't supposed to exist'; I mean 'they could only theoretically exist on a secure subnet in our machine room and even if they did exist our firewall wouldn't allow them to pass traffic'. So these IP addresses don't exist in a very strong sense. Yet the CBL lists them and has for some time.
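For readers who want to check listings like these themselves, DNS blocklists such as the CBL follow the generic convention described in RFC 5782: reverse the IPv4 octets, append the list's zone, and look up an A record; an answer means listed. A minimal sketch, assuming the zone name cbl.abuseat.org and deliberately simplifying by treating any lookup failure as 'not listed' (a real checker would distinguish NXDOMAIN from resolver errors):

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """RFC 5782 style query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="cbl.abuseat.org"):
    """True if the blocklist answers with an A record for this address.

    Simplification: any resolver error, not just NXDOMAIN, counts as 'not listed'.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def in_subnet(ip, subnet):
    """Whether an address falls inside a subnet such as '128.100.1.0/24'."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnet)
```

Looping is_listed() over ipaddress.ip_network('128.100.1.0/24').hosts() would show which addresses in the subnet are currently listed, at the cost of 254 DNS queries.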
The first false positive problem the CBL has is that they are listing
this traffic at all. We have corresponded with the CBL about this
and these listings (along with listings on some of our other subnets)
all come from traffic observed at a single one of their monitoring
points. Unlike what I assumed in the past,
these observations are not coming from parsing Received: headers
but from real TCP traffic. However they are not connections from
our network, and the university is the legitimate owner and router
of 128.100/16. A CBL observation point that is using false routing
(and is clearly using false routing over a significant period of time)
is an actively dangerous thing; as we can see here, false routing can
cause the CBL to list anything.
The second false positive problem the CBL has is that, as mentioned, we have corresponded with the CBL over this. In that correspondence the CBL spokesperson agreed that the CBL was incorrect in this listing and would get it fixed. That was a couple of months ago, yet a revolving cast of 128.100.1.0/24 IP addresses still gets listed and relisted in the CBL. As a corollary of this, we can be confident that the CBL listening point(s) involved are still using false routes for some of their traffic. You can apply charitable or less charitable assumptions for this lack of actual action on the CBL's part; at a minimum it is clear that some acknowledged false positive problems go unfixed for whatever reason.
I don't particularly have a better option than the CBL these days. But I no longer trust it anywhere near as much as I used to and I don't particularly like its conduct here.
(And I feel like saying something about it so that other people can know and make their own decisions. And yes, the situation irritates me.)
(As mentioned, we've seen similar issues in the past, cf my original 2012 entry on the issue. This time around we've seen it on significantly more IP addresses, we have extremely strong confidence that it is a false positive problem, and most of all we've corresponded with the CBL people about it.)
2014-07-15
A data point on how rapidly spammers pick up addresses from the web
On June 15, what is almost exactly a month ago now, I wrote an entry on a weird non-relaying relay attempt I saw. In the entry I quoted a SMTP conversation, including a local address handled by my sinkhole SMTP server. As I was writing the entry I decided to change the local part of the address to an obviously bogus 'XXXX' and then see if spammers picked up that address and started trying to deliver things to that new address.
I am now able to report that it took less than a month. On July
11th I saw the first delivery attempt; July 14th saw the second and
third ones. The first and the third 'succeeded' in getting all the
way to a DATA submission (which was 5xx'd but had the message
captured for my inspection). The resulting spam is a little bit
interesting.
The first spam message looks like a serious attempt by what seems like a Chinese-affiliated spam gang to sell me some e-mail address databases, based on what geographic area I wanted to target, and maybe hawk their spamming services too. It uses a forged envelope sender and comes from a US hosting/cloud provider, with replies directed to 163.com and an image in its HTML being fetched from a tagged URL on a Chinese IP address.
The second spam message (from the third delivery attempt) comes from what is probably a compromised mail server in the UK. It is plain and straightforward advance fee fraud, and not a particularly sophisticated one; apart from the destination address there is absolutely nothing unusual about it. It was probably ultimately sent from Malaysia, perhaps from a compromised machine of some sort (the likely source IP is currently in the CBL).
(The second delivery attempt had sufficiently many signs of being
ordinary advance fee fraud that my sinkhole SMTP server rejected
it before DATA. Now that I look it comes from an IP address in
the same /24 as the first delivery attempt; it got rejected early
because the envelope sender address claimed to be from qq.com. I've
switched my sinkhole SMTP server to early rejection of stuff that's
likely to be boring spam because I've already collected enough
samples of it. Maybe someday I'll change my mind and do a completely
raw 'one week in spam', but not right now.)
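A minimal sketch of this kind of early rejection at MAIL FROM time, assuming a hypothetical hand-maintained set of 'boring' sender domains (only qq.com comes from the text above) and illustrative reply texts. The function takes whatever follows 'MAIL FROM:' and ignores any ESMTP parameters after the address.

```python
# Hypothetical set of sender domains assumed to carry only boring spam.
BORING_SENDER_DOMAINS = {"qq.com"}


def mail_from_reply(argument):
    """Return the SMTP reply for a MAIL FROM argument such as
    '<someone@example.org> SIZE=1234' (ESMTP parameters are ignored)."""
    tokens = argument.split()
    address = tokens[0] if tokens else ""
    if address.startswith("<") and address.endswith(">"):
        address = address[1:-1]
    domain = address.rpartition("@")[2].lower()
    if domain in BORING_SENDER_DOMAINS:
        # Reject before DATA; enough samples of this sort of spam are on hand.
        return "550 5.7.1 Sender rejected"
    return "250 2.1.0 OK"
```

Rejecting at MAIL FROM rather than after DATA saves both bandwidth and the effort of looking at yet another sample, at the cost of never seeing the message body.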
There is an obvious theory about what happened with my address here: scraped by a spammer who briefly attempted to market services to me and then started selling the address and/or their spamming services to other spammers. I can't know if this story is right, of course. I may learn more if more spam arrives for that address.
(And if no more spam arrives for the address I'll also learn something. At this point I do expect it to get more spam, though, since it's in the hands of advance fee fraud spammers.)
2014-06-21
Sometimes 'unsubscribing' does seem to reduce spam activity
There is one particular email sending place that has for some time been the most prolific source of sending attempts to some of my old addresses. I put them in IP-level blocks literally years ago (which means that nothing they've sent me for years has been delivered) and ever since then I could count on their relatively small netblock to be either the very top blocked traffic source or at least one of them. They have in the past been SBL-listed, although they may not be right now, and there are plenty of reports all over the net of bad behavior. If you had asked me to pick someone who would never, ever stop spamming me they would have been at the top of the list.
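Ranking blocked traffic sources by netblock, as in the observation above, can be as simple as bucketing blocked connection attempts into /24s. A sketch, assuming IPv4 addresses and a list of blocked client IPs already extracted from firewall or SMTP logs (the function name and the /24 default are assumptions):

```python
import ipaddress
from collections import Counter


def top_blocked_netblocks(blocked_ips, prefix=24, n=5):
    """Return the n IPv4 netblocks of the given prefix length with the most
    blocked connection attempts, as (network, count) pairs."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in blocked_ips
    )
    return counts.most_common(n)
```

Using strict=False lets ip_network() mask a host address down to its containing netblock, so every address in the same /24 lands in the same bucket.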
When I started my sinkhole SMTP server experiment, of course they were one of the people who sent me email basically immediately (within a few minutes of the service starting). So I tried out their unsubscribe link to see what would happen. To my surprise I have gotten no email from them in the 20 days or so since then, when normally I would expect to be awash in it.
This is not a definitive test by any means, for many reasons (including that it's early yet). But it's at least an interesting and surprising result for me. Given that these people have ignored what is probably half a decade's worth of bounces, they were about the last people I'd have expected an unsubscribe to work on.
(I think I'm going to avoid speculating about why they might totally ignore years of bounces and non-delivery reports yet respond apparently instantly to an unsubscribe attempt, mostly because it would all just be complete speculation and guesswork.)