2014-01-22
Microsoft has become a spam emitter
I'll start by quoting my tweet:
I admire how Microsoft IP address space with no reverse DNS has now become a source of spam emitters using forged HELOs. Thanks, uSoft!
Let me show you the specific log entry that sparked this:
remote from [23.96.34.64]
EHLO mail.rackspace.com
550 Unknown command 'EHLO'
HELO mail.rackspace.com
250 [...] Hello mail.rackspace.com
MAIL FROM:<ps@mail.com>
554 unacceptable from address: <ps@mail.com>
This IP address space is registered to Microsoft and has no reverse DNS. It is certainly not Rackspace. As it turns out this was probably a Windows Azure customer since this appears to be a Windows Azure datacenter range (in their 'useast' region). To determine this I had to dig up a Microsoft document on Azure Datacenter IP ranges from Internet searches.
I'm blaming Microsoft directly here because Microsoft consciously passed up the chance to clarify what the IP address was and who it might belong to. That's what reverse DNS is for, as shown by eg Amazon AWS (which gives their AWS IPs clear reverse DNS). Microsoft opted to keep their Azure IPs anonymous, so Microsoft gets to take the blame.
(Certainly as a sysadmin investigating a problem I'm not going to bother looking further when there is no functioning reverse DNS. Nor can I really do anything more precisely calibrated than acting on the entire registered netblock (not unless I want to pull that Microsoft data on a regular basis and examine it for changes).)
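As a rough illustration of the check involved, here is a minimal Python sketch that asks whether a connecting IP has any reverse DNS at all. The function names and the injectable `lookup` parameter are my own inventions for illustration (the parameter exists so the logic can be exercised without touching the network); only `socket.gethostbyaddr` is a real standard library call.

```python
import socket
from typing import Callable, Optional


def reverse_dns(ip: str,
                lookup: Callable[[str], tuple] = socket.gethostbyaddr) -> Optional[str]:
    """Return the reverse DNS (PTR) name for ip, or None if it has none."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except (socket.herror, socket.gaierror):
        # The resolver reports a missing PTR record as an error.
        return None
    return hostname


def describe_client(ip: str, helo: str,
                    lookup: Callable[[str], tuple] = socket.gethostbyaddr) -> str:
    """Summarize a connection the way a sysadmin might want to see it logged."""
    name = reverse_dns(ip, lookup)
    if name is None:
        return f"{ip} (no reverse DNS) claimed HELO {helo}"
    return f"{ip} ({name}) claimed HELO {helo}"
```

Calling `describe_client("23.96.34.64", "mail.rackspace.com")` does a live lookup, so what it reports depends on whatever the DNS says on the day you run it.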
PS: The cynic in me is muttering that Microsoft decided to not do reverse DNS so that people like me couldn't just block based on the domain name being in whatever magic domain. I'm not sure I believe that, but it's certainly a tempting idea. (Anti-spam work makes one a cynic.)
PPS: This is apparently causing problems for Azure customers. I'm a little bit surprised that any large ISPs have started to reject email if you don't have reverse DNS; the last time I looked at this I assumed it was far too risky and not likely to be adopted by anyone major any time soon.
2014-01-10
An interesting recent spam run against one of my machines
A couple of days ago, the SMTP logs on one of my machines lit up with a whole bunch of attempted inbound connections from all over the world. The first striking thing about these connection attempts is that they all seemed to be from people's home machines (what would once have been called 'dialups' but now use cable modems, DSL, and various other technologies). Many of these machines were on the PBL and a couple that I just checked now are currently on the CBL.
The second striking thing is the interesting way that the spammer behind this snatched defeat from the jaws of potential victory. A few of these IP addresses actually got to talk to my SMTP daemon; when they did, they all reacted like this:
remote from [94.174.75.128]
HELO tvbtzzg.virginm.net
554 Unresolvable HELO name: tvbtzzg.virginm.net

remote from [172.10.0.198]
HELO koridl.sbcglobal.net
554 Unresolvable HELO name: koridl.sbcglobal.net
That's right. The spammer's software carefully worked out what the
proper base domain name was for the particular IP being used, put
it in the HELO, and then made up a random hostname to go with it.
Given my usual views that spammers are by and large not stupid and are
highly motivated to do what works, I suspect that such HELO names must
help their spam get through at least some spam filters (or, to put it
another way, that other HELO names increase the risks of the spam
email being filtered out). That very small operations like mine can use
this to immediately reject their spam is presumably unimportant.
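The rejection in those log lines boils down to one question: does the HELO name resolve at all? A minimal Python sketch of that check follows. It is an assumption-laden illustration and not how my mailer actually implements it; the function names, the injectable `resolve` parameter, and the wording of the 250 reply are made up, and `socket.gethostbyname` only looks for IPv4 addresses.

```python
import socket
from typing import Callable


def helo_resolves(helo_name: str,
                  resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the HELO name resolves to some address."""
    try:
        resolve(helo_name)
    except (socket.gaierror, socket.herror):
        return False
    return True


def helo_response(helo_name: str,
                  resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Pick the SMTP reply to a HELO; the 250 text is a placeholder."""
    if not helo_resolves(helo_name, resolve):
        return f"554 Unresolvable HELO name: {helo_name}"
    return f"250 Hello {helo_name}"
```

A made-up hostname under a real ISP domain fails this check just as thoroughly as a completely invented name would, which is why the spammer's care with the domain part bought them nothing here.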
(I don't have any idea what would cause a spammer to think that my particular machine was worth turning a corner of a botnet on, instead of just using a compromised machine or two and then moving on. Perhaps it's a very big botnet. It seems to have moved on now.)
2013-12-28
The era of known top-level domains or valid TLD patterns is mostly over
Once upon a time you could create a list of valid top level domains and
thus rapidly validate if an email address presented to you was even
possibly correct. If you were slightly more bootleg you could do it
with patterns, since all TLDs were either two letter country codes or
three letter 'traditional' TLDs (and you could actually list those).
This era started to crumble a while after I did my bootleg hacks to
a mailer here, with the introduction of things like .info, but for
a while I patched up the cracks by accepting four-letter TLDs.
That era is over now. It's probably been over for a while but I didn't
really notice because new TLDs haven't generally been introduced or
at least used in email to this one ancient system that basically only
gets spam now. But I just got an email attempt with a MAIL FROM of a
.travel domain and the domain actually exists so, well, so much for
that. So I've taken that particular bit of bootleg cleverness out of
this old mailer's configuration.
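For illustration, the pattern approach looks roughly like this in Python. This is a sketch of the idea and not the old mailer's actual configuration; the names are hypothetical.

```python
import re

# Original idea: two-letter country codes or three-letter 'traditional' TLDs.
ORIGINAL_TLD = re.compile(r"\.[a-z]{2,3}$")
# The later patch: also accept four-letter TLDs such as .info.
PATCHED_TLD = re.compile(r"\.[a-z]{2,4}$")


def plausible_tld(domain: str, pattern: re.Pattern = PATCHED_TLD) -> bool:
    """Return True if the domain ends in something the pattern accepts as a TLD."""
    return pattern.search(domain.lower()) is not None


for name in ("example.com", "example.info", "example.travel"):
    print(name, plausible_tld(name, ORIGINAL_TLD), plausible_tld(name))
```

Even the patched pattern turns away the six-letter .travel, which is exactly the sort of laughably wrong rejection I no longer want.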
(I don't regret rejecting this particular email message because, as mentioned, it's highly likely that it was spam. But I sort of want this mailer's rejections to not be laughably wrong.)
PS: I could probably arrange to get a list of all currently valid TLDs from somewhere if I really wanted to. But that's more work than this old mailer justifies.
(Why I care about detecting valid top-level domains instead of just checking whether the address is fully valid is a long, sad story involving some design decisions by this mailer that were valid when it was written a long time ago but aren't really any more. The short version is 'uncontrollable address canonicalization'.)
2013-12-22
The benefits of using expendable email addresses for most things
One of the big email things that I do as an anti-spam and anti-annoyance precaution is that I basically never give places my real email address. Whenever something demands an email address in order for me to sign up, I make up an expendable address for the purpose (I know, I'm lucky that I have convenient access to as many expendable addresses as I want through a local facility that's sort of for this). All of the expendable addresses wind up in my real inbox, but I can tell which address an incoming email message came in through.
The lesser reason for doing this is that you'll pretty much know who leaked one of your addresses to spammers or other sources of unwanted email. Pragmatically this isn't good for very much except perhaps stopping doing business with the place involved; these days, places that leak addresses generally don't care and aren't going to change their ways just because you can demonstrate it was them.
(Note that leaks don't necessarily have to be deliberate, and some things leak almost by the nature of their existence. For example, bug trackers often expose your email address and mailing lists intrinsically leak your address to other subscribers (if you send email to them) and may leak them through publicly visible archives.)
The big reason for doing this is that you can turn off expendable addresses in a way that you can't with your real email address. Tired of 'unsubscribe' options that don't really work? Unsubscribe yourself by deleting the expendable address involved. Done. The lesser version, suitable for places you want to keep interacting with, is to make up a new expendable address, change your address in their system to this new address, and delete the old address. Everyone with the old address and no access to the new one is now out of luck.
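To make the mechanics concrete, here is a small Python sketch of the bookkeeping involved. It is a toy model with invented names, not the local facility I actually use: each expendable address remembers who it was given to, and deleting or rotating an address cuts off everyone who only has the old one.

```python
class ExpendableAddresses:
    """Toy registry of expendable addresses (all names here are hypothetical)."""

    def __init__(self, domain: str):
        self.domain = domain
        self._purpose = {}  # address -> who it was handed out to

    def create(self, local_part: str, purpose: str) -> str:
        address = f"{local_part}@{self.domain}"
        self._purpose[address] = purpose
        return address

    def source_of(self, address: str):
        """Who was given this address; None if it is deleted or never existed."""
        return self._purpose.get(address)

    def delete(self, address: str) -> None:
        """The 'unsubscribe' that always works."""
        self._purpose.pop(address, None)

    def rotate(self, old_address: str, new_local_part: str) -> str:
        """Replace an address for a place I still want to hear from."""
        purpose = self._purpose.pop(old_address)
        return self.create(new_local_part, purpose)
```

If spam later shows up at an address, `source_of()` says who leaked it, and `delete()` or `rotate()` ends the problem.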
What all of this does for me is not necessarily less spam but instead peace of mind. I can give out email addresses without being so stressed about it and I feel much more in control of my email because by and large I get to decide if people can keep emailing me, not them. On the whole this helps to stem or at least slow down a general retreat from email.
As a side note, modern email clients make it relatively convenient to interact with people through expendable email addresses. They support multiple sending addresses, they're often smart about picking the one to use on replies, and at least sometimes you can configure rules about which one should be chosen when. You don't really need client support if you're just receiving email through the various addresses but it does come in handy if and when you want to interact with people through them just to be sure.
(Because I use some old-fashioned tools for my email, I don't quite get this convenience and there are some obstacles in the way of it in general.)
2013-11-28
A quick analysis of bounces here
Every so often I pose myself a question which turns out to not really pan out. Recently I wound up wondering what sort of patterns I'd see for the destination addresses or domains of bounce email generated by our central mail server. What I expected to see was a good showing by what I consider my usual suspects, the kind of places that cause me to write that recent entry. Instead I found that we seem to generate a far lower volume of this sort of bounce than I expected and there are no really big glaring patterns (except one).
As mentioned, there's nowhere near the volume of bounces being sent to outside addresses that I expected to find. If I'm generating my stats right, we had well under a thousand of these over the past 30 days. Our single largest source and thus target for bounces is a relatively active technical mailing list that is totally not removing a bad address here; it's probably responsible for around half of all such bounces. The second largest source is similar but may not be a legitimate and above board mailing list (the Internet search oracles are unclear).
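The counting itself is simple once the bounce destination addresses have been extracted from the logs. A minimal Python sketch, assuming that extraction has already happened (the log parsing depends on the mailer and is not shown, and the function name is made up):

```python
from collections import Counter
from typing import Iterable


def bounce_targets_by_domain(recipients: Iterable[str]) -> Counter:
    """Count bounce destinations per domain, ignoring malformed addresses."""
    domains = Counter()
    for address in recipients:
        _local, sep, domain = address.rpartition("@")
        if not sep or not domain:
            continue  # not a usable address
        domains[domain.lower()] += 1
    return domains
```

Calling `.most_common(10)` on the result gives the usual suspects, if there are any.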
After that, well, things start coming out of the woodwork. The third most active source looks pretty clearly like spam (certainly mail servers we forward to are rejecting its emails on that basis), but in total numbers it's small beans. Then we have email from Facebook and iTunes (likely due to people forwarding their email to destinations that don't exist any more) and then a mixture of likely fully legitimate sources and more questionable ones. Nothing stands out.
In short, if I was relying on this analysis to find people who sent our users spam and then had that bounce, I don't think I'd have found much. The people who attracted my irritation in the earlier entry probably would have been lost in the noise.
2013-11-23
You are not fooling us with broken bounce addresses
This is a close cousin of my previous blog entry on broken bounce
addresses, but today I'm feeling less
charitable. Right now we have sitting in our mail queues a bounce that's
trying to be delivered to the address mailreturn@smtp.ymlp44.net and
has been for the past ten hours. From past experience I know that this
message will never be delivered; it will sit there until it times out.
Since this is an actual bounce, the original message was not scored as spam (we automatically discard bounces of spam). But this is not fooling anyone about what business 'Your Mailing List Provider' is really in. When you claim to be a legitimate mailing list provider but do not accept bounces back, well, people notice (especially if the envelope sender address looks like something that exists to catch errors and so on). Certainly we do. In fact you are fooling us far less effectively than if you accepted the bounce, complaint, or whatever and silently swallowed it.
(We'd probably never notice that. To do so we'd have to do some sort of analysis of common bounce target addresses or bounce target domains, and that's just not something we'd ordinarily do. The way you catch the eye of most sysadmins is to sit around in something that we pay attention to, such as our mail queues.)
I'd say that I don't know why people do this, but actually I do. It's
pretty easy. If you're setting up a bunch of different sending machines
and giving them all their own domains and hostnames that they'll use
in envelope sender addresses, it's that much more work to have them
listen for incoming SMTP (even if they just discard everything). And you
certainly don't want to MX all of your sending domains to something
common because that could give people who want to block all of your
activity an automated way of recognizing you.
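To show why a shared MX would be such a giveaway, here is a hedged Python sketch of the recognition step. It assumes you already have a mapping from envelope sender domains to their MX hosts (how you collect it is up to you); the function name and the example domains in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_mx(domain_to_mx: Dict[str, str]) -> Dict[str, List[str]]:
    """Return MX hosts shared by more than one sender domain."""
    groups = defaultdict(list)
    for domain, mx_host in domain_to_mx.items():
        # Normalize case and any trailing dot so equal hosts compare equal.
        groups[mx_host.lower().rstrip(".")].append(domain)
    return {mx: sorted(domains)
            for mx, domains in groups.items() if len(domains) > 1}
```

Given `{"send1.example": "mx.bulk.example.", "send2.example": "mx.bulk.example"}`, both domains land in one group under `mx.bulk.example`, and blocking that one MX host's clients becomes a one-line rule.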
2013-10-27
Old and new addresses and spam
In response to an aside wondering how fast spam fell off for disused email addresses, Henry Spencer wrote me to mention that his older address (disused now for many years) gets a lot more spam than his current address. I've been thinking about this since then and I've realized that I implicitly divide disused addresses into at least two different categories. Let us call these the old active addresses and everything else.
Put simply, the old active addresses were actively and generally widely used on the Internet in what is roughly the pre-spam era. Henry Spencer's old address is definitely one example of this, since Henry spent years being active (and famous) on Usenet. Old active addresses were visible to spammers in the era where spammers began accumulating address lists and as a result they made it on to a huge number of such lists. These lists seem to still circulate and recombine today, even though an increasing number of the addresses are no longer valid; effectively they have an exceptionally and I suspect atypically long half-life.
(One of my old addresses seems to be like this, in fact, although not the address that prompted my earlier entry.)
Other addresses either weren't visible enough to make it on to those early spammer address lists or postdate them in general. These addresses are not so universal in spammer usage and so get hit less and, I assume, also fall out of usage faster and to a larger degree. These are the addresses where it's interesting to ask about the half life of spam. Of course what I think of as a general category here is probably some number of different ones that I don't really see because I don't have enough exposure to information about how spammers harvest and pass around addresses today.
(My impression is that one reason old active addresses are so heavily spammed is that these old addresses have become pervasively and basically freely available to spammers via many paths. I assume that newer addresses are harder and more costly for spammers to get, so they are less pervasive. This is probably an incorrect assumption.)
The real thing this has made me realize is that I don't really know much about how modern spammers operate. Is there a modern equivalent of the old 'million addresses' CDs that spammers apparently used to sell and pass around a decade ago, for example? I have no idea.
(I'm not likely to find out, either, since doing so would take a bunch of work even to find reliable sources of information and I just don't care enough any more. My spam problems have been basically solved by us outsourcing the work to commercial software.)
2013-10-16
Disused addresses and the impact of spam
Here's something that I've slowly realized, at least about myself: the annoyance and impact of a given volume of spam is disproportionately large on otherwise idle and disused email addresses. The less real volume the email address gets, the worse spam feels regardless of volume. I have some old email addresses that get perhaps one real email every six months and perhaps a spam email every week or two, and I find it teeth-grinding. Meanwhile I'd find that (low) volume of spam merely irritating on my main address (which probably sees over a hundred email messages a day).
(In retrospect this is part of why I said postmaster addresses are dead.)
I've previously written about why I feel noisy addresses are dead and while I still agree with the low level mechanics of that, it's not what affects me here (one email message a week can't exactly be called 'noisy'). What I think matters is the ratio of spam to ham. On a relatively disused email address the amount of ham declines much faster than the spam rate while the reverse is true for most actively used email addresses. My disused accounts thus wind up with huge spam to ham ratios; almost everything they get is spam (but not quite everything, otherwise I could just set them to refuse all email).
Which leads me around to the (now obvious) conclusion that for me the annoyance of spam can come either from its volume (having to deal with spam very often irritates me) or from the ratio. If almost every time I see I have email somewhere my reaction is 'oh great, more spam', I'm not in a good place.
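To put rough numbers on the ratio, here is a small Python sketch using the figures mentioned earlier. The specific values are assumptions for illustration: I take 'every week or two' as one spam per 1.5 weeks, count six months as 26 weeks, and treat the main address's hundred messages a day as all ham.

```python
WEEKS = 26  # roughly six months


def spam_share(spam: float, ham: float) -> float:
    """Fraction of all incoming messages that are spam."""
    return spam / (spam + ham)


# The same absolute spam volume hits both addresses.
spam_messages = WEEKS / 1.5  # about 17 over six months

disused_ham = 1  # one real email in six months
main_ham = 100 * 7 * WEEKS  # a hundred real messages a day

disused_share = spam_share(spam_messages, disused_ham)
main_share = spam_share(spam_messages, main_ham)

print(f"disused address: {disused_share:.1%} spam")
print(f"main address:    {main_share:.2%} spam")
```

Under these assumptions the disused address is well over 90% spam, while on the main address the same spam is a small fraction of one percent of what arrives.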
(I'm actually somewhat curious what the spam half-life of a disused email address is. I assume that spam volume drops off somewhat, for various reasons, but I have no idea how fast it drops.)
2013-09-29
Spammers illustrating, well, something
I'm on the general mailing list for Exim and over time I've become used to seeing what I'll call 'please do my homework' requests there. By this I don't quite mean literally requests for this (so far no one seems to assign configuring Exim as homework for some class), but instead requests for help with problems which clearly show that the requester has not attempted to help themselves and is not turning to the mailing list as a last resort; instead the mailing list is serving as a first or second resort. This is probably completely normal for any open source software with a decent usage level and I've long since gotten over being particularly irritated about it.
Every so often, though, the mailing list sees an unusually spectacular instance. Such as this recent one:
Hello,
Working with sending e-mail marketing and I'm using cpanel / whm with exim in its latest version.
Need to optimize the shipping of exim, and to receive e-mail the same now send direct to the recipient, and that was not generating the send queue.
I have noticed that the server is sending queue accumulating and analyzing the logs shipping seems that ISPs are blocking recipients immediate receipt, claiming the high flow of sending e-mail, noting that 100 IPS rotacioando have every referral, this set RDNS, SPF and DKIM,
[....]
I suppose I shouldn't be too surprised. (Self-admitted) e-mail marketers need to send their email using something and sometimes they're going to use the same mailer that I do. And some of them are going to ask other people to do their homework without entirely thinking the whole thing through.
(This message was met with a resounding complete silence on the Exim mailing list, which restored a bit of my faith in humanity.)
2013-09-24
A semi-wish for an official 'null MX' standard
I've written before about how it seems that spammers will scrape anything that looks like an email address and then attempt to spam them. Of course, much of what gets scraped is probably for (alleged) hosts that don't have anywhere to send email, so the spammers never even get as far as sending it. But through some sort of luck I have such a host that doesn't accept email but also happens to have an accessible mail server running on its IP address (because that IP address handles email for some other things).
Since I watch my SMTP logs (it's a low-activity mail server) this has
given me a nice ringside seat to the spam attempts and helped add things
to my personal blacklist. But as entertainment this palls after a while
and I'm starting to reach the point where I don't care and I would
rather that all of the would-be spammers just go away. To do this I'd
like what gets called a 'null MX', an MX entry that says 'this thing
doesn't get email, don't even bother trying to talk to its IP address'.
To my surprise there is no official standard for this. There
is a widespread habit of using an MX to '.' (dot, the
root of the DNS hierarchy) but it's not actually a standard
(although it was first put forward as a draft RFC in 2005 and is being tried
again this year). In
theory this has been around long enough as customary practice that many
mail servers should support it; in practice I have no idea how well it
works. If it's not very effective at reducing incoming spam attempts I
might as well not add the entry at all. I suppose I actually have a
relatively good opportunity to conduct a slow-moving scientific
experiment to find out.
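For what it's worth, here is how I imagine a sending mailer that honours the convention would behave, as a minimal Python sketch. It works on MX records that have already been looked up, given as (preference, exchange) pairs; treating the domain as null only when '.' is its sole MX target is my assumption, and the function names are made up.

```python
from typing import List, Tuple

MXRecord = Tuple[int, str]  # (preference, exchange)


def is_null_mx(mx_records: List[MXRecord]) -> bool:
    """True if the only MX points at '.', the root of the DNS hierarchy."""
    exchanges = [exchange for _preference, exchange in mx_records]
    return exchanges == ["."]


def delivery_hosts(domain: str, mx_records: List[MXRecord]) -> List[str]:
    """Hosts to try for SMTP delivery, in order; empty means don't bother."""
    if is_null_mx(mx_records):
        return []
    if not mx_records:
        # No MX at all: the traditional fallback is the domain's own address.
        return [domain]
    return [exchange.rstrip(".") for _preference, exchange in sorted(mx_records)]
```

Whether spamware bothers with any of this is, of course, exactly the open question.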
(Probably the most reliable way to do this is to set the MX to a
public IP address under your control that doesn't exist or doesn't
accept incoming SMTP. I wouldn't use a private IP address or a 127/8
address because both of those may be ignored by legitimate mailers while
the only thing that's going to ignore an unresponding public IP as an
MX is spamware that is deliberately trying your A record even though
an MX exists.)