Wandering Thoughts archives

2016-04-28

You should probably track what types of files your users get in email

Most of the time our commercial anti-spam system works fine and we don't have to think about it or maintain it (which is one of the great attractions of using a commercial system for this). Today was not one of those times. This morning, we discovered that some incoming email messages we were receiving made its filtering processes hang at 100% CPU; after a while, this caused all inbound email to stop. More specifically, the dangerous incoming messages appeared to be a burst of viruses or malware in zipped .EXEs.

This is clearly a bug and hopefully it will get fixed, but in the meantime we needed to do something about it. Things like, say, blocking all ZIP files, or all ZIP files with .EXEs in them. As we were talking about this, we realized something important: we had no idea how many ZIP files our users normally get, especially how many (probably) legitimate ones. If we temporarily stopped accepting all ZIP file attachments, how many people would we be affecting? No one, or a lot? Nor did we know what sort of file types are common or uncommon in the ZIP files that our users get (legitimate or otherwise), or what sort of file types users get other than ZIP files. Are people getting mailed .EXEs or the like directly? Are they getting mailed anything other than ZIP files as attachments?

(Well, the answer to that one will be 'yes', as a certain amount of HTML email comes with attached images. But you get the idea.)

Knowing this sort of information is important for the same reason as knowing what TLS ciphers your users are using. Someday you may be in our situation and really want to know if it's safe to temporarily (or permanently) block something, or whether it'll badly affect users. And if something potentially dangerous has low levels of legitimate usage, well, you have a stronger case for preemptively doing something about it. All of this requires knowing what your existing traffic is, rather than having to guess or assume, and for that you need to gather the data.
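As a rough illustration of what gathering this data could look like, here is a minimal Python sketch that tallies attachment file extensions for a single message and looks one level inside ZIP files. It is not what we run; the function names, the 'zip:' prefix for files inside archives, and the sample message are all assumptions made up for the example. It only uses the standard library's email and zipfile modules.

```python
import io
import zipfile
from collections import Counter
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import PurePosixPath


def extension(name):
    """Return the lowercased extension of a filename, or '(none)'."""
    return PurePosixPath(name).suffix.lower() or "(none)"


def attachment_types(raw_message):
    """Count attachment extensions in one message, looking one level into ZIPs."""
    counts = Counter()
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.iter_attachments():
        ext = extension(part.get_filename() or "")
        counts[ext] += 1
        if ext != ".zip":
            continue
        # Decode the transfer encoding (usually base64) to get the raw bytes.
        data = part.get_payload(decode=True) or b""
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for inner in archive.namelist():
                    counts["zip:" + extension(inner)] += 1
        except zipfile.BadZipFile:
            # Malformed archives are worth counting too.
            counts["zip:(unreadable)"] += 1
    return counts


def sample_message():
    """Build a small test message with a zipped .exe and an image attached."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("invoice.exe", b"MZ")
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "user@example.org"
    msg["Subject"] = "test"
    msg.set_content("See attached.")
    msg.add_attachment(buf.getvalue(), maintype="application",
                       subtype="zip", filename="invoice.zip")
    msg.add_attachment(b"\x89PNG", maintype="image",
                       subtype="png", filename="logo.png")
    return msg.as_bytes()
```

Summing these per-message counters over a day's mail (and ideally logging them next to the spam score) would give the sort of baseline discussed here. A real deployment would also want to limit how much of an archive it is willing to read.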

Getting this sort of data for email does have complications, of course. One of them is that you'd really like to be able to distinguish between legitimate email and known spam in tracking this sort of stuff, because blocking known spam is a lot different than blocking legitimate email. This may require logging things in a way that either directly ties them to spam level information and so on or at least lets you cross-correlate later between different logs. This can affect where you want to do the logging; for example, you might want to do logging downstream of your spam detection system instead of upstream of it.

(This is particularly relevant for us because obviously we now need to do our file type blocking and interception upstream of said anti-spam system. I had been dreaming of ways to make it log information about what it saw going by even if it didn't block things, but now maybe not; it'd be relatively hard to correlate its logs against our anti-spam logs.)

KnowingAttachmentTypes written at 01:36:06

2016-04-19

Today's odd spammer behavior for sender addresses

It's not news that spammers like to forge your own addresses into the MAIL FROMs of the spam that they're trying to send you; I've seen this here for some time. On the machine where I have my sinkhole server running, this clearly comes and goes. Some of the time almost all the senders will be trying a legitimate MAIL FROM (often what they seem to be trying to mail to), and other times I won't see any in the logs for weeks. But recently there's been a new and odd behavior.

Right now, a surprising number of sending attempts are using a MAIL FROM that is (or was) a real address, but with the first letter removed. If 'joey@domain' was once a real address, they are trying a MAIL FROM of 'oey@domain'. They're not just picking on a single address that is mutilated this way, as I see the pattern with a number of addresses.

(Some of the time they'll add some letters after the login name too, eg 'joey@domain' will turn into 'oeyn@domain'.)
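If I wanted to pick this pattern out of logs automatically, a check along the following lines might do. This is only a sketch in Python; the function name, the idea of a list of known addresses, and the limit of two extra trailing letters are assumptions for illustration, not anything my sinkhole server actually does.

```python
def mutated_from(mail_from, known_addresses, max_extra=2):
    """Return the known address that mail_from looks derived from, or None.

    A match means: same domain, the known local part with its first letter
    removed, optionally followed by up to max_extra additional characters.
    """
    local, _, domain = mail_from.lower().rpartition("@")
    for known in known_addresses:
        k_local, _, k_domain = known.lower().rpartition("@")
        if domain != k_domain or len(k_local) < 2:
            continue
        stem = k_local[1:]  # 'joey' -> 'oey'
        if local == k_local or not local.startswith(stem):
            continue
        if len(local) - len(stem) <= max_extra:
            return known
    return None
```

With 'joey@domain.example' as a known address, both 'oey@domain.example' and 'oeyn@domain.example' would be flagged, while the unmodified address would not. Very short local parts will produce false positives, so the results need a human look.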

So far I have no idea what specific spam campaign this is for because all of the senders have been in the Spamhaus XBL (this currently gets my sinkhole server to reject them as boring spam that I already have enough samples of).

What really puzzles me is what the spammers who programmed this are thinking. It's quite likely that systems will reject bad local addresses in MAIL FROMs for incoming email, which means that starting with addresses you think are good and then mutating them is a great way to get a lot of your spam sending attempts rejected immediately. Yet spammers are setting up their systems to deliberately mutate addresses and then use them as the sender address, and presumably this both works and is worthwhile for some reason.

(Perhaps they're trying to bash their way through address obfuscation, even when the address isn't obfuscated.)

(I suspect that this is a single spammer that has latched on to my now spamtrap addresses, instead of a general thing. Our general inbound mail gateway gets too much volume for me to pick through the 'no such local user' MAIL FROM rejections with any confidence that I'd spot such a pattern.)

DropFirstLetterSpammers written at 01:06:35

2016-04-10

SPF is not a security feature, as it solves the wrong problem

SPF is one of my hot button issues, or rather how all too often influential people seem to think that SPF is a good idea. A lot of the time these people seem to think that a hard-fail SPF policy is a security feature, something that will prevent forgery of email as being from their company or organization. These people are wrong, at least in any practical sense.

The problem with SPF as a security feature is that it protects the wrong thing. To the extent that it does anything, SPF protects the (SMTP) envelope sender, ie the MAIL FROM domain, and the envelope sender is effectively invisible to people reading their email. I am an email expert and even I do not configure my mail client to display the envelope sender; like everyone else, I see the From: header. Ordinary people generally don't even know that a separate envelope sender address even exists.

What this means is that an attacker who wants to forge email from your domain is not at all deterred by your hard-fail SPF policy. They just put something else in the envelope sender, put your domain in the From:, and mail away. It's extremely unlikely that anyone will notice anything or that any automated systems will lower the reputation score of these forged email messages (at least for that reason). And I'm being extremely generous here, since I'm assuming that people even see or look at the domain of the From: address, as opposed to simply seeing some user-friendly version of it that may be based on, eg, the name in the From: instead of the domain.

(For example, GMail will show you the domain of the From: but it seems to de-emphasize it, using smaller type in a lighter shade compared to the person's display name. If people aren't already suspicious, how likely are they to notice a mismatch in such a thing?)
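To make the gap concrete, here is a small Python sketch that puts the domain SPF would evaluate next to what a person reading the message sees. The function names and the example addresses are hypothetical, and the 'aligned' check is a deliberately simplified exact match on domains, not a full implementation of DMARC's alignment rules.

```python
from email import message_from_string, policy
from email.utils import parseaddr


def domain_of(address):
    """Return the lowercased domain part of an address ('' if there is none)."""
    return address.rpartition("@")[2].lower() if "@" in address else ""


def spf_blind_spot(envelope_sender, raw_message):
    """Compare the envelope sender (what SPF checks) with the From: header."""
    msg = message_from_string(raw_message, policy=policy.default)
    display_name, header_addr = parseaddr(str(msg["From"] or ""))
    spf_domain = domain_of(envelope_sender)
    from_domain = domain_of(header_addr)
    return {
        "spf_checks": spf_domain,                    # MAIL FROM domain
        "reader_sees": display_name or header_addr,  # what a mail client shows
        "from_domain": from_domain,
        # Simplified: exact match only (an assumption made for this sketch).
        "aligned": spf_domain == from_domain,
    }
```

For a message sent with an envelope sender of 'bounce@attacker.example' and a header of 'From: Your Bank <accounts@bank.example>', SPF is evaluated against attacker.example, which the attacker controls and can give any SPF policy they like, while the reader sees 'Your Bank'.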

If you want a security feature that tries to block people forging your domain in a meaningful sense, you want DMARC. DMARC specifically exists to protect the From: domain and in the process the integrity of your legitimate email, so that it can't be either forged or altered. SPF has nothing to do with this. Of course even preventing forged From: domains is not a great protection, but at least DMARC does something useful with only moderate collateral damage, unlike hard-fail SPF.

(SPF does not really solve any problem, especially these days. The one problem it might solve it doesn't because lots of MTAs sensibly ignore it. See the sidebar here and of course SPF also has major downsides.)

SPFNotSecurityFeature written at 03:08:31

2016-03-20

What broad hit rate the Spamhaus DBL might get for us

I took the past 9 days worth of logs from our commercial anti-spam black box, extracted the 'spam score' it assigns and the envelope sender domain, split this into three categories based on the broad scores from 0 to 100 that the system assigns, and then checked all of those origin domains against the Spamhaus DBL.

(Because of how our overall anti-spam systems work, this excludes some but not all of the email from hosts that are in Spamhaus's IP based lists.)

Based partly on previous stats and how we use the spam scores ourselves, my three categories were 'definitely spam' (scores of 98 to 100), 'enough to be spam' (scores of 60 through 97), and 'probably not spam' (below 60). The raw numbers are:

  • for 'definitely spam', 5,452 different MAIL FROM domains and only 812 in the DBL; roughly a 15% hit rate.

  • for 'enough to be spam', 4,118 different domains and 1,744 in the DBL; a 42% hit rate.

  • for 'probably not spam', 5,268 different domains and 20 in the DBL.

At one level, this is actually reassuring; it suggests that our commercial black box is doing a reasonably good job of finding much of the actual spam, even though it missed some things.

(It also suggests that the black box is not already including the DBL, or at least if it does it doesn't weigh the envelope sender very high in its scoring. Otherwise those 20 domains wouldn't be there.)

The relatively low domain hit rate on the 'definitely spam' category is at least partly due to the fact that there are a lot of domains in that set that were not used for very many messages to us. In fact the median usage count for domains there is one. If I go through the effort to count DBL hits by usage, it comes out to 44% of the actual messages had sender domains in the DBL.

The usage based hit count for the 'enough to be spam' category comes out to be slightly higher; there 54% of the actual messages had sender domains in the DBL.

(As you might expect, the 'probably not spam' category doesn't improve when measured by actual usage. Percentage wise it goes way, way down, in fact, as not very many messages came from those DBL-listed domains.)
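The difference between counting by domain and counting by usage is easy to get confused about, so here is a minimal Python sketch of the two calculations. The function name and input shapes are assumptions for the example: one sender domain per logged message, plus a set of the domains that turned out to be DBL-listed.

```python
from collections import Counter


def hit_rates(sender_domains, listed):
    """Return (rate by distinct domain, rate by message).

    sender_domains has one entry per message; listed is a set of domains
    that were found in the blocklist.
    """
    if not sender_domains:
        return 0.0, 0.0
    usage = Counter(sender_domains)
    domain_hits = sum(1 for domain in usage if domain in listed)
    message_hits = sum(n for domain, n in usage.items() if domain in listed)
    return domain_hits / len(usage), message_hits / len(sender_domains)
```

If one listed domain sent eight of ten messages and two unlisted domains sent one each, the by-domain rate is one in three but the by-message rate is 80%. This is the same effect that pushes the usage-based figures above well past the per-domain ones, since many of the unlisted domains were used only once.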

All of this means that I should definitely look at using the DBL in our overall anti-spam setup, because using the DBL would enable early rejection of a significant amount of spam that otherwise makes it as far as relatively expensive spam scoring.

SpamhausDBLEstimatedHitRate written at 03:00:25

2016-03-18

The Spamhaus DBL does get hits even with basic checks

The Spamhaus DBL is unlike their other blocklists in that it is for host and domain names, not IP addresses. As Spamhaus describes it:

The Spamhaus DBL is a realtime database of domains (typically web site domains) found in spam messages. Mail server software capable of scanning email message body contents for URIs can use the DBL to identify, classify or reject spam containing DBL-listed domains.

The intended primary use of the DBL is for message body scanning; you'd identify the hosts mentioned in URLs or URL-like things and run them past the DBL. You can also use it to check hostnames that appear in envelope information, like MAIL FROM (and EHLO, and simply the DNS name), but the way Spamhaus has written it up suggests that this is not going to get very many hits.

(The DBL is not the only such domain based blocklist, of course.)
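For people who haven't done DNS blocklist checks by hand, a basic DBL check on a hostname is just a DNS lookup. The Python sketch below shows the general shape. The function name and the injectable 'resolve' parameter are my own for the example, and my reading of the return codes (answers in 127.0.1.x mean listed, with 127.0.1.255 being an error reply rather than a listing) is an assumption that you should verify against Spamhaus's current documentation, along with their usage terms.

```python
import socket

DBL_ZONE = "dbl.spamhaus.org"


def dbl_listed(name, resolve=socket.gethostbyname):
    """Return True if looking up name in the DBL zone yields a listing answer.

    resolve is any callable that maps a hostname to an IPv4 address string
    and raises OSError when there is no answer; it is injectable so the
    logic can be tested without network access.
    """
    query = name.rstrip(".").lower() + "." + DBL_ZONE
    try:
        answer = resolve(query)
    except OSError:
        # No such name (or a lookup failure): treat as not listed.
        return False
    # Assumption: 127.0.1.x answers are listings, except the .255 error code.
    return answer.startswith("127.0.1.") and answer != "127.0.1.255"
```

For envelope checks you would call this with the MAIL FROM domain, the EHLO name, and the reverse DNS name of the connecting IP. Note that this sketch cannot tell 'not listed' apart from 'the DNS lookup failed', which a production check should.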

A while back I added DBL checking to my sinkhole SMTP server and then turned it on, checking all of the MAIL FROM domain, the EHLO name, and the reverse DNS of the connecting IP. I didn't really expect it to get any hits; I basically wanted to experiment. The result contained two surprises.

The first surprise was that even in my modest little context, I see more than a few DBL hits. It's nowhere near the level of the SBL in general (especially the SBL CSS), which I check first, but it does happen enough that it's easy to find rejections that are due to it. This suggests that I should look into using the DBL alongside the SBL in our real mail system's spam filtering.

(I want to do some actual analysis there, but that'll be another entry.)

The second surprise is that a lot of the mail senders using DBL listed domains were and are sending from their own servers, and those servers were not listed in the SBL or in fact any of Spamhaus's IP based DNSBLs. Often these people seem to have been sending from the same IP address for quite a while. This is very much not what I expected; I expected that if you were a DBL listed operation, your sending servers would wind up listed in the SBL in short order for, well, sending spam. Instead I see a number of persistent DBL-listed senders with their own static server IPs who are (still) not SBL listed.

(Often the IP addresses aren't even on very many other DNS blocklists, at least out of the ones that I check these days.)

This matters to me because one of the reasons I expected the DBL to have a low (additional) hit rate for things like MAIL FROM checks was that I thought there would be a much bigger overlap between the SBL and the DBL than there is. This expectation of low hit rates is why I haven't really looked at even simple DBL usage before now.

(The moral, obviously, is to validate my anti-spam feelings instead of just assuming. A more general moral is that I should think about general infrastructure for doing experiments to measure potential hit rates on things like this. Some amount of things can be looked at in retrospect based on logs, but not everything.)

SpamhausDBLDoesGetHits written at 22:20:21

2016-02-28

The status of null-sender spam from outlook.com

Recently, David left a comment on my last entry on null sender spam from outlook.com noting that his site had seen a stop of null sender spam from Outlook at the end of December. This made me curious about what we're seeing (and David asked, too), so I've now gone looking.

The short version is that clear null sender spam from outlook.com appears to have stopped at the end of last year (and I mean literally the end of last year, as we have entries from December 31st). We're still getting some amount of email from outlook.com with null sender addresses, but our anti-spam system now scores all of it very low. I can't be sure that this isn't spam, but it's certainly entirely possible that it's real bounces. We continue to get spam from outlook.com in general; at the moment, our 2016 figure is that about 4% of email from outlook.com scores high enough to be considered spam. In December the logs say it came out to be about 11.5% spam, so we clearly saw a significant drop here.

David also reported a lack of general spam from outlook.com. Unfortunately we don't see that. Outlook.com has been consistently sending us some amount of spam (as scored by our systems). In addition, several outlook.com hosts are currently on the SBL; out of Microsoft's listings, I can spot more than five listings. However the SBL seems to be doing something odd here, in that they're listing .0 addresses in the /24 instead of the actual IP address they list in the SBL listings. The net effect is that nominal SBL listings won't actually block anything, which kind of irritates me.

(Eg, SBL273948 says 'Spam source @104.47.100.234' but is for 104.47.100.0/32.)
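You can check why such a listing blocks nothing with Python's standard ipaddress module. The helper name here is made up for the example; the addresses are the ones from the listing above.

```python
import ipaddress


def listing_covers(listing_cidr, source_ip):
    """Return True if the listed network actually contains the source IP."""
    network = ipaddress.ip_network(listing_cidr)
    return ipaddress.ip_address(source_ip) in network
```

A /32 covers exactly one address, so a listing of 104.47.100.0/32 does not contain 104.47.100.234; only something like a /24 listing for 104.47.100.0 would.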

My overall view is that outlook.com continues to have a spam problem, but they have apparently managed to block or otherwise stop one source of their spam. This is progress; it is just not enough progress. Having roughly one in twenty email messages that we receive from you being spam is not a good ratio. For scale, over the same period in 2016, only 0.2% of the email received from Google was scored as spam.

(This includes both GMail email and email from some other things at Google that send out email, since as far as I know you can't tell the email servers apart, assuming there even is different infrastructure for the various different email systems.)

OutlookNullSenderStatus written at 02:33:51

2016-02-15

SMTP submission ratelimits should have delays too

We've recently enabled ratelimits on our mail submission servers. By this I mean that when you hit the ratelimits (which are applied to recipients, ie RCPT TO commands), the mailer gives you 4xx replies and your client will probably error out. But when I set up the ratelimits, I didn't just do that; I also made the mailer delay for 10 seconds before responding to every ratelimited RCPT TO. It's my view that this is a useful and in fact important thing to do.

You should normally only have ratelimits on submission in order to deal with spammers, which means that when the ratelimits trigger you hopefully have a spammer. One of the things you want to do with spammers is slow them down; the slower the spammer is going, the fewer messages they're submitting and the less damage they're doing. If you immediately reject RCPT TOs, you're slowing down the spammer in one sense (their message isn't getting through to very much), but not in another sense; their submission client will get 4xxs at a rate of many a second and finish message submission almost immediately. They can try again immediately or switch over to another compromised account they have or whatever. Delaying before each reply slows the spammer's submission client down, possibly significantly. If you delay for 10 seconds (as we do), now the spammer can only test 6 RCPT TOs a minute. Unless their code abandons message submission when it goes too slowly, you're tying up their submission code for tens of minutes for a typical spam message with a mass of recipients, instead of letting them dump the message on you and be rejected in a second or so.

(The other benefit of slowing the spammer down is that it gives your alerting mechanisms more time to light up with alarms and get your attention.)
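In a real mailer this is a matter of configuration, but the logic is simple enough to sketch in Python. Everything here is hypothetical: the class name, the sliding-window approach, and the choice of 451 as the temporary failure code are assumptions for illustration, and the limit and window are whatever you pick. Only the 10 second delay comes from our actual setup.

```python
import time
from collections import defaultdict, deque


class RecipientLimiter:
    """Per-sender RCPT TO ratelimit that also says how long to stall."""

    def __init__(self, limit, window, delay=10.0, clock=time.monotonic):
        self.limit = limit      # max recipients allowed per window
        self.window = window    # window length in seconds
        self.delay = delay      # stall before each ratelimited reply
        self.clock = clock
        self.seen = defaultdict(deque)

    def check(self, sender):
        """Return (smtp_code, seconds_to_wait_before_replying)."""
        now = self.clock()
        times = self.seen[sender]
        # Forget attempts that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        # Every attempt counts, including ones that end up rejected, so a
        # sender who keeps hammering stays over the limit.
        times.append(now)
        if len(times) > self.limit:
            return 451, self.delay
        return 250, 0.0
```

The server would sleep for the returned number of seconds before sending its reply. With a 10 second delay that works out to 60 / 10 = 6 ratelimited RCPT TOs per minute per connection, so a spam message with 500 recipients past the limit ties up the client for over 80 minutes.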

Now, it's certainly possible that this frustrates spammers less than I think. Perhaps their software abandons message submission on the first 4xx reply and immediately moves on (including abandoning the current account). But on the other hand it's a cheap precaution and I think it's worth a shot.

(A spammer that does many message submissions in parallel can speed things up here, but then they open themselves up to other limiters. It's almost certainly rather unusual to have multiple mail submission sessions running at the same time, especially once ratelimiting has kicked in.)

RatelimitsWithDelays written at 00:43:03

2016-02-14

Your outgoing mail system should have a per-sender stop switch

Here is something important we have come around to realizing as a result of recent events. Put simply, every system that handles outgoing user-generated email should have some method to immediately stall and stop all email from a specific user. You want this for the obvious reason; when you discover you have a compromised user account that's being used to send spam, you can immediately stop just their email instead of having to take down your entire outgoing email system.

When you implement this, don't just implement the obvious step of refusing email being submitted by blocked user(s). Go the extra distance so that blocking a user also (immediately) stops further processing of any email from them that you have in the mailer's queues. Generally, by the time you detect that a user's been compromised and their account is being used to spam, you're going to have a bunch of email from them queued up in your system trying to get delivered to various places. You really don't want to have to hunt all of this email down by hand to stop it from being sent out; instead, it's much better if you put a login name or whatever in a control file and all the queued email from them stops dead, with no fuss or muss.

(I wouldn't automatically remove such email from the queue, partly because you may want to inspect sample messages. It's enough that the messages stop trying to be delivered without you having to stop the entire mail system in a panic.)

Usually you'll want to do this based on the authenticated user (generally from SMTP AUTH). In some environments people don't have to authenticate to your outgoing mail server; here the best you can do is base things on, eg, the MAIL FROM. If you're dealing with this situation, you may want to support wildcards (so you can say 'all email with a Hotmail sender address gets stopped'). Spammers often but not always use revolving MAIL FROM addresses where they think they can get away with it, so you need to be prepared for that.

(You may also want to support a per-IP or per-subnet stop switch, especially if you don't have SMTP AUTH to attach reliable identities to submitted email. But things are starting to get intricate here and at some point you're better off just stopping the entire mail system and doing general searches through the queued email.)
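As a sketch of the control file idea, here is roughly what the matching logic could look like in Python. The file format (one pattern per line, '#' comments), the function names, and the representation of queued messages as dicts with a 'sender' key are all assumptions made up for this example; your mailer will have its own way of expressing this.

```python
from fnmatch import fnmatchcase


def load_blocklist(text):
    """Parse control-file text: one pattern per line, '#' starts a comment."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            patterns.append(line)
    return patterns


def is_stopped(identity, patterns):
    """True if a login name or sender address matches any stop pattern."""
    identity = identity.lower()
    return any(fnmatchcase(identity, pattern) for pattern in patterns)


def deliverable(queue, patterns):
    """Return the queued messages that may still be delivered.

    Stopped messages are simply skipped, not deleted, so they remain
    available for inspection.
    """
    return [msg for msg in queue if not is_stopped(msg["sender"], patterns)]
```

The important part is that the same check runs both at submission time and every time the queue runner considers a message, so that adding one line to the control file takes effect for already queued email too.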

OutgoingSenderStopSwitch written at 02:14:28

2016-02-12

We need to deploy anti-spam precautions even if they're a bit imperfect

A few years ago we had a local spam incident. In its wake, we made some configuration changes and started exploring things like ratelimiting outgoing email. Our first step in this was to set our Exim configuration to track rate limits without enforcing them, so that we could figure out what limits to set that would stop spammers without causing problems for our users.

At one level, this was a sensible decision. Causing disruptions to our users might create political pressure that would stop us from taking any precautions against future spam runs from compromised local accounts. But, well, we never really found a level that our users didn't run over once in a blue moon, and when the overruns only happen once in a blue moon it's hard to iterate on tuning limits. And so things sat there from 2012 until today.

Today we had another little local spam incident (as people on Twitter might have guessed). Our ratelimit tracking code dutifully logged that this was happening and that our hypothetical ratelimits were being exceeded, and by quite a bit too:

[...] Warning: SENDER RATE LIMIT HIT: 27943.5 / 60m max 200 / [...]

So. Yeah.

It may not surprise you to hear that now we have some active ratelimits; in fact we more or less simply made our previous tracking ratelimits into enforcing ones. They're undoubtedly not perfect ratelimits, and in fact I'm fairly sure that within six months someone here sending out an entirely legitimate burst of email will run into them. But as usual the perfect is the enemy of the good. Our quest to deploy only perfect anti-spam precautions that would never inconvenience our users turned out to result in us deploying almost no anti-spam precautions, with regrettable results.

(Nor did we avoid inconveniencing users, since some of them had email bounce due to the machine in question temporarily picking up a bad sender reputation.)

We don't want to deploy significantly imperfect anti-spam precautions, for obvious reasons. Something that gets in the way of our users on a frequent basis is no good. But I've come around to the view that we need to be more willing to deploy things that are a bit imperfect and then sort out the problems when they happen. Otherwise, well, we may be looking at something like this happening all over again.

(One of those may be some sort of scanning of our outgoing email, or at least some of it. Despite my historical reservations, I now think it's possible to do this in a good way and I think that the risks of false positives may be one of those 'a bit imperfect' things we can live with, at least initially. But right now I'm kind of thinking out loud in the immediate aftermath of an incident, which gives me some biases.)

DeployImperfectAntispamPrecautions written at 23:54:33

2016-01-08

Getting to watch a significant spam campaign recently

One of the interesting side effects of running a sinkhole SMTP server and occasionally looking at the SMTP command logs is that every so often I get to see the signs of what is clearly a significant spam campaign. Recently, for example, I noticed a whole pile of delivery attempts that all had a distinct signature, sufficiently distinct that I'm pretty sure they must have been from the same software and party.

The primary signature was an unusual MAIL FROM, where it was the same as the RCPT TO. A typical session looked like:

EHLO host11.190-230-18.telecom.net.ar
250 [...]
MAIL From:<ADDR@hawkwind.utcs.toronto.edu>
550 [...]
RCPT To:<ADDR@hawkwind.utcs.toronto.edu>
503 Out of sequence command

(My server advertises PIPELINING, so this run-ahead behavior by the client is legitimate. Not all of the connections did it, so I can't be entirely sure that they were going to RCPT TO the same address. It's a good bet, though; spammers seem to almost never attempt a MAIL FROM of my own domain.)
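Spotting this signature in SMTP command logs can be automated. The Python sketch below assumes each session is available as a list of the client's command lines in the format shown above; the function name and that input shape are assumptions for the example, not how my server stores its logs.

```python
import re

# Matches 'MAIL From:<addr>' and 'RCPT To:<addr>' in any letter case.
ADDR = re.compile(r"^(MAIL FROM|RCPT TO):\s*<([^>]*)>", re.IGNORECASE)


def self_addressed(commands):
    """True if some RCPT TO in the session repeats the MAIL FROM address."""
    mail_from = None
    for line in commands:
        match = ADDR.match(line.strip())
        if not match:
            continue
        verb, addr = match.group(1).upper(), match.group(2).lower()
        if verb == "MAIL FROM":
            mail_from = addr
        elif mail_from and addr == mail_from:
            # The null sender '<>' is empty and so never counts as a match.
            return True
    return False
```

This only catches sessions that got as far as sending a RCPT TO, so for clients that did not pipeline and gave up after the MAIL FROM rejection it will say nothing either way.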

Almost all of the hosts that I saw do this were in the PBL, the XBL, or the CSS. Hosts EHLO'd with either their reverse DNS or with eg '[39.112.245.8]' when they had no rDNS (although not all of the names had forward DNS to go with their rDNS). While this was happening, I often saw a significant number of these connections one after another from all sorts of different IPs.

A few messages of this sort got all the way to DATA and so had their contents logged. Based on that, the campaign seems to have been pushing an offshore pharmacy hosted on an IP that Spamhaus lists as part of 'Yambo Financials' aka 'RxMed pharma spam website hosting' (although the domain name used in the spam is not one that's currently in the SBL listing). That doesn't really surprise me, as I'd expect such a spam campaign to come from one of the larger operations.

There are probably spam campaigns running all the time that my (now) spamtraps get hit by. It's just that usually they don't stand out this much, either by having a distinctive and unusual signature or by hammering on my addresses quite this hard. The latter puzzles me a bit, since it seems inefficient (and I do believe that spammers are generally efficient).

FromTargetSpamRun written at 00:52:13

