Wandering Thoughts

2016-07-27

When 'simple' DNS blocklists work well for you

I've written about how we can divide DNS blocklists into 'simple' and 'complex' ones, where simple DNSBLs basically list things based on them sending spam or other bad stuff without trying to do more complex things like assess how much legitimate traffic also comes from the source. To put it one way, if a DNSBL lists one of GMail's outgoing SMTP servers because it sent some spam, it's almost certainly a simple one. I also said that rejecting email based on a simple DNSBL isn't necessarily a mistake, so it's time to explain that.

Suppose that you have a mail system that generally receives a low volume of legitimate email; for example, you might be operating a personal email server. Suppose that you also start getting spam. Spammers almost never go away, so your spam volume is very likely to trend up over time and reach a point where most of your incoming email is spam. In this environment, a listing in a simple DNSBL is a fairly strong confirmation signal that this new email is really spam. It's much more likely that you're getting spam email from an IP that's been detected as spamming than that an innocent person has chosen to send you legitimate email from an IP that also sent spam and got listed in the DNSBL. The latter could happen, but the odds are low.

We've sort of seen this before. If the legitimate email rate is low and the DNSBL's 'false positive' rate on it is also low, the odds that a positive signal from the DNSBL means that an email is spam are very high. You can make the odds even higher by whitelisting known good sources.

(Of course anti-spam precautions aren't evaluated purely on percentages; the absolute number of legitimate messages blocked matters. Here the low volume helps, as there just aren't that many legitimate emails to get blocked.)
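The arithmetic behind this can be sketched with Bayes' rule. The numbers below are invented for illustration and are not measurements from any real mail system; `hit_rate_spam` and `hit_rate_ham` are hypothetical names for how often the DNSBL lists the sending IP of a spam message and of a legitimate message respectively.

```python
def spam_probability_given_listing(spam_fraction, hit_rate_spam, hit_rate_ham):
    """Probability that a message is spam, given that the DNSBL listed its sender.

    spam_fraction: fraction of all incoming messages that are spam.
    hit_rate_spam: fraction of spam messages whose sending IP is listed.
    hit_rate_ham:  fraction of legitimate messages whose sending IP is listed
                   (the 'false positive' rate on legitimate email).
    """
    listed_spam = spam_fraction * hit_rate_spam
    listed_ham = (1.0 - spam_fraction) * hit_rate_ham
    return listed_spam / (listed_spam + listed_ham)


def legitimate_messages_blocked(total_messages, spam_fraction, hit_rate_ham):
    """Expected absolute number of legitimate messages that would be rejected."""
    return total_messages * (1.0 - spam_fraction) * hit_rate_ham


# Invented example: 90% of incoming email is spam, the DNSBL lists half of
# the spam sources and 1% of the legitimate sources.
p = spam_probability_given_listing(0.90, 0.50, 0.01)
blocked = legitimate_messages_blocked(1000, 0.90, 0.01)
print(f"P(spam | listed) = {p:.4f}")
print(f"legitimate messages blocked per 1000 received = {blocked:.1f}")
```

Lowering `spam_fraction` in this sketch (a mail system that is mostly legitimate email) pulls the probability down and pushes the absolute number of blocked legitimate messages up, which is the other half of the argument.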

Similar logic can be applied to a lot of anti-spam heuristics; many things look good when they're dealing with a stream of email that's mostly or almost entirely spam. Block on bad EHLO greetings? Sure, why not, especially since GMail and the other big people do generally get those things right.

(GMail will send you spam too, of course, but statistically a new legitimate sender is much more likely to be using GMail or one of the other big places than an email server in the middle of nowhere. And yes, there are downsides to too many people adopting this sort of attitude to both heuristics and new mail sending machines in surprising places; ask anyone trying to send personal email from a new small home mail server and get it accepted by places.)

WhenSimpleDNSBLsWork written at 01:09:35

2016-07-06

It turns out that viruses do try to conceal their ZIP files

One of the interesting things that happens when you start to log information about what types of files your users get in email is that you get to discover certain sorts of questionable things that people actually do ('people' in a loose sense). Here's one interesting MIME part, extracted from our logs:

attachment application/octet-stream; MIME file ext: .jpeg; zip exts: .js

The 'attachment' bit is the Content-Disposition and the nominal MIME type comes from the Content-Type. The MIME filename (which came either from Content-Type or Content-Disposition) had a .jpeg extension; however, our logging program found that the attachment actually was a ZIP file with a single .js file inside it, not a JPG image. Our anti-spam software later identified it as malware.

(I didn't set out to write an attachment type logging program that did content sniffing, but the Python zipfile module has a very convenient function for it and it's much simpler to structure the code that way instead of trying to maintain a list of file extensions and/or Content-Types that correspond to ZIP files.)
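As a rough illustration of this kind of content sniffing (a sketch, not the actual logging program), `zipfile.is_zipfile()` can be pointed at the decoded bytes of an attachment regardless of what its MIME headers claim. The helper name `zip_extensions` and the sample file names are made up for the example.

```python
import io
import os.path
import zipfile


def zip_extensions(payload):
    """Return the sorted file extensions found inside a ZIP payload.

    Returns None if the decoded attachment bytes are not a ZIP archive,
    whatever the MIME filename or Content-Type said.
    """
    data = io.BytesIO(payload)
    if not zipfile.is_zipfile(data):
        return None
    data.seek(0)
    with zipfile.ZipFile(data) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    return sorted({os.path.splitext(name)[1].lower() for name in names})


# Build a tiny in-memory ZIP to stand in for an attachment that was
# called something like 'photo.jpeg' in the MIME headers.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("invoice.js", "// placeholder content")

print(zip_extensions(buf.getvalue()))                   # the disguised ZIP
print(zip_extensions(b"\xff\xd8\xff\xe0" + b"\x00" * 64))  # bytes that are not a ZIP
```

A real program would also want to catch `zipfile.BadZipFile`, since something can look enough like a ZIP archive to pass the first check and still fail to open.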

I vaguely knew that any number of file formats were actually ZIP files under the hood; there's .jar files, for example, and a number of the modern '* office' suites and programs use ZIP as their underlying format. Our file type logging program has peered inside any number of those sorts of attachments (as well as inside regular .zip attachments). I also knew that it was theoretically possible for bad actors to try to smuggle ZIP files through as some other file type. But I didn't expect to see it, especially so fast.

(To be fair, most malware does seem to stick to .zip files, not infrequently even with real MIME Content-Types. I suspect that malware wants to make it easy for people to open up the bad stuff that it sends them.)

PS: Hopefully no real content filtering software is fooled by this sort of transparent ruse. It's not as if ZIP archives are hard to detect. Sadly, that (some) malware does this kind of thing makes me suspect that some important software actually is defeated by it.

PPS: All of the cases seem to be from the same malware run, based on how they all happened today and have various other indicators in common.

VirusesDoConcealZipFiles written at 01:31:39

2016-06-30

What makes an email MIME part an attachment?

If you want to know what types of files your users are getting in email, it's likely that an important prerequisite is being able to recognize attachments in the first place. In a sane and sensible world, this would be easy; it would just be any MIME part with a Content-Disposition header of attachment.

I regret to tell you that this is not a sane world. There are mail clients that give every MIME part an inline Content-Disposition, so naturally this means that most mail clients can't trust an inline C-D and make their attachment versus non-attachment decisions based on other things. (I expect that there are mail clients that ignore a C-D of attachment, too, and will display some of those parts inline if they feel like it, but for logging we don't care much about that.)

MIME parts may have (proposed, nominal) filenames associated with them, from either Content-Type or Content-Disposition. However, neither the presence nor the absence of a MIME filename determines something's attachment status. Real attachments may have no proposed filename, and there are mail clients that attach filenames to things like inline images. And really, I can't argue with them; if the user told you that this (inline) picture is mydog.jpg, you're certainly within your rights to pass this information on in the MIME headers.

The MIME Content-Type provides at least hints, in that you can probably assume that most mail clients will treat things with any application/* C-T as attachments and not try to show them inline. And if you over-report here (logging information on 'attachments' that will really be shown inline), it's relatively harmless. It's possible that mail clients do some degree of content sniffing, so the C-T is not necessarily going to determine how a mail client processes a MIME part.

(At one point web browsers were infamous for being willing to do content sniffing on HTTP replies, so that what you served as eg text/plain might not be interpreted that way by some browsers. One can hope that mail clients are more sane, but I'm not going to hold my breath there.)

One caution here: trying to make decisions based on things having specific Content-Type values is a mug's game. For example, if you're trying to pick out ZIP files based on them having a C-T of application/zip, you're going to miss a ton of them; actual real email has ZIP files with all sorts of MIME types (including the catch-all value of application/octet-stream). My impression is that the most reliable predictor of how a mail client will interpret an attachment is actually the extension of its MIME filename.

(While the gold standard for figuring out if something is a ZIP file or whatever is actually looking at the data for the MIME part, please don't use file (or libmagic) for general file classification.)

One solution is certainly to just throw up our hands and log everything; inline, attachment, whatever, just log it all and we can sort it out later. The drawback of this is that it's going to be pretty verbose, even if you exclude inline text/plain and text/html, since lots of email comes with things like attached images and so on.

The current approach I'm testing is to use a collection of signs to pick out attachment-like things, with some heuristics attached. Does a MIME part declare a MIME filename ending in .zip? Then we'll log some information about it. Ditto if it has a Content-Disposition of attachment, ditto if it has a Content-Type of application/*, and so on. I'm probably logging information about some things that mail clients will display inline, but it's better than logging too little and missing things.
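A minimal sketch of this sort of 'collection of signs' check, using Python's standard `email` package, might look like the following. It is an illustration of the heuristics listed above and not the real logging code; the function name and the `INTERESTING_EXTENSIONS` tuple are assumptions.

```python
from email.message import EmailMessage

# Filename extensions that always get logged; illustrative only.
INTERESTING_EXTENSIONS = (".zip",)


def attachment_like_parts(msg):
    """Yield (disposition, content_type, filename) for attachment-like parts."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers are not attachments themselves
        disposition = part.get_content_disposition()  # 'attachment', 'inline' or None
        # get_filename() uses Content-Disposition's filename= parameter and
        # falls back to Content-Type's name= parameter.
        filename = part.get_filename()
        attachment_like = (
            disposition == "attachment"
            or part.get_content_maintype() == "application"
            or (filename is not None
                and filename.lower().endswith(INTERESTING_EXTENSIONS))
        )
        if attachment_like:
            yield disposition, part.get_content_type(), filename


# A small constructed message: a text body plus one attachment.
msg = EmailMessage()
msg["Subject"] = "example"
msg.set_content("plain text body")
msg.add_attachment(b"placeholder bytes", maintype="application",
                   subtype="octet-stream", filename="photo.jpeg")
for entry in attachment_like_parts(msg):
    print(entry)
```

As the text says, this over-reports on purpose: any one sign is enough to get a part logged, and an inline image with a filename but none of the other signs is skipped.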

(Then there is the fun game of deciding how to exclude frequently emailed attachment types that you don't care about because you'll never be able to block them, like PDFs. Possibly the answer is to log them anyways just so you know something about the volume, rather than to try to be clever.)

KnowingWhatIsAnAttachment written at 01:03:10

2016-06-27

If you send email, don't expect people to help you with abuse handling

I'll start with the tweets:

@thatcks: I see these spammers used @MailChannels to hit us once before, in April. I reported them then, but I have no time for this shit any more.

Back in April, a persistent long-term spammer of one of our addresses attempted to send it spam via MailChannels, a commercial email sending outfit. I complained to MC's abuse contacts at the time, because I'm an optimist, and someone at MC got back to me to tell me this spammer had been fixed. Now the spammer has come back (well, a couple of days ago).

@thatcks: As has been said many, many times before, expecting the receivers of email to be your anti-spam detection method is utterly broken.

Some people might say that I should do the 'responsible' thing and once again report this incident to MailChannels. These people are wrong. It is always the sender's responsibility to detect that they are sending spam and take steps to deal with it; as was said many years ago, abuse reports are a gift (one that comes from fewer and fewer people these days). In my case, my only real interest is in making the spam stop and generally I have far more effective ways of doing this than sending in complaints.

(By the way, I hope we can agree that there is absolutely no moral basis for saying that people have a responsibility to report spam. If your service is spamming me, I am getting absolutely nothing out of this and I accordingly owe you absolutely nothing. In fact, morally speaking you owe me for inflicting costs on me.)

In this specific situation, it's also clear that sending in complaints is not effective (cf). After all, I already did that once, got an assurance that it was dealt with, and the spammer came back a couple of months later. A repeat report is likely to net exactly the same result at best.

Then MailChannels popped up:

@MailChannels: @thatcks We don't take abuse of our network lightly and are keen to investigate. Please send us sample messages to support@mailchannels.com

This is a form tweet. It betrays at least an inability to read my original message.

(Replying to aggravated people with form tweets that betray a lack of thinking human involvement is, at the least, going to aggravate them further. So it proved here.)

@thatcks: .@MailChannels You're asking me to do more work to help you out. Why would I do that? If you want, you have enough information already.

I gave the form tweet all the response I felt that it deserved. And it's true that MailChannels has all the information they need; they could just search their April abuse reports for my name, find the address here that I reported was hit, and see if that address was sent to recently. Why yes, yes it was. MailChannels' email to it was even rejected this time around too, which really ought to be one of a number of danger signs for MailChannels. Certainly this would take some work on MailChannels' part, but you know, they're the people that this benefits, not me; I've already taken effective steps on our side.

(MailChannels benefits because they get rid of a spammer who may drag their reputation down and damage the deliverability of email for other paying customers, which would cost MailChannels money.)

Of course, I expect that MailChannels did nothing here. That's the easy way to blow off problem indicators while feeling good about yourself; you can say 'well, if it was real the person would have totally taken us up on our offer'. They can tick off the 'we tried' box and consider the matter done. And really, what mail sending service can afford to actually do a good job with spam?

(Applications of this pattern to, say, bug reports and bug trackers are left as an exercise for the reader.)

DontExpectAbuseHelp written at 00:58:13

2016-06-12

There are (at least) two sorts of DNS blocklists

Here is a trite and obvious thing that I nevertheless feel like writing down: in practice, there are (at least) two sorts of DNS anti-spam blocklists. Since I want to use value neutral terms here, let us call these 'simple' and 'complex' blocklists.

The operation of a simple DNSBL is, well, simple. If it sees spam from an IP, it lists the IP (or if it sees whatever is the DNSBL's idea of 'bad stuff'). Usually the IP gets automatically delisted after a while, but in some DNSBLs the listing lasts forever unless someone takes action to have it get cleared, appealed, or whatever.

A complex DNSBL attempts to have more complex balancing criteria for adding listings than the simple presence of spam; for instance, it may somehow assess how much apparently legitimate traffic it's seen from the source IP as well as spam volume. A complex DNSBL is sometimes going to be slower to list an IP than a simple DNSBL.

A simple DNSBL does not have 'false positives' as such (assuming that it's honestly run), but that's because a listing means something very narrow; it means that the IP did a bad thing within the time horizon. People who reject email based on a listing in a simple DNSBL may have false positives in that rejection, though, because an IP doing a bad thing once doesn't necessarily mean that it will do it every time. Complex DNSBLs can have false positives because they're fundamentally intended to assert that the email you're getting is probably bad. Good operators of complex DNSBLs attempt to minimize such false positives.
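To make the distinction concrete, here is a toy sketch of the two listing policies. Real DNSBLs are far more involved; the class names, the time horizon, and the spam ratio threshold are all invented for illustration.

```python
class SimpleDNSBL:
    """Lists an IP as soon as it does a bad thing; listings expire after a while."""

    def __init__(self, horizon_seconds):
        self.horizon = horizon_seconds
        self.last_bad = {}  # ip -> time of the most recent bad event

    def report_spam(self, ip, now):
        self.last_bad[ip] = now

    def is_listed(self, ip, now):
        seen = self.last_bad.get(ip)
        return seen is not None and now - seen <= self.horizon


class ComplexDNSBL:
    """Lists an IP only when spam dominates the traffic seen from it."""

    def __init__(self, min_spam_ratio):
        self.min_spam_ratio = min_spam_ratio
        self.counts = {}  # ip -> [spam_count, ham_count]

    def report(self, ip, is_spam):
        counts = self.counts.setdefault(ip, [0, 0])
        counts[0 if is_spam else 1] += 1

    def is_listed(self, ip):
        spam, ham = self.counts.get(ip, (0, 0))
        total = spam + ham
        return total > 0 and spam / total >= self.min_spam_ratio


# A documentation address standing in for a large sender that emits one
# spam message among a lot of legitimate email.
big_sender = "192.0.2.10"
simple = SimpleDNSBL(horizon_seconds=24 * 3600)
complex_list = ComplexDNSBL(min_spam_ratio=0.5)
simple.report_spam(big_sender, now=0)
complex_list.report(big_sender, is_spam=True)
for _ in range(999):
    complex_list.report(big_sender, is_spam=False)

print("simple lists it: ", simple.is_listed(big_sender, now=3600))
print("complex lists it:", complex_list.is_listed(big_sender))
```

The simple list is answering 'did this IP do a bad thing within the horizon?', while the complex list is trying to answer 'is email from this IP probably bad?'.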

To give an example of each, the Spamhaus SBL is a complex DNSBL (or at least generally it is). The CBL is a simple DNSBL, but one that (theoretically) uses a very narrow listing criterion that is very strongly correlated with sending only spam.

Unfortunately not all DNSBLs make it clear what sort of DNSBL they are in their description (or sometimes they wave their hands about it a bit). At least at the moment, one quite strong signal that you are dealing with a simple DNSBL is if it ever lists one of GMail's outgoing mail servers.

(I feel that rejecting email based on a simple DNSBL is not necessarily a mistake, but the sidebar attempting to explain this got long enough that it's going to be another entry.)

DNSBLsTwoSorts written at 23:36:10

2016-06-02

Spammers can abandon SMTP connections not infrequently

As a result of looking at my SMTP session logs, one of the things that I've started tracking on my 'sinkhole' spamtrap SMTP server is how many senders reach the point where they actively get rejected by my server versus how many senders just disconnect with incomplete sessions where everything has gone fine up to that point. My SMTP session logging said that at least some just gave up, but I wasn't sure how many did this.

(Under normal circumstances you'd expect real sending mailers to almost never just abandon an incomplete session. It's not 'never' because there will always be some sending mailers that have their machine reboot out from underneath them or the like as they're trying to send out a message, but this is not exactly common so it should be very low.)

My results so far are early and somewhat incomplete, but I'll give you representative numbers anyways. The numbers I have handy right now are that over the past two and a half days, I've seen 123 abandoned sessions to 440 sessions with refused SMTP commands, or about a fifth of the sessions are just being abandoned. I don't particularly have data on where the sessions are being abandoned, but my SMTP logs say that some senders drop the connection while I'm sending my initial SMTP greeting banner and some drop it as I answer their EHLO or HELO.

Now, I don't and can't know why senders are choosing to abandon their SMTP sessions to my sinkhole server. But one thing that my server does is trickle out its SMTP replies rather slowly (including the initial banner), specifically at a rate of one character every tenth of a second. I took this idea from OpenBSD's spamd, but when I put it in I didn't really expect it to do anything. It may be that I'm wrong here and there is a not insignificant amount of spammer software that either specifically recognizes this behavior or simply isn't interested in wasting its time on too-slow mailers.
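The trickling itself is simple to sketch. The following is an illustration of the idea and not the sinkhole server's actual code; the function name `trickle` and the example banner are assumptions, and the `sleep` parameter exists only so the pacing can be tested without really waiting.

```python
import time


def trickle(send, reply, delay=0.1, sleep=time.sleep):
    """Send an SMTP reply one byte at a time, pausing between bytes.

    send:  a callable that transmits bytes, such as a connected socket's sendall.
    reply: the complete reply as bytes, including the trailing CRLF.
    delay: pause between bytes in seconds (one character every tenth of a second).
    """
    for i in range(len(reply)):
        if i > 0:
            sleep(delay)
        send(reply[i:i + 1])


# Demonstration without a network: collect the bytes and the pauses in lists.
sent = []
pauses = []
trickle(sent.append, b"220 mail.example.org ESMTP\r\n", sleep=pauses.append)
print(b"".join(sent))
print(f"{len(sent)} bytes sent with {len(pauses)} pauses")
```

With a real socket, a sender that hangs up partway through will typically show up as an exception such as `BrokenPipeError` or `ConnectionResetError` from the send call, which is one place where an abandoned session becomes visible.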

(I don't yet feel like experimenting by turning this feature off and seeing if the number of abandoned sessions basically goes almost to zero.)

Applications of this to real, non-sinkhole mailers are left as an exercise. As far as I know, no real sending mailer cares about somewhat slow responses at this level, but I admit I haven't exactly attempted to get every major ISP and so on to send my sinkhole server some email just to see if it would work. Big places like Google and Outlook don't seem to have had any problems coping with my sinkhole server, for what that's worth.

Sidebar: what I consider an abandoned session versus a rejected one

A session counts as 'rejected' if the most recent valid HELO/EHLO, MAIL FROM, RCPT TO, DATA or final '.' on messages was either 5xx'd or 4xx'd. This doesn't consider QUIT, RSET, or other similar commands and it doesn't consider out of sequence commands. A session counts as 'abandoned' if it got 'go ahead' 2xx/354 responses to every valid, in-sequence SMTP command it tried but the sender either closed the TCP connection or sent a QUIT.

Sessions with things like TLS setup failures don't count as either abandoned or rejected. I see some amount of those, some for sad reasons.
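The definitions in this sidebar can be written down as a small classifier. This is a sketch under some assumptions: the input is taken to be already filtered down to valid, in-sequence commands, sessions with TLS setup failures are assumed to have been excluded beforehand, and the 'other' category for sessions that fit neither definition is an addition for the sake of the example.

```python
# Commands whose replies decide the outcome; "." stands for the end of DATA.
DECISIVE = {"HELO", "EHLO", "MAIL FROM", "RCPT TO", "DATA", "."}


def classify_session(events):
    """Classify one SMTP session as 'rejected', 'abandoned' or 'other'.

    events: list of (command, reply_code) tuples in the order they happened,
            covering only valid, in-sequence commands. QUIT, RSET and
            similar commands are ignored.
    """
    decisive = [(cmd, code) for cmd, code in events if cmd in DECISIVE]
    if decisive:
        _, last_code = decisive[-1]
        if 400 <= last_code <= 599:
            return "rejected"
    # Every decisive command got a 'go ahead' reply, and then the sender
    # closed the connection or sent QUIT. An empty list (the sender left
    # during the greeting banner) also lands here.
    if all(200 <= code <= 299 or code == 354 for _, code in decisive):
        return "abandoned"
    return "other"


print(classify_session([("EHLO", 250), ("MAIL FROM", 250), ("RCPT TO", 550)]))
print(classify_session([("EHLO", 250), ("QUIT", 221)]))
```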

SpammersAbandonSMTPSessions written at 00:17:45

2016-05-27

Your overall anti-spam system should have manual emergency blocks

We mostly rely on a commercial anti-spam system for our incoming spam filtering (as described here), and many other people rely on a variety of open source options for their spam filtering. This generally works very well, with us (and you) getting to offload the work of maintaining a high quality anti-spam system to other people (and it's certainly a lot of work). But not always (and not just because it malfunctions). The realities of life are that sooner or later you will be hit by a spam run that your anti-spam system doesn't recognize, either because the spam run is really new or because it's pretty specific to you.

Much of the time, you can shrug your shoulders and let this go. No anti-spam system is perfect and one of the tradeoffs you make when relying on a third-party system is that it's broadly out of your hands (sometimes this is an advantage). But some of the time this isn't going to be good enough; either the volume or the threat to your users will be so high that you can't just sit on your hands.

(Modern ransomware is making this clear by creating a potentially very high cost of allowing some things through.)

When this day comes to pass, you'll want to have the ability to step in and block the traffic even though your automated anti-spam system is happy with it. This can take many forms, depending on how you want to handle it; you could figure out how to write custom rules for your anti-spam system (so you can outright block certain sorts of files or certain URLs or whatever), or you can build blocking features into your mailer configuration itself, or any number of other options.

Having been through having to do this on the fly during an emergency, my strong suggestion is that you build the infrastructure for these manual blocks now, before you need them. It's some additional up front work and if you're lucky you may never need it, but doing it now when you have time to plan and test and figure out the best way to do things beats having to do it on the fly, under pressure.

Sidebar: What I think you should have manual blocks for

On the one hand attacker ingenuity is very deep, but on the other hand certain patterns repeat over and over again. So my view is that you can probably cover most ground with the ability to put in place manual blocks against sending IPs, sending domains, file extensions (including inside file containers like ZIP files), and whole and partial URLs (for phishing campaigns). You might also want a general message header and body regular expression matching system, but that's starting to feel like scope creep to me.

(Of course real scope creep would be to start by creating a general, generic framework for writing relatively arbitrary manual blocks on message attributes.)
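As a sketch of how small the non-scope-creep version can be, here is one way the four kinds of manual blocks might be checked in one place. Every entry is a placeholder (documentation IP ranges and `.example` domains), and the function and variable names are invented; how this would hook into a real mailer or anti-spam system is deliberately left out.

```python
import ipaddress
import os.path

# Placeholder entries; a real setup would load these from files that can
# be edited in a hurry.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
BLOCKED_SENDER_DOMAINS = {"spammer.example"}
BLOCKED_EXTENSIONS = {".js", ".scr"}
BLOCKED_URL_FRAGMENTS = ["phish.example/login"]


def manual_block_reason(sender_ip, sender_address, filenames, body_text):
    """Return a short reason if any manual block matches, else None.

    filenames should include names found inside containers such as ZIP files.
    """
    ip = ipaddress.ip_address(sender_ip)
    if any(ip in net for net in BLOCKED_NETWORKS):
        return "blocked sending IP"
    domain = sender_address.rpartition("@")[2].lower()
    if domain in BLOCKED_SENDER_DOMAINS:
        return "blocked sender domain"
    for name in filenames:
        if os.path.splitext(name)[1].lower() in BLOCKED_EXTENSIONS:
            return "blocked file extension"
    lowered = body_text.lower()
    if any(fragment in lowered for fragment in BLOCKED_URL_FRAGMENTS):
        return "blocked URL"
    return None


print(manual_block_reason("198.51.100.7", "someone@ok.example",
                          ["report.pdf", "invoice.js"], "see attached"))
```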

PlanForManualSpamBlocks written at 01:43:55

2016-05-22

My view of Barracuda's public DNSBL

In a comment on this entry, David asked, in part:

Have you tried the Barracuda and Hostkarma DNSBLs? [...]

I hadn't heard of Hostkarma before, so I don't have anything to say about it. But I am somewhat familiar with Barracuda's public DNSBL and based on my experiences I'm not likely to use it any time soon. As for why, well, David goes on to mention:

[...] Barracuda in particular lists more aggressively and is willing to punish lower volume relays that fail to mitigate spammer exploitations. [...]

That's one way to describe what Barracuda does. Another way to put it is that in my experience, Barracuda is pretty quick to list any IP address that has even a relatively brief burst of outgoing spam, regardless of the long term spam-to-ham ratio of that IP address. Or to put it another way, whenever we have one of our rare outgoing spam incidents, we can count on the outgoing IP involved to get listed and for some amount of our entirely legitimate email to start bouncing as a result.

As a result I expect that any attempt to use it in our anti-spam system would have far too high a false positive rate to be acceptable to our users. Given this I haven't attempted any sort of actual analysis of comparing sender IPs of accepted and rejected email against the Barracuda list; it's too much work for too little return.

My suspicion is that this is likely to be strongly influenced by your overall rate of ham to spam, for standard mathematical reasons. If most of your incoming email is spam anyways and you don't often receive email from places that are likely to be compromised from time to time by spammers, its misfires are not likely to matter to you. This does not describe our mail environment, however, either in ham/spam levels or in the type of sources we see.

(To put it one way, universities are reasonably likely to get one of their email systems compromised from time to time and we certainly get plenty of legitimate email from universities.)

On my personal sinkhole spamtrap, I could probably use the Barracuda list (and the psky RBL) as a decent way of getting rid of known and thus probably uninteresting sources of spam in favour of only having to deal with (more) interesting ones. But obviously this spamtrap gets only spam, so false positives are not exactly a concern. Certainly a significant number of recently trapped messages there are from IPs that are on one list or the other (and sometimes both), although obviously I'm taking a post-facto look at the hit rate.

BarracudaDNSBLView written at 00:55:33

2016-05-19

Some basic data on the hit rate of the Spamhaus DBL here

After my previous exploration of the Spamhaus DBL, I wound up adding it as another DNS blocklist in our overall spam filtering setup. Because we don't have a mandate for it, none of our DNS blocklists apply to all email, only to email for people who have opted in to some amount of server side spam filtering. Because the DBL applies on a per-recipient basis, the comparison I'm going to use here is against the overall recipient count (not the overall message count). I'm also going to use the past nine days, so I can sort of compare this to my estimated hit rate.

So, over the past nine days, we have had:

  • 106,837 accepted MAIL FROMs and 106,835 accepted RCPT TOs, which means that almost all of our accepted messages have been delivered to a single destination address.

  • 29,194 accepted RCPT TOs for IPs listed in one of the Spamhaus DNSBLs. Since these were accepted, these are recipients who have not opted into any amount of our server-side spam filtering.
  • 7,685 accepted RCPT TOs for domains listed in the DBL. A quick check suggests that about 6,390 of these came from IP addresses that were in the Spamhaus DNSBLs.

  • 13,020 RCPT TOs that were rejected because the sender IP was in one of the Spamhaus DNSBLs. This is checked before the DBL.
  • Only 346 RCPT TOs that were rejected because the sender domain was in the DBL.

On the one hand, this doesn't look too great for the DBL; despite my initial estimate, we aren't getting many rejections from checking the DBL. On the other hand, when I look at the source addresses of those rejections, something jumps out right away: just over half of them come from one system.

Specifically, over half of them come from the mail server for another (sub)domain on campus, one where a number of our users have accounts and forward (all of) their email from that system to us. What we've effectively done with the DBL is to add an additional SMTP-time defense to reject forwarded spam. In fact there are a number of 'forwarded from another campus mail system' DBL rejections in the past nine days from other sources.

My personal view is that these rejections are valuable ones (partly because I've observed our commercial anti-spam system not doing so well with forwarded spam in the past). So on the whole I'm happy with what the DBL is doing here, and also happy that now I have better numbers on what it could be doing if more people opted in to server-side spam filtering.

(Despite my bright words here, I'm also disappointed that adding the DBL isn't rejecting more messages. I guess this is partly down to how a lot of spam with DBL domains comes from IPs that are already blocked on their own. Note that we're using the DBL in its most basic and limited mode, where we check it against the MAIL FROM domain; you're really supposed to use it to check domains mentioned in the body of email messages.)
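For anyone who wants to play with the figures from the list above, the arithmetic is easy to redo. The variable names are made up, and reading the 'DBL only' number as roughly what the DBL would add on top of the IP-based checks if everyone opted in is an interpretation, not something the logs say directly.

```python
accepted_rcpts = 106_835
accepted_dbl_domain = 7_685         # accepted although the domain was in the DBL
accepted_dbl_and_listed_ip = 6_390  # approximate overlap with Spamhaus-listed IPs
rejected_ip = 13_020                # rejected on the sender IP (checked first)
rejected_dbl = 346                  # rejected on the DBL

# Accepted RCPT TOs that only the DBL would have flagged, because the
# sending IP was not in any of the Spamhaus DNSBLs.
dbl_only_accepted = accepted_dbl_domain - accepted_dbl_and_listed_ip

# Share of the DNSBL-based rejections that came from the DBL.
dbl_share_of_rejections = rejected_dbl / (rejected_ip + rejected_dbl)

# Fraction of accepted RCPT TOs whose sender domain was in the DBL.
dbl_fraction_of_accepted = accepted_dbl_domain / accepted_rcpts

print(f"DBL-only accepted RCPT TOs: {dbl_only_accepted}")
print(f"DBL share of DNSBL rejections: {dbl_share_of_rejections:.1%}")
print(f"DBL-listed share of accepted RCPT TOs: {dbl_fraction_of_accepted:.1%}")
```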

SpamhausDBLHitRate2016-05 written at 00:59:48

2016-04-29

You should plan for your anti-spam scanner malfunctioning someday

Yesterday I mentioned that the commercial anti-spam and anti-virus system we use ran into a bug where it hung up on some incoming emails. One reaction to this is to point and laugh; silly us for using a commercial anti-spam system, we probably got what we deserved here. I think that this attitude is a mistake.

The reality is that all modern anti-spam and anti-virus systems are going to have bugs. It's basically inherent in the nature of the beast. These systems are trying to do a bunch of relatively sophisticated analysis on relatively complicated binary formats, like ZIP files, PDFs, and various sorts of executables; it would be truly surprising if all of the code involved in doing this was completely bug free, and every so often the bugs are going to have sufficiently bad consequences to cause explosions.

(It doesn't even need to be a bug as such. For example, many regular expression engines have pathological behavior when exposed to a combination of certain inputs and certain regular expressions. This is not a code bug since the RE engine is working as designed, but the consequences are similar.)
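A classic small demonstration of this uses nested quantifiers with Python's `re` module, which is a backtracking engine. The pattern here is a textbook example and has nothing to do with any particular anti-spam product; the input sizes are kept small so the sketch finishes quickly.

```python
import re
import time

# Nested quantifiers: when the overall match fails, a backtracking engine
# tries a huge number of ways to split the run of 'a's between the inner
# and outer '+'.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time how long PATTERN takes to fail on n 'a's followed by a 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    return result, time.perf_counter() - start


for n in (12, 14, 16, 18):
    result, elapsed = time_failed_match(n)
    print(f"n={n:2d} matched={result is not None} seconds={elapsed:.4f}")
```

The time should grow rapidly with each extra pair of characters, and an input only a little longer than these would tie up the process for a very long time. The engine is doing exactly what it was designed to do.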

What this means is that you probably want to think ahead about what you'll do if your scanner system starts malfunctioning at the level of either hanging or crashing when it processes a particular email message. The first step is to think about what might happen with your overall system and what it would look like to your monitoring. What are danger signs that mean something isn't going right in your mail scanning?

Once you've considered the symptoms, you can think about pre-building some mail system features to let you deal with the problem. Two obvious things to consider are documented ways of completely disabling your mail scanner and forcing specific problem messages to bypass the mail scanner. Having somewhat gone through this exercise myself (more than once by now), I can assure you that developing mailer configuration changes on the fly as your mail system is locking up is what they call 'not entirely fun'. It's much better to have this sort of stuff ready to go in advance even if you never turn out to need it.

(Building stuff on the fly to solve your urgent problem can be exciting even as it's nerve-wracking, but heroism is not the right answer.)
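The two escape hatches mentioned above (turning the scanner off entirely and letting specific messages bypass it) amount to very little logic, which is part of why they are worth building in advance. Here is a minimal sketch; `scan` is a hypothetical stand-in for whatever actually invokes your scanner, and the names and return values are invented.

```python
def handle_message(message_id, message, scan, *, scanner_enabled=True,
                   bypass_ids=frozenset()):
    """Decide what to do with one message.

    scan: callable that returns True if the message is clean. It is never
          called when the scanner is disabled or the message is bypassed.
    Returns 'accept', 'reject' or 'accept-unscanned'.
    """
    if not scanner_enabled or message_id in bypass_ids:
        return "accept-unscanned"
    return "accept" if scan(message) else "reject"


def pretend_scan(message):
    """Toy scanner for the demonstration: flags one magic string."""
    return "EVIL" not in message


print(handle_message("id-1", "hello", pretend_scan))
print(handle_message("id-2", "EVIL payload", pretend_scan))
print(handle_message("id-3", "message that hangs the scanner", pretend_scan,
                     bypass_ids={"id-3"}))
```

Whether unscanned mail should be accepted at all is the policy question, not a coding one.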

At this point you may also want to think about policy issues. If the mail scanner is breaking, do you have permission to get much more aggressive with things like IP blocks in order to prevent dangerous messages from getting in, or is broadly accepting email important enough to your organization to live with the added risks of less or no mail scanning? There's no single right answer here and maybe the final decisions will only be made on the spot, but you and your organization can at least start to consider this now.

PlanForSpamScannerMalfunction written at 00:21:06


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.