Wandering Thoughts archives

2021-12-24

Sadly, my experience is that big commercial anti-malware detection is better

For reasons beyond the scope of this entry, for the past couple of years I've been running a large commercial anti-spam system (and its malware recognition) side by side with what we could put together with ClamAV and some low-cost commercial ClamAV signature sources. More or less from the beginning it's been clear to me that our commercial system was recognizing malware that ClamAV was not. Some of this was new things that we could add to our manual recognition and rejection, but at this point another significant source of missed ClamAV recognition is (still) malware in Microsoft Office files.

This is not really a result that I was hoping for. Our commercial anti-spam system has been on vendor life support for more than a year, so its recognition engine definitely isn't being updated for new capabilities and who knows how much its signature database is being updated. Despite that, it's still ahead of a well regarded open source malware detection system.

Some amount of bad email makes it through both ClamAV and our commercial anti-spam system and is then forwarded on to elsewhere by some of our users. These days, that elsewhere includes both Office365 and GMail. Trawling our logs suggests that both of these recognize and reject even more malware than we do, although this effect is somewhat entangled in them also recognizing more spam than we do.

This is not really surprising. Large providers of email and of anti-spam services have more resources for both improving their scanning engines and coming up with signatures and danger signs. They see more email (one way or another) and can build more sophisticated systems to analyze it in various ways. Greater volume with automated analysis and feedback systems can mean faster responses to new malware. Open source projects and small commercial firms simply can't match this.

(One suggestive thing is that our commercial anti-spam software provider is not getting out of the anti-spam business. Instead, it's moving to having only a cloud filtering option, where you run your incoming email through their cloud systems. This gives them far more aggregate visibility into potential malware and makes responding to it much faster. I suspect that they were pushed to this partly to match the malware filtering quality of the big providers like Google and Microsoft.)

PS: For Microsoft Office files specifically, it might be possible for us to build something using oletools, and we may have to try to, just to not let too much bad stuff through once we can no longer use the commercial anti-spam software.

(This is one unhappy aspect of how running your own email is increasingly an artisanal choice. It's possible that a lot of manual tuning and adjustment and software will get us to something close to the quality of big commercial providers, but it's unlikely to be easy.)

CommercialAntiMalwareBetter written at 22:58:40

2021-10-27

We're seeing increasingly targeted and dangerous phish spam attempts

In the old days, phish spam was generally pretty crude and generally easily recognized. A lot of it still is, but we're increasingly seeing some pretty sophisticated and targeted phish spam. Some of the latest phish spam we've seen uses essentially exact duplicates of university web pages and authentication dialogs, and has relatively convincing pitches in the email to get people to click on the links. To me, this is scary and goes well beyond assuming we can be phished, as I did in 2019. In 2019, I thought that an alert person might still have a reasonable chance. Now, I think that all that's between us and a significant scale compromise is that attackers aren't that committed yet (and whatever multi-factor authentication has propagated to our user population).

The university has it somewhat worse than companies do, in that our "internal" information really isn't. Since we have a large and varied user population and almost all of our internal services websites are public, there's very little information on how the university sends out email notifications about things and what our internal websites look like that couldn't be found by a dedicated attacker. With that information in hand, the attacker could put together a basically letter-perfect fake.

(There are some technical measures the university has adopted to try to make such fake emails more obvious, but the only real mitigation is multi-factor authentication, which itself has assorted limitations.)

In light of all of this, one of the things I wonder is how long people will continue using email to deliver high-sensitivity information. One thing that has to be attractive to the university is moving to delivering all notifications about things like payroll, benefits, vacation planning, and so on (basically anything that would actively prompt people to log in) over a communication method that simply doesn't allow outsiders to send messages in.

(This is especially the case because the university already has access to such a communication method and is encouraging staff and faculty to adopt it for general use. I'm not naming the service involved because it's provided by a large commercial organization that doesn't need free publicity.)

PhishGettingDangerous written at 23:41:39

2021-08-07

Some thoughts on new top-level domains being used for spam

Over on Twitter, I had a little exchange:

@thatcks: Another day, another new vanity TLD that I'm never accepting email from (because of spam, of course; the dominant use of vanity TLDs in email senders is for spam).

@MrDOS: This is a self-fulfilling prophecy, though: by denying legitimate mail from these TLDs, you're guaranteeing that no one will ever be able use these TLDs for legitimate mail.

@thatcks: When the spammers get there first, the well is poisoned. Un-poisoning the well is not my (or anyone's) problem; we just not want to be fed poisoned water.

On the one hand, I think that my reaction and final tweet are not wrong. Potential receivers of email are under no obligation to help senders get it delivered, and if something only or mostly sends you spam, well, you can sensibly block it and many people will. As a result, spammers can and do poison certain things, including new top level domains (mostly generic TLDs, but sometimes country ones as well).
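To make the mechanics concrete, here is a minimal Python sketch of a TLD-based sender block. It isn't taken from any real mail configuration: the function names and the entries in BLOCKED_TLDS are hypothetical placeholders, and a real setup would more likely express this in the mail server's own configuration language.

```python
# Hypothetical placeholders, not real TLDs; substitute your own list.
BLOCKED_TLDS = {"vanity-one", "vanity-two"}


def sender_tld(address: str) -> str:
    """Return the lower-cased last DNS label of the address's domain, or ''."""
    _, sep, domain = address.rpartition("@")
    if not sep:
        # No '@' at all, so there is no domain to inspect.
        return ""
    domain = domain.strip().rstrip(".").lower()
    return domain.rsplit(".", 1)[-1] if domain else ""


def is_blocked_sender(address: str, blocked=BLOCKED_TLDS) -> bool:
    """True if the sender address's TLD is in the blocked set."""
    return sender_tld(address) in blocked
```

The check looks only at the last label of the domain, which is exactly what makes this kind of block so broad: every sender under the TLD is refused, whether or not they ever sent spam.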

(Although I can't find a link to it, I believe I once saw a summary of a study on how many new gTLD domains were canceled or removed almost immediately after creation. For many active gTLDs, a surprisingly large number of new domains went away very rapidly. The study didn't conclusively say they were used primarily for spam and other bad purposes, but that was the obvious speculation.)

On the other hand, this feels uncomfortably close to pushing email further toward a closed system in practice, where only large existing senders of email can get their email accepted and other people are frozen out. Setting up a broad based block of any sort (whether a gTLD or a large network (IP) area) makes it incrementally harder for people to send email from new, not well established hosts, and anecdotally that's already hard.

On the third hand, my personal email box is a much different thing than a large mail provider. Decisions made by Google, Microsoft, and so on about who they will accept email from (and what they will require from that email) have far bigger effects than my decisions do. It also feels like the central decisions of Google and so on are fundamentally different (and more dangerous) than the aggregated distributed decisions of a large number of people, even if they come to roughly the same end result.

I don't have any firm answers, especially universal ones, but I'm not likely to change my own personal blocks. Sorry, gTLDs and people using them, but not really. In the end I care more about my mailbox than anything else, because I've just become too tired of the state of modern email.

(I have mixed views on new TLDs in general, but that's somewhat separate from their use in email.)

NewTLDsAndSpamForMe written at 23:31:31

2021-05-10

Errors during SMTP conversations aren't trustworthy, illustrated

Recently we had a mail problem where we could not deliver email to a particular remote destination for a while. A major Australian ISP spent six days telling us:

421 4.7.25 Temporarily rejected. Reverse DNS for <our-IP> failed. IB108

(Based on Exim log messages, this happened during the initial SMTP connection, before we even EHLO'd.)

Then later the ISP was fine again, sadly after the person trying to send mail had their attempts time out and contacted us to see if we could do anything about it. The ISP was fine before this incident, and they've been fine ever since, and no other destination reported anything like this message to us.

We did not have malfunctioning nameservers or missing reverse DNS for six days. We did not, as far as we can tell, have DNS servers that the outside world had problems reaching for six days. I suppose it's possible that this large ISP had some internal problem that prevented their DNS servers from talking to our DNS servers for six days, but not so big that they noticed it and dealt with it right away. Alternately, perhaps this ISP was not being honest with us about why they decided not to accept connections from our outgoing email server. We can't tell.

(During the six day problem period, our user was able to reach their recipient on this ISP from two other places, both of which are big email heavyweights, so it was not an issue with the recipient or with the ISP's mail system in general.)

It's not really news or a new thing that the messages you get from other people's mail servers are not necessarily telling you the real reason that your messages aren't being accepted. Many of the major mail providers seem to do it; it's been a long time since I really believed GMail's SMTP time messages, for example. We have many cases where GMail will give temporary 4xx SMTP error codes for an email for a while with various claims in the SMTP error messages, then wind up accepting it. In other cases the 'temporary' 4xx error codes stick for as long as we want to keep retrying and we eventually time out the message.
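One practical consequence is to base retry decisions on the reply code and to treat the text as an unverified claim. The sketch below, in Python with made-up names (SMTPReply, parse_reply), splits a reply line like the one quoted earlier into its three-digit code, optional enhanced status code, and free-form text, and classifies it purely by the first digit of the code.

```python
import re
from typing import NamedTuple, Optional


class SMTPReply(NamedTuple):
    code: int                # three-digit reply code, e.g. 421
    enhanced: Optional[str]  # enhanced status code such as "4.7.25", if present
    text: str                # free-form text; a claim, not a verified reason
    outcome: str             # "success", "temporary", "permanent" or "other"


# code, then space or hyphen, then an optional x.y.z enhanced code, then text
_REPLY_RE = re.compile(r"^(\d{3})[ -](?:([245]\.\d{1,3}\.\d{1,3})\s+)?(.*)$")

_OUTCOMES = {2: "success", 4: "temporary", 5: "permanent"}


def parse_reply(line: str) -> SMTPReply:
    """Parse a single SMTP reply line into its parts."""
    match = _REPLY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an SMTP reply line: {line!r}")
    code = int(match.group(1))
    outcome = _OUTCOMES.get(code // 100, "other")
    return SMTPReply(code, match.group(2), match.group(3), outcome)
```

Note that nothing here looks at the text to decide what to do; it is kept only so that it can be logged next to the code.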

(My personal lesson learned from this incident was that I should pay more attention to our queued email, then look into things that seemed odd. At the very least I might have been able to reproduce this outside of Exim, and test it from other IPs on the same subnet and elsewhere within the university.)
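A probe along those lines could look something like this sketch. It uses Python's smtplib to open a connection, optionally bound to a specific local IP, and reports only the server's greeting, since that is where the rejection appeared. The function names are hypothetical, and binding to a source IP only works on a machine that actually has that address.

```python
import smtplib


def probe_greeting(host, source_ip=None, port=25, timeout=30.0):
    """Connect to host:port and return (code, text) from the SMTP greeting.

    Only the TCP connection is made; no EHLO, MAIL FROM or message is sent.
    """
    # Port 0 lets the OS pick the local port for the chosen source IP.
    source = (source_ip, 0) if source_ip else None
    smtp = smtplib.SMTP(timeout=timeout)  # no host argument, so nothing connects yet
    try:
        # Unlike SMTP(host), connect() hands back a non-220 greeting
        # (such as a 421) instead of raising SMTPConnectError.
        code, message = smtp.connect(host, port, source_address=source)
    finally:
        smtp.close()
    return code, message.decode("utf-8", "replace")


def format_probe(source_ip, code, text):
    """One line per probe, so results from different source IPs line up."""
    return f"{source_ip or '(default)'}: {code} {text}"


# Example with placeholder names (a documentation-range IP and an example domain):
#   code, text = probe_greeting("mx.example.net", source_ip="192.0.2.10")
#   print(format_probe("192.0.2.10", code, text))
```

Running the same probe from several source IPs and comparing the lines would at least show whether the rejection follows one address or a whole network range.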

SMTPErrorsNotTrustworthy written at 22:49:09

2021-02-04

There are limitations to what expendable addresses can help with

I'm a long time advocate of using expendable addresses for as many things as possible (and then making sure you can turn them off). However, yesterday's incident of junk email as a cover for worse also shows some of the limitations of using expendable addresses, because they wouldn't really have avoided this situation.

The first way they wouldn't have avoided the situation (of having a flood of junk email sent to someone to distract them) is that generally expendable addresses in all of their forms still funnel into your actual mailbox. Some people sort some expendable addresses into low-priority places, but you're unlikely to do this with the email address you use for things like notifications from your financial institutions. You usually want to see those right away, not have them hidden away.

The second way they wouldn't have avoided the situation is that if someone wants to unleash a flood of email onto you to distract you, it doesn't necessarily matter what exact email address they get their hands on. All they need is some email address that goes into some mailbox that you look at regularly. It would be better to get the actual email address you use with your financial institution, but for drowning a bit of signal in a lot of noise, often many email addresses will do about as well. It doesn't even have to go to the right mailbox, just one that will cause you to drown in the volume.

(Certainly this would be the case for me. I would have an easier time of sorting things later and perhaps not missing signal amidst noise with my extensive collection of expendable addresses, but in the heat of the moment, if you clog up my inbox it doesn't really matter how.)

The one part of this sort of flood that expendable addresses will help with is the longer term aftermath. One of the iron rules of email addresses is that once some people have their hands on some email address, they will never stop emailing it. After a flood, obviously a lot of people have some email address of yours and a certain percentage of them will keep emailing that address forever. If the address they have is an expendable address that you can turn off, you can at least make them go away.

ExpendableAddressLimitations written at 23:32:03

2021-02-03

Junk email as a cover for more nefarious things

This morning, we got a call (through a Point of Contact) that one of the people here was being absolutely flooded by incoming spam and junk email. It was a real flood, too; in total they received over 1,200 email messages that made it past our anti-spam defenses, most of them over about an hour and a half (I'll let you do the math on the messages per minute rate, and then think about trying to do anything about it in a mail client). This person wound up having to basically turn off receiving external email.
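For the record, here is a rough version of that arithmetic in Python. It treats all 1,200 messages as arriving within 90 minutes, which is only an approximation, since the total was over 1,200 and only most of them arrived in that window.

```python
messages = 1200       # "over 1,200", so a floor on the total
window_minutes = 90   # "about an hour and a half"

per_minute = messages / window_minutes
seconds_between = 60 * window_minutes / messages

print(f"{per_minute:.1f} messages per minute")              # 13.3 messages per minute
print(f"one message every {seconds_between:.1f} seconds")   # one message every 4.5 seconds
```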

Unfortunately, this wasn't the only thing going on in that person's life this morning, because they also discovered an unauthorized financial transaction (I don't know if they found it before or after the flood started, but I suspect before). The obvious theory is that this sudden, exceptional flood of junk email is not at all a coincidence, and was instead intended to cover up a transaction notification from the financial institution involved. To abuse a phrase, if you can't stop a tree from falling, perhaps you can obscure it by clear-cutting the entire forest around it.

We rejected some of the incoming email at SMTP DATA time, which causes Exim to log some message headers. Based on these rejections and also various of the sending addresses, some of the incoming email appears to have been 'congratulations on signing up for our mailing list', 'thank you for contacting us', and so on email that could be deliberately induced by a third party who wanted to flood someone's mailbox. Other messages seem to have been genuine spam, or very likely genuine spam.

(I am sure you will be shocked to hear that Sendgrid features high up in the list of sending sources, and also the list of sources blocked because of SBL listings.)

One of the unnerving things about this incident is that the attacker clearly was highly prepared. They had at least a thousand (or more) potential sources of junk and spam email identified and lined up, ready to trigger. And it's pretty clear that the triggering was automated. Since the sources of the junk email come from all over, it seems likely that the attacker wasn't exploiting a single piece of (web) software to stuff in addresses. They probably had an entire suite of attacks against various different 'contact us' and 'subscribe me' and so on forms ready to go.

(I have no theories for how the attacker got spammers to start emailing this address so fast. Maybe there is a market for 'hot email addresses, mail them now while they last' where the purchased addresses get used basically immediately.)

JunkEmailFloodAsCover written at 22:32:03

2021-01-21

Real email has MIME attachments that are HTML

One of the things that MIME parts in email have (or can have) is a content disposition, which theoretically tells your mail client whether the MIME part should be displayed as part of the message (a content disposition of inline) or it should be not displayed by the client and you'd be offered the option to save it, view it with something, and so on (a content disposition of attachment).

(HTTP reuses this idea in the Content-Disposition header, which tells the browser if it should try to display the response or jump straight to forcing you to download it or hand it to some external program.)

In most email, HTML MIME parts have an inline content disposition, because this is how the sender (or their mail software) arranges for them to be visible to the receiver. This is true both for a message that is HTML only or for a 'multipart/alternative' message with (theoretically) equivalent plain text and HTML versions.

For a long time, I've known that our commercial anti-spam filter was counting some varieties of phish spam as 'viruses'. When we first started logging MIME part type information, I discovered that a lot of these rejections were for HTML MIME parts that had an 'attachment' content disposition. This led me to assume that essentially all legitimate real mail with HTML MIME parts had them with an inline content disposition, and only suspicious and probably bad email had 'attachment' HTML MIME parts.

Recently I had reasons to specifically look at our MIME part type logs for email that we can be reasonably confident is good, and I got a surprise. We definitely see legitimate email with HTML MIME parts that have a content disposition of 'attachment'. Apparently this is even the standard and normal behavior of some email clients in some situations, especially when forwarding email.
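As an illustration of what this kind of logging looks at, here is a minimal sketch that uses Python's standard email package to list the content disposition of each HTML part in a message. The function name is made up, and this is not the logging code behind the observations above.

```python
from email import message_from_bytes, policy


def html_part_dispositions(raw_message: bytes):
    """Return a list of (disposition, filename) for every text/html part."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # get_content_disposition() returns 'inline', 'attachment', or None
        # when the part has no Content-Disposition header at all (clients
        # generally display such parts much as if they were inline).
        found.append((part.get_content_disposition(), part.get_filename()))
    return found
```

Running something like this over mail that is known to be good is one way to check an assumption such as 'HTML attachments are suspicious' before relying on it.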

Beyond the specific fixing of my ignorance and assumption here, in general this has been a useful reminder to me that I don't actually know as much about modern email as I usually think I do. Before I confidently assume something like 'HTML MIME parts that are attachments are suspicious', I should at least go check our logs to see what they say. After all, that's the largest reason we collect this information; we realized that we didn't actually know what sorts of MIME parts our users received and we should.

HTMLAttachmentsAreReal written at 00:33:01

2021-01-06

In modern email, it's easy for plaintext and HTML parts to drift apart

I recently read When The Text And Html Disagree, which is about an instance where an email message had an important disagreement between the plaintext part and the HTML part. In this case it was fortunately obvious that something was wrong, but I'm sure there have been less obvious instances.

I believe that one reason this drift happens comes down to that old aphorism that if you don't test it, it's broken. For email with alternate parts, the revised aphorism can be said as “if you don't see it, it's broken”. Modern email clients normally show you the HTML part to start with, and then most make a generally rational decision to make it at least hard to see the plaintext one. So when people look at test versions (or real versions) of such email messages, only the HTML part has to look good in order for the whole thing to seem fine. The unseen text part can quietly rot away, noticed only by unusual people like me who look at the plaintext version.

(You would think that mass email authoring environments would raise an alert if you only edit the HTML portion of a standing mixed-part email, but apparently not.)

I've seen this sort of thing for spam, but When The Text And Html Disagree makes a nice illustration that it's not just spam that suffers from the issue. In the end we probably shouldn't be too surprised about any of this, because keeping multiple things in synchronization is pretty much a hard problem all over. If you want it to work reliably you need to automate it, and automating this sort of update isn't easy.
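A crude automated check is possible, at least as a sketch. The Python below (standard library only, with hypothetical function names) extracts the visible text from the HTML part and compares its words against the plaintext part; a low score hints that the two have drifted apart. It ignores links, images, and layout, so it is a starting point rather than a reliable detector.

```python
from difflib import SequenceMatcher
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def _words(text):
    """Lower-cased, whitespace-separated words, so formatting differences vanish."""
    return text.lower().split()


def text_html_similarity(raw_message: bytes) -> float:
    """Return a 0.0 to 1.0 word-level similarity between the two body parts."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is None or html is None:
        raise ValueError("message lacks a plaintext body or an HTML body")
    extractor = _TextExtractor()
    extractor.feed(html.get_content())
    extractor.close()
    html_words = _words(" ".join(extractor.chunks))
    return SequenceMatcher(None, _words(plain.get_content()), html_words).ratio()
```

Where to draw the line between 'same message' and 'drifted' is a judgment call; any threshold would need tuning against real messages, since HTML versions often carry extra footer and navigation text.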

(Keeping things in sync by hand is extra work, and sooner or later extra work doesn't get done or doesn't get done right. People forget, people make mistakes, people will get to it tomorrow because there's an urgent thing right now, and so on and so forth.)

PS: Given this, the most likely answer to the question in When The Text And Html Disagree is that if there's a disagreement and it's not clear, the HTML part is right and the plaintext one is wrong. It could be that you have a rare email where someone has updated the plaintext part but not the HTML part, but the odds are very good that it's the other way around. The exception to this is if you're in a very unusual environment where most people see the plaintext part instead of the HTML part.

PlaintextAndHTMLDriftApart written at 22:34:37



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.