Errors during SMTP conversations aren't trustworthy, illustrated
Recently we had a mail problem where we could not deliver email to a particular remote destination for a while. A major Australian ISP spent six days telling us:
421 4.7.25 Temporarily rejected. Reverse DNS for <our-IP> failed. IB108
(Based on Exim log messages, this happened during the initial SMTP connection, before we even EHLO'd.)
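As an illustration of how little structure there is to rely on in a reply like this, here is a minimal Python sketch that splits an SMTP reply line into its pieces. The function and class names are my own inventions, and `192.0.2.10` is a documentation address standing in for our elided IP. About the only parts you can act on mechanically are the numeric codes; the free text is whatever the other side chose to say.

```python
import re
from typing import NamedTuple, Optional


class SmtpReply(NamedTuple):
    code: int                # three-digit SMTP reply code, e.g. 421
    enhanced: Optional[str]  # enhanced status code such as "4.7.25", if present
    text: str                # free-form text; informational only

    @property
    def temporary(self) -> bool:
        # 4xx replies invite a later retry, 5xx replies are permanent failures.
        return 400 <= self.code < 500


# code, a space or '-' (continuation line), optional enhanced code, then text
_REPLY_RE = re.compile(r"^(\d{3})[ -](?:(\d\.\d{1,3}\.\d{1,3}) )?(.*)$")


def parse_smtp_reply(line: str) -> SmtpReply:
    """Split one SMTP reply line into code, enhanced status code, and text."""
    match = _REPLY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an SMTP reply line: {line!r}")
    return SmtpReply(int(match.group(1)), match.group(2), match.group(3))


reply = parse_smtp_reply(
    "421 4.7.25 Temporarily rejected. Reverse DNS for 192.0.2.10 failed. IB108"
)
print(reply.code, reply.enhanced, reply.temporary)
```

The sketch can tell you that this was a temporary failure, but it cannot tell you whether the stated reason is the real one.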
Then later the ISP was fine again, sadly after the person trying to send mail had their attempts time out and contacted us to see if we could do anything about it. The ISP was fine before this incident, and they've been fine ever since, and no other destination reported anything like this message to us.
We did not have malfunctioning nameservers or missing reverse DNS for six days. We did not, as far as we can tell, have DNS servers that the outside world had problems reaching for six days. I suppose it's possible that this large ISP had some internal problem that prevented their DNS servers from talking to our DNS servers for six days, but not so big that they noticed it and dealt with it right away. Alternately, perhaps this ISP was not being honest with us about why they decided not to accept connections from our outgoing email server. We can't tell.
(During the six day problem period, our user was able to reach their recipient on this ISP from two other places, both of which are big email heavyweights, so it was not an issue with the recipient or with the ISP's mail system in general.)
It's not really news or a new thing that the messages you get from other people's mail servers are not necessarily telling you the real reason that your messages aren't being accepted. Many of the major mail providers seem to do it; it's been a long time since I really believed GMail's SMTP time messages, for example. We have many cases where GMail will give temporary 4xx SMTP error codes for an email for a while with various claims in the SMTP error messages, then wind up accepting it. In other cases the 'temporary' 4xx error codes stick for as long as we want to keep retrying and we eventually time out the message.
(My personal lesson learned from this incident was that I should pay more attention to our queued email, then look into things that seemed odd. At the very least I might have been able to reproduce this outside of Exim, and test it from other IPs on the same subnet and elsewhere within the university.)
There are limitations to what expendable addresses can help with
I'm a long time advocate of using expendable addresses for as many things as possible (and then making sure you can turn them off). However, yesterday's incident of junk email as a cover for worse also shows some of the limitations of using expendable addresses, because they wouldn't really have avoided this situation.
The first way they wouldn't have avoided the situation (of having a flood of junk email sent to someone to distract them) is that generally expendable addresses in all of their forms still funnel into your actual mailbox. Some people sort some expendable addresses into low-priority places, but you're unlikely to do this with the email address you use for things like notifications from your financial institutions. You usually want to see those right away, not have them hidden away.
The second way they wouldn't have avoided the situation is that if someone wants to unleash a flood of email onto you to distract you, it doesn't necessarily matter what exact email address they get their hands on. All they need is some email address that goes into some mailbox that you look at regularly. It would be better to get the actual email address you use with your financial institution, but for drowning a bit of signal in a lot of noise, often many email addresses will do about as well. It doesn't even have to go to the right mailbox, just one that will cause you to drown in the volume.
(Certainly this would be the case for me. I would have an easier time of sorting things later and perhaps not missing signal amidst noise with my extensive collection of expendable addresses, but in the heat of the moment, if you clog up my inbox it doesn't really matter how.)
The one part of this sort of flood that expendable addresses will help with is the longer term aftermath. One of the iron rules of email addresses is that once some people have their hands on some email address, they will never stop emailing it. After a flood, obviously a lot of people have some email address of yours and a certain percentage of them will keep emailing that address forever. If the address they have is an expendable address that you can turn off, you can at least make them go away.
Junk email as a cover for more nefarious things
This morning, we got a call (through a Point of Contact) that one of the people here was being absolutely flooded by incoming spam and junk email. It was a real flood, too; in total they received over 1,200 email messages that made it past our anti-spam defenses, most of them over about an hour and a half (I'll let you do the math on the messages per minute rate, and then think about trying to do anything about it in a mail client). This person wound up having to basically turn off receiving external email.
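For anyone who would rather not do that math by hand, here is a rough version in Python. It assumes, for simplicity, that all 1,200 messages arrived within the 90 minutes, so it somewhat overstates the true rate (only 'most of them' did).

```python
def messages_per_minute(messages: int, minutes: float) -> float:
    """Average arrival rate, assuming the messages are spread evenly."""
    return messages / minutes


rate = messages_per_minute(1200, 90)
print(f"{rate:.1f} messages/minute, or one every {60 / rate:.1f} seconds")
```

That works out to roughly thirteen messages a minute, or a new message every four to five seconds.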
Unfortunately, this wasn't the only thing going on in that person's life this morning, because they also discovered an unauthorized financial transaction (I don't know if they found it before or after the flood started, but I suspect before). The obvious theory is that this sudden, exceptional flood of junk email is not at all a coincidence, and was instead intended to cover up a transaction notification from the financial institution involved. To abuse a phrase, if you can't stop a tree from falling, perhaps you can obscure it by clear-cutting the entire forest around it.
We rejected some of the incoming email at SMTP DATA time, which causes Exim to log some message headers. Based on these rejections and also various of the sending addresses, some of the incoming email appears to have been 'congratulations on signing up for our mailing list', 'thank you for contacting us', and so on email that could be deliberately induced by a third party who wanted to flood someone's mailbox. Other messages seem to have been genuine spam, or very likely genuine spam.
(I am sure you will be shocked to hear that Sendgrid features high up in the list of sending sources, and also the list of sources blocked because of SBL listings.)
One of the unnerving things about this incident is that the attacker clearly was highly prepared. They had at least a thousand potential sources of junk and spam email identified and lined up, ready to trigger. And it's pretty clear that the triggering was automated. Since the junk email came from all over, it seems likely that the attacker wasn't exploiting a single piece of (web) software to stuff in addresses. They probably had an entire suite of attacks against various different 'contact us' and 'subscribe me' and so on forms ready to go.
(I have no theories for how the attacker got spammers to start emailing this address so fast. Maybe there is a market for 'hot email addresses, mail them now while they last' where the purchased addresses get used basically immediately.)
Real email has MIME attachments that are HTML
One of the things that MIME parts in email have (or can have) is a content disposition, which theoretically tells your mail client whether the MIME part should be displayed as part of the message (a content disposition of inline) or it should be not displayed by the client and you'd be offered the option to save it, view it with something, and so on (a content disposition of attachment).
(HTTP reuses this idea in the Content-Disposition header, which tells the browser if it should try to display the response or jump straight to forcing you to download it or hand it to some external program.)
In most email, HTML MIME parts have an inline content disposition, because this is how the sender (or their mail software) arranges for them to be visible to the receiver. This is true both for a message that is HTML only and for a 'multipart/alternative' message with (theoretically) equivalent plain text and HTML versions.
For a long time, I've known that our commercial anti-spam filter was counting some varieties of phish spam as 'viruses'. When we first started logging MIME part type information, I discovered that a lot of these rejections were for HTML MIME parts that had an 'attachment' content disposition. This led me to assume that essentially all legitimate real mail with HTML MIME parts had them with an inline content disposition, and only suspicious and probably bad email had 'attachment' HTML MIME parts.
Recently I had reasons to specifically look at our MIME part type logs for email that we can be reasonably confident is good, and I got a surprise. We definitely see legitimate email with HTML MIME parts that have a content disposition of 'attachment'. Apparently this is even the standard and normal behavior of some email clients in some situations, especially when forwarding email.
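To make the distinction concrete, here is a small sketch using Python's standard `email` package that lists each leaf MIME part's content type and content disposition. This is not how our logging is done; the function name and the sample message are made up for illustration. The sample message carries an HTML part as an attachment, much as a forwarded message might.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def mime_part_summary(raw_message: str) -> list[tuple[str, str]]:
    """Return (content type, content disposition) for every leaf MIME part."""
    msg = message_from_string(raw_message, policy=policy.default)
    summary = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip container parts such as multipart/mixed
        # get_content_disposition() is None when there is no header at all
        disposition = part.get_content_disposition() or "(none)"
        summary.append((part.get_content_type(), disposition))
    return summary


# A made-up message: a plain text body plus an HTML file as an attachment.
sample = EmailMessage()
sample["From"] = "sender@example.org"
sample["To"] = "recipient@example.org"
sample["Subject"] = "Forwarded report"
sample.set_content("See the attached page.")
sample.add_attachment(
    "<html><body><p>Report</p></body></html>",
    subtype="html",
    filename="report.html",
)

for content_type, disposition in mime_part_summary(sample.as_string()):
    print(content_type, disposition)
```

Note that a part with no Content-Disposition header at all is generally treated as inline by mail clients, which is why the sketch reports that case separately instead of guessing.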
Beyond the specific fixing of my ignorance and assumption here, in general this has been a useful reminder to me that I don't actually know as much about modern email as I usually think I do. Before I confidently assume something like 'HTML MIME parts that are attachments are suspicious', I should at least go check our logs to see what they say. After all, that's the largest reason we collect this information; we realized that we didn't actually know what sorts of MIME parts our users received and we should.
In modern email, it's easy for plaintext and HTML parts to drift apart
I recently read When The Text And Html Disagree (via, itself via), which is about an instance where an email message had an important disagreement between the plaintext part and the HTML part. In this case it was fortunately obvious that something was wrong, but I'm sure there have been less obvious instances.
I believe that one reason this drift happens comes down to that old aphorism that if you don't test it, it's broken. For email with alternate parts, the revised aphorism can be said as “if you don't see it, it's broken”. Modern email clients normally show you the HTML part to start with, and then most make a generally rational decision to make it at least hard to see the plaintext one. So when people look at test versions (or real versions) of such email messages, only the HTML part has to look good in order for the whole thing to seem fine. The unseen text part can quietly rot away, noticed only by unusual people like me who look at the plaintext version.
(You would think that mass email authoring environments would raise an alert if you only edit the HTML portion of a standing mixed-part email, but apparently not.)
I've seen this sort of thing for spam, but When The Text And Html Disagree makes a nice illustration that it's not just spam that suffers from the issue. In the end we probably shouldn't be too surprised about any of this, because keeping multiple things in synchronization is pretty much a hard problem all over. If you want it to work reliably you need to automate it, and automating this sort of update isn't easy.
(Keeping things in sync by hand is extra work, and sooner or later extra work doesn't get done or doesn't get done right. People forget, people make mistakes, people will get to it tomorrow because there's an urgent thing right now, and so on and so forth.)
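One crude way to automate at least the detection side is to compare the words in the two parts and flag messages where they have diverged a lot. The following Python sketch is only an assumption about how one might do this; the names are invented, the HTML-to-text conversion is deliberately naive (it would also pick up script and style content), and any threshold you apply to the score would need tuning against real email.

```python
from difflib import SequenceMatcher
from email import message_from_string, policy
from email.message import EmailMessage
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the markup itself."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_words(html: str) -> list[str]:
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks).lower().split()


def alternative_similarity(raw_message: str) -> Optional[float]:
    """Word-level similarity (0.0 to 1.0) of the plain text and HTML parts.

    Returns None if the message lacks either a plain text or an HTML body.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is None or html is None:
        return None
    plain_words = plain.get_content().lower().split()
    return SequenceMatcher(None, plain_words, html_to_words(html.get_content())).ratio()


# A made-up message where only the HTML part was updated.
drifted = EmailMessage()
drifted["Subject"] = "Sale"
drifted.set_content("The sale ends on Friday.")
drifted.add_alternative(
    "<html><body><p>The sale ends on Monday at noon.</p></body></html>",
    subtype="html",
)
print(alternative_similarity(drifted.as_string()))
```

A score well below 1.0 doesn't tell you which part is wrong, only that someone should look at both.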
PS: Given this, the most likely answer to the question in When The Text And Html Disagree is that if there's a disagreement and it's not clear, the HTML part is right and the plaintext one is wrong. It could be that you have a rare email where someone has updated the plaintext part but not the HTML part, but the odds are very good that it's the other way around. The exception to this is if you're in a very unusual environment where most people see the plaintext part instead of the HTML part.
Mailing lists and bounce handling (or not handling bounces) today
Traditionally, if you ran a proper mailing list you were supposed to remove addresses if they started bouncing (or being rejected). The need to automate this was the major reason behind email tricks like Variable envelope return path (VERP), where email to every separate address is sent with a unique envelope sender (and so has to be sent in a separate transaction, instead of letting you batch together multiple 'RCPT TO' addresses at a single destination). There's always been a little bit of an open question about how rapidly you should remove failing addresses, because things happen, but the traditional view (one I've echoed myself) is that you shouldn't let bouncing addresses linger for long.
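As a sketch of the idea, here is one common VERP-style encoding in Python, where the recipient's address is folded into the local part of the envelope sender with '=' standing in for '@'. The exact format varies between mailing list managers; the function names and the `announce-bounces` address are hypothetical, and this version assumes the list's own bounce local part contains no '+'.

```python
def verp_sender(list_bounce_address: str, recipient: str) -> str:
    """Build a per-recipient envelope sender from the list's bounce address."""
    local, domain = list_bounce_address.rsplit("@", 1)
    rcpt_local, rcpt_domain = recipient.rsplit("@", 1)
    return f"{local}+{rcpt_local}={rcpt_domain}@{domain}"


def verp_recipient(envelope_sender: str) -> str:
    """Recover the original recipient from a VERP envelope sender."""
    local, _domain = envelope_sender.rsplit("@", 1)
    # Everything after the first '+' is the encoded recipient.
    _prefix, encoded = local.split("+", 1)
    # The last '=' separates the recipient's local part from its domain.
    rcpt_local, rcpt_domain = encoded.rsplit("=", 1)
    return f"{rcpt_local}@{rcpt_domain}"


sender = verp_sender("announce-bounces@lists.example.org", "alice@example.com")
print(sender)
print(verp_recipient(sender))
```

Because the bounce comes back to an address that names the failing recipient, the list software doesn't have to parse the bounce message itself to know who to consider removing.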
Then we started having incidents like GMail's major failure yesterday:
It appears that a whole bunch of people are about to discover the downside of various GMail failure modes, and also a bunch of other people are going to stop treating SMTP 5xx bounces as reasons to remove people from mailing lists and so on.
(Since ~4pm Eastern we're seeing GMail '550 no such account' on well established GMail addresses for some but not all deliveries.)
If GMail is going to randomly reject well established email addresses for a third of a day or so every so often, why should you be in any rush to remove failing GMail addresses from your mailing lists? This is even (and especially) the case if the bounces claim that the address doesn't exist; as we've just seen, this rejection reason itself is unreliable from one of the biggest email providers on the planet.
Today, the only thing an email bounce means is 'I'm not accepting this particular piece of email from you right now'. Probably the receiving email system will continue to reject that particular email if it's resubmitted in the future, but not always. There is certainly no strong reason to believe that different email in the future won't be accepted.
If bounces are mysterious, random, and not necessarily predictive of future results, it's pretty reasonable to not use them for anything. It would be nice if people attempted to draw inferences from patterns in bounces (such as 'this email address has never accepted any of our email'), but that takes much more work, tracking of information, and tuning than pretty much ignoring bounces.
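To give a sense of what even the simplest pattern tracking involves, here is a minimal in-memory Python sketch of the 'has never accepted any of our email' rule. Everything here is hypothetical, including the class name and the default threshold of five attempts; a real version would need persistent storage, some notion of time, and tuning.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DeliveryHistory:
    accepted: int = 0
    rejected: int = 0


class BounceTracker:
    """Track per-address delivery outcomes instead of acting on single bounces."""

    def __init__(self, min_attempts: int = 5) -> None:
        # Hypothetical threshold: how many rejections before we trust the pattern.
        self.min_attempts = min_attempts
        self._history: defaultdict[str, DeliveryHistory] = defaultdict(DeliveryHistory)

    def record(self, address: str, accepted: bool) -> None:
        # Lowercasing the whole address is a simplification; local parts
        # are technically case sensitive.
        history = self._history[address.lower()]
        if accepted:
            history.accepted += 1
        else:
            history.rejected += 1

    def never_accepted(self, address: str) -> bool:
        """True only if we have enough attempts and every one was rejected."""
        history = self._history.get(address.lower())
        if history is None:
            return False
        return history.accepted == 0 and history.rejected >= self.min_attempts


tracker = BounceTracker(min_attempts=3)
for _ in range(3):
    tracker.record("gone@example.com", accepted=False)
tracker.record("flaky@example.com", accepted=True)
tracker.record("flaky@example.com", accepted=False)
print(tracker.never_accepted("gone@example.com"))
print(tracker.never_accepted("flaky@example.com"))
```

Under this rule, a well established address that suddenly gets a '550 no such account' is left alone, because it has a history of accepting email.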
(Of course if places like GMail start scoring your email badly if you keep repeatedly sending email that they bounce, then people will go to the work. For GMail. Not necessarily for anyone else.)
I'm not really fond of this result. I would like bounces to be a reliable sign that addresses should be removed from lists, and I would like mailing lists (and databases of email addresses) to remove addresses that bounce. But I can't say that bounces are such a clear reliable sign any more.
The death and life of postmaster@anywhere
A decade ago I wrote The quiet death of postmaster@anywhere, where I proclaimed that the postmaster@ address was dead, killed by steadily increasing spam volume (and decreasing actual use). I was recently reminded of that entry and, a decade later, I have to sort of take back what I wrote. It's true that postmaster addresses are increasingly dead, but when they are it's not for the reasons that I thought back then.
Perhaps ten years ago our postmaster account got a bunch of spam and bounces (I can no longer remember, but what I wrote in my old entry suggests that it may have at the time). These days it doesn't; we get basically no bounces that I can see, and very little spam (and almost all of the spam is recognized and rejected immediately). It's possible that this experience is far from universal, but I can think of some reasons why it might be. And in more anecdotal data, my spam-trap SMTP server has seen only one email to its postmaster address from 2017 onward.
(One potential reason for less spam to postmaster@ addresses is an increased difficulty in sending spam and along with it, an increased professionalization of spam sending. Spamming postmaster addresses is not very productive, so if spamming resources are limited and cost spammers things I'd expect to see less of it.)
But that doesn't mean that sending email to postmaster@ addresses will do you any good these days, or even work. What I've observed from sending the occasional spam complaint is that more and more places seem to be no longer accepting email to postmaster@ and sometimes even abuse@ addresses. I doubt this is driven by spam volume; instead I rather think it's a deliberate policy choice by places, especially by large email providers. Even when email to postmaster@ 'works' for large places, it's probably not reaching the technical people who deal with the mail system, or even necessarily the abuse handling people. It's probably much more likely to drop into some sort of general support system, with the attendant low odds of getting prompt or effective attention.
Of course in one sense this is nothing new. It's been years since sending email to any well known or public address got you useful results with large places (for either spam complaints or technical problem reports). The only real difference these days is you might get an actual SMTP rejection that says your email to postmaster@ or abuse@ is not accepted, instead of your email just quietly disappearing.
But for many smaller and more modest places, I suspect that postmaster@ is still alive. It certainly is here, and even for the overall university email system.
If you send automated email, you should scan it with anti-spam software
Partly in light of Microsoft SharePoint's problem with spam, here is an obvious thing:
If you send automated email to the outside world, you should always run it through some anti-spam system and raise big alarms if they ever trigger.
(If I was doing this today I would use ClamAV and rspamd because they're free and what I'm used to, but anything will do.)
You should do this even if you aren't allowing external people to initiate the email and put content in it (which you shouldn't, because allowing that leads to spam).
You might say that your automated system can never be exploited to send spam. That may or may not be true, but even if your automated email genuinely has no spam, having ClamAV, rspamd, or whatever dislike it is a very bad sign that you should pay attention to, because it likely means that a lot of people will not be receiving the email. And beyond that, checking your automated email is an important and generally easily done insurance policy.
The gold standard for doing this check is to have an external email server in a separate domain with an address (or several addresses) that you send the automated email to on a regular basis, that runs these anti-spam tools. That gives you the closest you can get to how anti-spam tools on other people's systems will perceive your email, complete with any effects from your sending IP and so on. Running scanners on your outgoing email before it leaves your system doesn't quite capture everything, but it will generally cover scanning the content for bad things and it lets you react faster and earlier (for example, by completely stopping the automated email if it triggers anti-spam systems). It may also be easier to implement.
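As a sketch of the 'scan before it leaves' version, here is how one might submit a message to a local rspamd over HTTP from Python and decide whether to raise an alarm. I'm assuming rspamd's normal worker is listening on its usual port of 11333 and exposes the `/checkv2` endpoint that returns a JSON result with an `action` field; check this against your own rspamd version and configuration before relying on it. The function names and the 'alarm on anything but no action' rule are my own choices.

```python
import json
import urllib.request

# Assumption: a local rspamd normal worker on its default port.
RSPAMD_URL = "http://127.0.0.1:11333/checkv2"


def scan_with_rspamd(raw_message: bytes, url: str = RSPAMD_URL) -> dict:
    """POST a complete RFC 5322 message to rspamd and return its JSON verdict."""
    request = urllib.request.Request(url, data=raw_message, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def should_alarm(result: dict) -> bool:
    """Be strict for automated email: any action other than 'no action' alarms.

    This assumes the result carries an 'action' string, with 'no action'
    meaning that rspamd found nothing worth acting on.
    """
    return result.get("action", "no action") != "no action"
```

In use, the sending code would call `scan_with_rspamd()` on each generated message and, if `should_alarm()` is true, hold the message and page someone instead of sending it. Being this strict would be a bad idea for email written by people, but for automated email any hit is worth a human look.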
(There are multiple reasons to not put visible spam results on outgoing email in general, and there can be political ones to not block email written by your local people if it triggers your outgoing anti-spam checks. But I feel that automated email is different; it's much less risky to block it, presuming that you immediately alarm on this and get people's attention.)
Microsoft SharePoint is being used to send spam
I'm paying more attention to what our mail system detects as spam and where it's coming from than usual, so I'm getting to notice things (or, in the alternate phrasing, being forced to notice things). Today's thing that I noticed is that to no one's surprise, Microsoft SharePoint is currently being used as a spam sending vector. I say 'to no one's surprise' because it's a long standing rule that anything that can be used to send email to random people with any user supplied content will be exploited for spam (eg).
The email we see is genuine SharePoint email sent from Microsoft, DKIM signed by both sharepointonline.com and 'spoapaceop.onmicrosoft.com', with the envelope address of firstname.lastname@example.org and sent to us by outlook.com machines. Typical headers look like:
From: Katholina Keth <email@example.com>
Subject: Katholina Keth shared "❤Unsatisfied women Need a guy ❤" with you.
The header samples I've seen have a long list of To: addresses at all sorts of places (not just our university subdomain or even the university as a whole). Some messages have a Reply-To: pointing to various addresses at a legitimate domain, which may be a signal that the spammer has compromised a part of that organization so that they can either hijack accounts there or register their own, then use them to register in SharePoint.
(I only have access to some message headers, so I can't tell what is in the body of the email. Hopefully Microsoft doesn't allow SharePoint emails to include substantial amounts of user-supplied content, so all people get is a link to where the spam is.)
At one level all of this is unsurprising. As a product feature, it's attractive to let SharePoint users share their files and other SharePoint materials with people who haven't already signed up with SharePoint, and when you do that of course SharePoint has to tell the target something about what is being shared. The title is an obvious thing to include, and you have to let users change the title of their documents. But now Microsoft has given spammers the ability to send some amount of relatively arbitrary text to relatively arbitrary email addresses.
(I would like to say 'and now Microsoft has a problem', but of course they don't. Very few people are in a position where they can block SharePoint email over this.)
Using DMARC information is complicated in practice in the real world
As part of a planned switch to rspamd as our anti-spam system (well, our spam recognition system), I've been taking a closer look at how our test rspamd scores some email and what it reports about why. This has given me a new and unhappy view of DMARC in the real world, building on how DKIM looks for our 'good' email. So let me tell you a story, starting with the background.
The university is now a big user of Microsoft Teams. The university's UTmail+ institutional email system for staff, faculty, and so on is also "powered by Office 365", which is to say that it is hosted in Microsoft's 'Outlook/Office 365' cloud (with a bunch of contractual terms around what datacenters our data actually lives in and so on). However, while everyone here has a UTmail+ account (which is also your Teams account), some people forward their email to other systems. In particular, any number of people here forward their UTmail+ email to us.
Various activity in Microsoft Teams generates email to you, which normally is sent to your institutional email account in UTmail+ and may then get forwarded to us. As a good email citizen in the modern world, Microsoft Teams DKIM signs its email:
DKIM: d=email.teams.microsoft.com s=selector1 c=relaxed/relaxed a=rsa-sha256
Such email comes from 'firstname.lastname@example.org' as both the envelope sender and the From: header address.
As a good email citizen in the modern world, the Outlook/Office 365 environment also DKIM signs outgoing email, this time under the 'onmicrosoft.com' domain instead of microsoft.com:
DKIM: d=utoronto.onmicrosoft.com s=[...] c=relaxed/relaxed a=rsa-sha256
Unfortunately, when Microsoft Teams email transits either our hosted Office 365 environment or at least one other institution's hosted one, something breaks the Microsoft Teams DKIM signature. At the same time, Office 365 does not change the From:, which means that the message is covered by whatever DMARC policy applies to that 'email@example.com' address.
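The reason the surviving Office 365 signature doesn't rescue the message is DMARC's alignment requirement: a DKIM signature only counts if its d= domain matches the From: domain, either exactly (strict) or at the organizational domain level (relaxed). Here is a simplified Python sketch of that check. The function names are mine, and taking the last two labels as the organizational domain is a shortcut that happens to work for these names; real implementations use the Public Suffix List.

```python
def organizational_domain(domain: str) -> str:
    """Approximate the organizational domain as the last two DNS labels.

    This is a simplification for illustration; it is wrong for domains
    under multi-label public suffixes such as 'co.uk'.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def dkim_aligned(from_domain: str, dkim_domain: str, strict: bool = False) -> bool:
    """Does a valid DKIM signature for dkim_domain align with the From: domain?"""
    if strict:
        return from_domain.lower() == dkim_domain.lower()
    return organizational_domain(from_domain) == organizational_domain(dkim_domain)


from_domain = "email.teams.microsoft.com"
for signer in ("email.teams.microsoft.com", "utoronto.onmicrosoft.com"):
    print(signer, dkim_aligned(from_domain, signer))
```

The Teams signature would align if it still verified, but it's the one that gets broken. The 'onmicrosoft.com' signature verifies but belongs to a different organizational domain, so as far as DMARC is concerned it might as well not be there.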
The email.teams.microsoft.com sub-subdomain does not have its own DMARC policy. Instead it falls under the general microsoft.com DMARC policy:
v=DMARC1; p=reject; pct=100; [...]
This says that anything with a From: of a microsoft.com address that fails both DKIM and SPF checks (and so fails DMARC) should be rejected. Since Microsoft advertises this DMARC policy, rspamd takes them at their word and applies a spam score penalty to all of this entirely legitimate email.
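For reference, a DMARC record is just a semicolon-separated list of tag=value pairs published in a DNS TXT record, so pulling the policy out of one is straightforward. This is a minimal Python sketch with a made-up function name; it parses only the tags visible in the record above and does no validation beyond checking the version tag.

```python
def parse_dmarc_record(record: str) -> dict[str, str]:
    """Parse a DMARC TXT record into a dictionary of tag -> value."""
    tags: dict[str, str] = {}
    for item in record.split(";"):
        item = item.strip()
        if not item or "=" not in item:
            continue  # skip empty trailing pieces and anything malformed
        name, value = item.split("=", 1)
        tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC1 record")
    return tags


policy_tags = parse_dmarc_record("v=DMARC1; p=reject; pct=100")
print(policy_tags["p"], policy_tags["pct"])
```

Here `p=reject` is the requested handling for email that fails DMARC and `pct=100` says the policy should be applied to all such email, not a sample of it.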
There is a standard that is supposed to deal with this problem, ARC. The Office 365 forwarding environment appears to put on various ARC headers, but in a sample Teams email I have these headers seem to claim not to have validated the original DKIM signature, just the SPF results. In any case, it feels like there would need to be some explicit configuration somewhere so that either we or Microsoft would give ARC signatures from these Office 365 environments the power to give a pass on microsoft.com's DMARC policy. In practice rspamd sometimes applies a mild bonus (ie, not-spam) for ARC_ALLOW, and sometimes gives it a neutral result of 'no change in score' (a 0.00 result, at least according to the logs).
(It appears that rspamd 2.5 adds an undocumented option to its ARC module to configure a whitelist of trusted ARC signers, per here, here and here, but without documentation I'm wary of touching it even if I could figure out what exactly to put where.)
Beyond the specific problem here and the opacity of what is going on in evaluating DKIM, DMARC, and ARC results in rspamd, notice how we have wound up in a situation where none of these things can be checked in isolation. In practice, you cannot just look at the From: domain and see whether it passes DMARC and DKIM checks; you need additional validation that is conditional on, well, something. In practice, to reliably accept Microsoft Teams email that transits a hosted Office 365 environment we would probably have to identify all of the outgoing sources for this, including for people who forward their Microsoft Teams email through another place that hosts their email in Office 365.
The practical result of all of this is that today I reconfigured rspamd to not assign any spam score penalty for a DMARC failure. DMARC failures are clearly going to happen with legitimate email to our users that transits through their UTmail+ addresses (at least), which in practice makes them useless.