What email messages to not send autoreplies to (late 2018 edition)

November 7, 2018

Our mail system is very old. Much of the current implementation dates back about ten years, when we moved it to be based on Exim, but the features and in some cases the programs involved go back much further than that. One part of it is that we have a local version of the venerable Unix vacation program, and this local version goes back a very long time (some comments say it is the 4.3 BSD-Reno version, which would date it to 1990). By now our version is ancient and creaky, and in general we're no longer enthused about maintaining locally hacked versions of software, so we need to move to using the standard Ubuntu version. Unfortunately, our local version has some differences from the standard one; it supports an additional command line option that's used by an unknown number of people, and long ago we made it not autoreply to some additional things beyond what the standard vacation already ignored. To deal with both problems we're using the standard computer science solution of adding another layer of indirection, in the form of a cover script. One of the jobs of this cover script is knowing what not to autoreply to (beyond extremely obvious things like messages that we detect as spam).

When I started out writing the cover script, I thought this would be simple. This is not the case, as what not to autoreply to has gotten a little bit more complicated since 1990 or so; for instance, there is now an actual RFC for this, RFC 3834. Based on Internet searches and this very helpful Superuser answer, the current list appears to start with:

  • a Precedence: header value of 'bulk', 'list', or 'junk'; this is the old standard way.

  • an Auto-submitted: header value of anything but 'no', which is the RFC 3834 standard way. In practice, this is effectively 'if there is an Auto-submitted header'; I searched through a multi-year collection of email and couldn't find anything that used it with a 'no' value.

  • an X-Auto-Response-Suppress: header with effectively any value, although Microsoft's official documentation says that a value of 'none' means that you can auto-reply. In practice that multi-year collection of email contains no cases with the 'none' value.

    (Energetic people can look for only 'All' or 'OOF', but matching this is annoying and, again, my mail collection shows no hits for anything without one or the other of those.)

  • Any of the various headers that indicate a mailing list message, such as List-Id: or List-Unsubscribe:. In a sane world you would only need to look for one of them, but this is not a sane world (especially once spammers get involved); I have seen at least one message with only a List-Unsubscribe:.

  • A null (envelope) sender address, although of course any autoreplies to that aren't going to get very far. Generally you'll want to not autoreply to postmaster@ or mailer-daemon@, although it's not clear how much stuff gets sent out with such envelope senders.
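As a rough illustration, the header and envelope checks in this list could be sketched in Python along the following lines. This is only a sketch under stated assumptions, not the actual cover script: the function name `headers_say_no_autoreply`, the way the envelope sender is passed in, and the exact set of List-* headers checked (the ones from RFC 2369 and RFC 2919) are all my choices for the example. It honours an Auto-Submitted value of 'no' and an X-Auto-Response-Suppress value of 'none', even though in practice simply testing for the presence of those headers seems to work just as well.

```python
from email import message_from_string
from email.message import Message

# Mailing list headers from RFC 2369 and RFC 2919 (an assumed selection;
# the text above only names List-Id and List-Unsubscribe).
LIST_HEADERS = (
    "List-Id", "List-Unsubscribe", "List-Subscribe", "List-Post",
    "List-Help", "List-Owner", "List-Archive",
)


def headers_say_no_autoreply(msg: Message, envelope_sender: str) -> bool:
    """Return True if the headers or envelope sender suggest we should
    not send an autoreply. Hypothetical helper for illustration only."""
    # Null envelope sender, or the usual system addresses.
    addr = envelope_sender.strip().strip("<>").lower()
    if not addr:
        return True
    if addr.partition("@")[0] in ("postmaster", "mailer-daemon"):
        return True

    # The old standard way: Precedence: bulk / list / junk.
    precedence = str(msg.get("Precedence", "")).strip().lower()
    if precedence in ("bulk", "list", "junk"):
        return True

    # The RFC 3834 way: any Auto-Submitted value other than 'no'.
    auto = msg.get("Auto-Submitted")
    if auto is not None:
        keyword = str(auto).split(";")[0].strip().lower()
        if keyword != "no":
            return True

    # Microsoft's header: anything except an explicit 'none'.
    suppress = msg.get("X-Auto-Response-Suppress")
    if suppress is not None and str(suppress).strip().lower() != "none":
        return True

    # Any mailing list header at all (header lookup is case-insensitive).
    return any(name in msg for name in LIST_HEADERS)


if __name__ == "__main__":
    sample = message_from_string(
        "From: someone@example.com\n"
        "List-Unsubscribe: <mailto:leave@example.com>\n"
        "\n"
        "body\n"
    )
    print(headers_say_no_autoreply(sample, "someone@example.com"))
```

Taking the envelope sender as a separate argument reflects that it comes from the SMTP transaction (or whatever the MTA passes to the script), not from the message headers.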

In theory you could stop here and be nominally correct, more or less. In practice it seems clear that you want to do some additional matching on the sender address, to not auto-reply to at least:

  • Definitely various variations on 'noreply' and 'donotreply' sender addresses. You might think that people sending emails with these sender addresses would tag them in various ways to avoid auto-replies, but it is not so; for example, just yesterday Flickr sent me a notification email about some important upcoming changes that came from 'donotreply@flickr.com' and had none of those 'please do not reply' header markers.

  • Probably anything that appears to be an address that exists to collect bounces, especially tagged sender addresses. There are a bunch of patterns for these, where they start with 'bounce-' or 'bounce.' or 'bounce+' or 'bounces+', or come from a domain that is 'bounce.<something>' or 'bounces.<something>'. Just to be different, Google uses '@<something>.bounces.google.com'.

    Some of these 'bounces' addresses are also tagged with various 'do not autoreply' headers, but not all of them. Since tagged bounce addresses are generally unique per message, they'll almost always bypass vacation's attempts to only send an autoreply notification every so often, which is one reason I think one should suppress autoreplies to them.

  • Perhaps all detectable tagged sender addresses, especially repeated sources of them. The one that we've already seen in our logs is AmazonSES ones, some of which don't have any 'don't autoreply' headers. Perhaps there are some AmazonSES senders who should get vacation autoreplies, but I suspect that there are not that many.

(I'm sure that there are some senders who would like to get vacation autoreplies so they know that their email is sort of getting through. It's less clear that our users want those senders to know that, given some of the uses of AmazonSES.)

Possibly you also want to not autoreply to sender addresses with various generic local parts, such as 'root', 'www-data', 'apache', and so on. Perhaps you also want to include 'info', but that feels more potentially questionable; there might actually be a human who reads replies to that and cares about out of office things and so on.
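The sender address matching described above might look something like the following Python sketch. The function names are made up for the example, and a couple of details are assumptions on my part: that AmazonSES tagged senders can be recognized by a domain of 'amazonses.com' or a subdomain of it, and that it is reasonable to squash '-', '_' and '.' out of the local part before looking for 'noreply' and 'donotreply'. The generic local part list is just the three names mentioned above, and 'info' is deliberately left out.

```python
import re

# Prefixes and domains taken from the patterns described in the text.
BOUNCE_LOCAL_PREFIXES = ("bounce-", "bounce.", "bounce+", "bounces+")
BOUNCE_DOMAIN_PREFIXES = ("bounce.", "bounces.")
GOOGLE_BOUNCE_SUFFIX = ".bounces.google.com"

# Assumption: AmazonSES tagged senders use this domain or a subdomain of it.
SES_DOMAIN = "amazonses.com"

# Generic non-person local parts; 'info' is deliberately not included.
GENERIC_LOCAL_PARTS = frozenset({"root", "www-data", "apache"})


def looks_like_noreply(local: str) -> bool:
    """Match variations such as 'no-reply', 'do_not_reply', 'noreply'."""
    squashed = re.sub(r"[-_.]", "", local)
    return "noreply" in squashed or "donotreply" in squashed


def looks_like_bounce_collector(local: str, domain: str) -> bool:
    """Match the common 'bounce' local part and domain patterns."""
    return (
        local.startswith(BOUNCE_LOCAL_PREFIXES)
        or domain.startswith(BOUNCE_DOMAIN_PREFIXES)
        or domain.endswith(GOOGLE_BOUNCE_SUFFIX)
    )


def sender_looks_nonhuman(sender: str) -> bool:
    """Return True if the sender address matches a 'not a person' pattern."""
    addr = sender.strip().strip("<>").lower()
    local, at, domain = addr.rpartition("@")
    if not at:
        # No '@' at all: treat the whole thing as a local part.
        local, domain = addr, ""
    return (
        looks_like_noreply(local)
        or looks_like_bounce_collector(local, domain)
        or domain == SES_DOMAIN
        or domain.endswith("." + SES_DOMAIN)
        or local in GENERIC_LOCAL_PARTS
    )


if __name__ == "__main__":
    for candidate in ("donotreply@flickr.com", "info@example.com"):
        print(candidate, sender_looks_nonhuman(candidate))
```

Keeping each pattern family in its own small function makes it easier to turn one off if it proves too aggressive for real users.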

(In general my view is that it's only useful to send autoreplies to actual people, and in some cases sending autoreplies to non-people addresses is at least potentially harmful. If we can establish fairly confidently that a given sender address is not a person, not sending vacation and out of office and so on autoreplies to it is harmless and perhaps beneficial. At the same time it's important not to be too aggressive, because our users do count on their autoreplies reliably telling people about their status.)

PS: In an extremely cautious world, you would not autoreply to anything that hadn't passed either strict SPF checks or strict DMARC policies. You can use DKIM too, but I think only if you carefully check that you're verifying a DKIM signature for the sender domain, because only then have you verified attribution to the domain. I rather expect that this is too strict to make users happy today, because it would exclude too many real people that send them email and so should get their autoreply messages.
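To make the extremely cautious version concrete, here is a hedged Python sketch that decides based on an Authentication-Results: header (RFC 8601). It rests on several assumptions that go beyond the text above: that your own mail system adds a trustworthy Authentication-Results header and strips forged ones, that 'the sender domain' for DKIM purposes means the domain of the From: header, and that a plain 'pass' result is good enough (it does not look at how strict the published SPF or DMARC policy is). The parsing is deliberately naive, handling only simple non-nested comments, and the function names are hypothetical.

```python
import re


def parse_auth_results(value: str) -> list:
    """Naively parse an Authentication-Results header value into a list of
    (method, result, properties) tuples. Not a full RFC 8601 parser."""
    # Drop simple (non-nested) parenthesized comments first.
    value = re.sub(r"\([^()]*\)", "", value)
    parsed = []
    # The first ';'-separated segment is the authserv-id; skip it.
    for segment in value.split(";")[1:]:
        tokens = segment.split()
        if not tokens or "=" not in tokens[0]:
            continue
        method, _, result = tokens[0].partition("=")
        props = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
        parsed.append((method.lower(), result.lower(), props))
    return parsed


def sender_is_authenticated(auth_results: str, from_domain: str) -> bool:
    """Return True if SPF or DMARC passed, or if DKIM passed with a
    signing domain (header.d) equal to the From: domain."""
    from_domain = from_domain.lower()
    for method, result, props in parse_auth_results(auth_results):
        if result != "pass":
            continue
        if method in ("spf", "dmarc"):
            return True
        # A DKIM pass only counts when it is for the sender's own domain.
        if method == "dkim" and props.get("header.d", "").lower() == from_domain:
            return True
    return False


if __name__ == "__main__":
    header = (
        "mx.example.org; spf=fail smtp.mailfrom=example.com; "
        "dkim=pass header.d=mailer.example.net"
    )
    print(sender_is_authenticated(header, "example.com"))
```

The DKIM branch is the part that reflects the caution above: a valid signature from some unrelated domain says nothing about whether the claimed sender really sent the message.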

Sidebar: My guess about non-human email that lacks these markers

One might wonder why email notifications and other similar large scale messages don't have some version of 'please do not autoreply' tags. My suspicion is that people have found that email without such tags is more likely to appear in people's inboxes on large providers like Gmail and so on, while email with those tags is more likely to get dumped into a less frequently examined location.

If you're someone like Flickr (well, SmugMug, who bought Flickr) and really do have an important message that many Flickr members need to read, this leaves you with an unfortunate dilemma. On the whole I can't blame SmugMug for making the email choice that they did; with data at future risk, it is better to err on the side of getting more autoreplies than having people not see your message.

(In this view, the 'donotreply' email sender address is mostly there in the hopes that actual people will not hit 'reply' and send email back, email that will not have the desired effect.)

Written on 07 November 2018.

Last modified: Wed Nov 7 22:31:05 2018