Rethinking when your mailer sends 'not-yet-delivered' warning messages

June 9, 2012

Most Unix mailers have a feature where they periodically send people warning notes about email messages that haven't been delivered yet; Exim defaults to doing so roughly every 24 hours, for example (this is the delay_warning configuration setting). I've come to believe that typical default values for when this happens are both too slow for the first warning and too frequent for the repeats, and should be rethought in today's Internet mail environment.

(Actually it looks like Postfix defaults to not sending delay notifications, so this may now be less common than I think.)

The reality of modern mail delivery, at least for us, is that mail delivery times are very skewed and likely quite bimodal. In particular, almost all of the mail sent through our outbound user email gateway is delivered to the remote SMTP server within a minute (and much of it is delivered within seconds) and most of the remainder is delivered within ten minutes. I have the strong feeling that people have come to expect this rapid email delivery, simply because it's what almost always happens.

In this environment, holding back the first delay notification for a day is not really what you want, since a day is well after people expect their email to have been delivered. How soon the first notification should be sent depends on how mail delivery delays break down in your environment, so I recommend getting statistics. My gut feeling is that an hour is a good starting point under normal circumstances; when people hit an address that genuinely isn't accepting email, they'll probably get notified about it soon enough to know that they should take other steps.
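As a rough illustration of the kind of statistics meant here, the following is a minimal Python sketch that reports what fraction of messages were delivered within a few cut-off times. It assumes you have already extracted per-message delivery delays (in seconds) from your mailer's logs; the function name, the thresholds, and the sample numbers are all made up for the example and are not our real figures.

```python
from bisect import bisect_right


def delay_breakdown(delays_seconds, thresholds=(60, 600, 3600, 86400)):
    """Return {threshold: fraction of messages delivered within it}.

    delays_seconds: time from submission to successful delivery, per message.
    thresholds: illustrative cut-offs (1 minute, 10 minutes, 1 hour, 1 day).
    """
    if not delays_seconds:
        raise ValueError("no delivery delays to summarize")
    ordered = sorted(delays_seconds)
    total = len(ordered)
    # bisect_right counts how many delays are <= the threshold.
    return {t: bisect_right(ordered, t) / total for t in thresholds}


# Made-up sample data, purely to show the output shape.
sample = [2, 5, 8, 30, 45, 120, 400, 1900, 5000, 90000]
for threshold, fraction in delay_breakdown(sample).items():
    print(f"within {threshold:>6d}s: {fraction:.0%}")
```

If almost everything lands in the first bucket or two, a first warning at around an hour only goes to messages that are already well outside normal behaviour.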

(A related issue is giving people relatively prompt notification when they've misspelled a domain name into a variant that doesn't accept email. The hope is that they'll notice the mistake when they get the notification email and be able to resend their message soon enough to be useful. My unchecked intuition is that such misspellings are one of the significant sources of long term delayed email in our environment.)

However, once we've sent one or two delay notifications I don't think there's much point in repeating them. My view is that by the time a day has gone by without a successful delivery, the person who sent the email is no longer really expecting it to get through any time soon. If getting the information through or getting a reply was important, they'll probably already have taken alternate steps, so further delay notifications are pointless; if the email's not important, the delay notifications are just noise in general. If you have a long message delivery timeout you might want to send another delay notification partway through, but certainly not one a day.
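One way to picture this policy is as a tiny schedule calculation. The sketch below is only an illustration under assumptions of my own: an early first warning, at most one more warning at the midpoint of the delivery timeout, and a two-day cut-off for what counts as a "long" timeout. None of these names or values correspond to any particular mailer's settings.

```python
from datetime import timedelta


def warning_schedule(first_warning, delivery_timeout,
                     long_timeout=timedelta(days=2)):
    """Return offsets from submission at which to send delay warnings.

    Sends one early warning, plus a single extra warning halfway through
    the delivery timeout if that timeout is long. Never one per day.
    """
    if first_warning >= delivery_timeout:
        return []  # the message would bounce before a warning is due
    schedule = [first_warning]
    if delivery_timeout >= long_timeout:
        midpoint = delivery_timeout / 2
        if midpoint > first_warning:
            schedule.append(midpoint)
    return schedule


# A one hour first warning with a five day timeout gives two warnings.
print(warning_schedule(timedelta(hours=1), timedelta(days=5)))
```

With a one day timeout the same call yields only the single one hour warning.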

Of course all of this generously assumes that people actually see and read the delay notifications that your system sends them. If people filter them out or just reflexively delete them, you might as well not send them at all (although there may be political reasons to send them even though they'll get ignored). I don't have any information on this, although I have my suspicions.

PS: hopefully it goes without saying that you should only be sending delay notifications to your own local users.

Comments on this page:

From at 2012-06-09 08:50:27:

However once we've sent one or two delay notifications I don't think there's much point in repeating them. My view is that by the time a day has gone by without a successful delivery, the person who sent the email is no longer really expecting it to get through any time soon.

I think the default for many mailers is to warn after one day, and give up after five. Given the above sentence, in addition to a warning after an hour (as you state), it seems that your opinion is that the mail system should give up after 1-2 days.

By rdump at 2012-06-10 00:16:09:

The nexus of typical default greylisting timeouts, sending MTA retry intervals, and usefulness of the notification to the sending user is a tough one to deal with.

Greylisting systems often default to allowing a retry after 25 minutes. A typical 1/2 hour MTA default interval will mesh nicely with this. Typical defaults also are such that an over-eager sender will be ignored, or worse, if they retry before the 25 minutes have passed.

While it might be useful to send a notification of delivery delay to the user within minutes, sending it after two 1/2 hour retry intervals seems like a sweet spot.

On the tail end, I agree that it doesn't make a lot of sense to send notifications of delay more than once or twice. After the first, the rest risk becoming noise. The notification of delay should of course specify how long the message delivery will be retried (typ. 4 days).

By cks at 2012-06-10 00:44:18:

My reply to the mailer expiry issue got long enough that I made it into an entry, EmailDifferentSorts.

Since there's no standard for minimum retry intervals as far as I know, greylisting systems that insist on any particular minimum value and punish shorter ones are playing with fire (or at least false positives). For example, we start out with an every fifteen minute (more or less) retry interval for external domains.

(We retry much faster for machines inside the university since we assume that any delivery problem is not greylisting but a temporary glitch, and so it's a service to our users to get the email through fast in the way that they may be expecting.)

I agree that greylisting in general will complicate the decision about initial delay notification, which is why I think there's no substitute for generating your own stats on delivery delays. Our stats suggest that our users are not running into greylisting very often, although I'd kind of want to do more detailed analysis to be sure.

By rdump at 2012-06-10 02:42:57:

The sender MUST delay after a temporary failure. In general, the retry interval SHOULD be at least 30 minutes. More intelligent strategies, based on what the mailer can determine about the nature of the temporary failure, are good; 30 minutes is only a SHOULD after all.

Naive IP-based greylisting can be accidentally bypassed by spammers sending separate messages to multiple recipients if the allowed retry interval minimum is shortened too far. Smarter greylisting that uses sender and recipient email addresses as part of the tuple can be bypassed by a multi-message spam run targeted at the same user. Detecting and defending against those attacks, without accepting the message body and doing fuzzy checksumming to add to the tuple, is an exercise in tuning, so most go with the defaults.

Retrying every 15 minutes against a default greylister that will talk to the sender again after 25 minutes is unlikely to hurt; the connection for the first retry will just be dropped. Exponential backoff will relax things further. Retrying every minute, on the other hand, is more likely to be alarming.
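To make the retry arithmetic concrete, here is a minimal Python sketch of a sender retrying against a greylister. It assumes a deliberately simplified greylister that defers everything before a fixed window has elapsed since the first attempt and accepts the first retry at or after it; it does not model any penalty for retrying early. The function and parameter names are hypothetical, and the 25 minute window is just the figure used in this thread.

```python
def first_accepted_attempt(retry_minutes, greylist_minutes=25,
                           backoff=1.0, max_attempts=100):
    """Return (retry_number, minutes_elapsed) for the first accepted retry.

    The initial delivery attempt happens at t=0 and is always deferred.
    Each later retry waits `interval` minutes, and the interval is
    multiplied by `backoff` after every failed retry (1.0 = fixed interval).
    Returns None if no retry is accepted within max_attempts.
    """
    elapsed = 0.0
    interval = float(retry_minutes)
    for attempt in range(1, max_attempts + 1):
        elapsed += interval
        if elapsed >= greylist_minutes:
            return attempt, elapsed
        interval *= backoff
    return None


print(first_accepted_attempt(15))               # fixed 15 minute retries
print(first_accepted_attempt(30))               # fixed 30 minute retries
print(first_accepted_attempt(15, backoff=1.5))  # 15 minutes, then backing off
```

Under these assumptions a fixed 15 minute interval gets through on the second retry at 30 minutes, the same elapsed time as a single 30 minute retry, while a one minute interval makes 24 wasted connections before the window opens.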

I think there's a case to be made, based on general modern networking + hosts being more reliable, and temporary resource problems more quickly resolved, for shorter retry intervals being a more intelligent response. That's why I don't go with a default 25 minutes.

Written on 09 June 2012.

Last modified: Sat Jun 9 01:57:22 2012