Understanding Exim's weird way of doing retries

September 30, 2007

First, some terminology. A top level address is an address that a message starts out being sent to; for example, every (accepted) RCPT TO in SMTP creates a top level address for the message. A destination is a place that a message is ultimately going to be delivered to, and may include things like files. A top level address may turn into more than one destination through means like .forward files, aliases, and mailing list files. At a conceptual level, all MTAs have two main jobs: mapping top level addresses to destinations, and then delivering to destinations.

The MTA we use now takes what I think of as the straightforward two-phase approach to this. First, top level addresses are mapped to destinations in a process called 'routing'. Then the MTA tries to deliver to all destinations, and remembers any destination that had a temporary delivery failure so that it can be retried later. When the MTA does retries, it looks up all still-undelivered destinations and tries to deliver to them again.

(This ignores retry times, bounces, incomplete DNS lookups, and so on.)

The important effect of the two-phase approach is that a given message's destinations never change; they are determined once when it is received and then frozen.
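As a toy sketch of the two-phase approach in Python (this is not any real MTA's code; `route`, `TwoPhaseMessage`, and the dictionary-based routing table are all names made up for illustration), routing happens exactly once and only the pending destinations are kept:

```python
from typing import Callable, Dict, List

# Hypothetical routing table standing in for aliases, .forward files, and so on.
Routes = Dict[str, List[str]]


def route(address: str, routes: Routes) -> List[str]:
    """Map one top level address to its destinations (itself if no rule matches)."""
    return list(routes.get(address, [address]))


class TwoPhaseMessage:
    def __init__(self, top_level: List[str], routes: Routes) -> None:
        # Phase one: route once, at reception time, and freeze the result.
        self.pending: List[str] = []
        for address in top_level:
            for dest in route(address, routes):
                if dest not in self.pending:
                    self.pending.append(dest)

    def attempt(self, deliver: Callable[[str], bool]) -> bool:
        # Phase two: try every still-undelivered destination; keep the failures.
        self.pending = [dest for dest in self.pending if not deliver(dest)]
        return not self.pending


routes: Routes = {"staff": ["alice", "bob"]}
msg = TwoPhaseMessage(["staff"], routes)

down = {"bob"}  # destinations that temporarily fail
first_done = msg.attempt(lambda dest: dest not in down)

# Changing the routing afterwards has no effect on the stalled message.
routes["staff"] = ["alice", "carol"]
pending_after_change = list(msg.pending)

down.clear()
second_done = msg.attempt(lambda dest: dest not in down)
```

In this sketch the retry never consults `routes` again, which is the whole point: the message still goes to bob and never to carol.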

Exim does not operate this way. Instead of remembering undelivered destinations and retrying them directly, it remembers delivered destinations and whether or not each top level address was completely delivered. Then each time Exim retries a message, it re-maps any top level address that has not been completely delivered to destinations, throws away any destination that has already been delivered to, and (re)attempts deliveries on any remaining destinations. (If there are no remaining destinations, the top level address is done.)

So a given message's destinations can change during retries. The consequence of this is that messages being retried will pick up changes to .forwards, file-based mailing lists, and so on; as the Exim documentation notes, this can result in things like new subscribers to a mailing list receiving messages that were sent to the mailing list before they actually subscribed.
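Here is a toy Python model of this re-routing style of retry, under the assumption that it can be reduced to a set of delivered destinations plus a list of unfinished top level addresses. It is not Exim's actual code or data format; `route`, `RerouteMessage`, and the dictionary-based routing table are hypothetical names for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical routing table standing in for .forwards and mailing list files.
Routes = Dict[str, List[str]]


def route(address: str, routes: Routes) -> List[str]:
    """Map one top level address to its destinations (itself if no rule matches)."""
    return list(routes.get(address, [address]))


class RerouteMessage:
    def __init__(self, top_level: List[str]) -> None:
        self.unfinished: List[str] = list(top_level)  # not yet completely delivered
        self.delivered: Set[str] = set()              # destinations already done

    def attempt(self, routes: Routes, deliver: Callable[[str], bool]) -> bool:
        still_unfinished: List[str] = []
        for address in self.unfinished:
            complete = True
            # Re-map the address using the routing data as it is right now.
            for dest in route(address, routes):
                if dest in self.delivered:
                    continue  # throw away destinations already delivered to
                if deliver(dest):
                    self.delivered.add(dest)
                else:
                    complete = False
            if not complete:
                still_unfinished.append(address)
        self.unfinished = still_unfinished
        return not self.unfinished


routes: Routes = {"list": ["alice", "bob"]}
msg = RerouteMessage(["list"])

down = {"bob"}        # destinations that temporarily fail
log: List[str] = []   # successful deliveries, in order


def deliver(dest: str) -> bool:
    if dest in down:
        return False
    log.append(dest)
    return True


first_done = msg.attempt(routes, deliver)

# carol subscribes while the message is still waiting on bob.
routes["list"].append("carol")
down.clear()
second_done = msg.attempt(routes, deliver)
```

In this model the second attempt delivers to both bob and carol, so carol receives a message that was sent to the list before she subscribed, while alice is not sent a second copy.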

(Exim has a one_time option for redirect-based routers that will turn destination addresses into top level addresses. But because top level addresses have to be real addresses, Exim has to outlaw pipe and file destinations if you turn this on, and giving those up is not an option for us.)

This approach does let you correct a routing problem on the fly; you don't need to change a routing rule and then manually change the destinations of a pile of stalled messages. But it makes it hard to see what destination is causing messages to stall (and what the error message is), since undelivered destinations only exist during retries; mailq and so on will only tell you what top level addresses haven't been completely delivered (and what destinations have been delivered).

(Technically the information can be dug out of the logs with sufficient work.)

(This is one of those entries I write to make sure that I understand the issue myself.)

Written on 30 September 2007.
