Choosing how slowly your mailer should time out email
Here's a little provocative question: why should mailers have timeouts on email delivery at all? Wouldn't it be more helpful to users to never give up on a message?
In practice, there are at least four reasons to configure your mailer to expire sufficiently old messages:
- some hosts or domains will never accept your email; they are permanently
unresponsive (sometimes in general, sometimes just to you).
- there's so much unexpired old email in the mailer queue that it starts
causing problems for your mailer or the system in general.
- sysadmins get irritated when they see huge queues of (so far) undeliverable
email; among other things, it's clutter.
- expiring and bouncing the message is how the sender gets a copy of
their email back so they can do something else with it.
(This is a pretty weak reason these days, since in today's world most people's mail clients already keep a copy of all of the email they send.)
The latter two reasons are relatively uncompelling, which leaves the first and the second as the primary drivers of email expiry times. And it's really just the first reason, because if you need to set email expiry time based on the second reason you probably already know it because your system has likely exploded under the load of too many old messages.
(I suspect that not many sites send enough email that the second reason is a serious concern. And if it is, there's a bunch of technical measures you can take to reduce the problem.)
The challenge with the first issue is telling the difference between a host that's temporarily down (or unreachable or experiencing problems) and a host that's permanently unresponsive. If we could do this reliably, the right thing to do would be to immediately bounce email for permanently unresponsive hosts and never expire email for hosts that are temporarily down. Since we can't, we have to go with a heuristic: we assume that hosts that have not accepted the email for N days are most likely hosts that are permanently unresponsive.
What's the right value of N? On the one hand, that's a good question. On the other hand, it's the wrong question because this is a heuristic. Heuristics generally do not have a single 'right' answer; instead they have a whole spectrum of answers depending on your specific circumstances (and in this case also depending on what the destination is; for instance, you might know a bunch about this for email inside your organization).
However, we can turn this around by asking what's a reasonable amount of time to expect a temporarily down mail machine to stay down before it gets repaired. My view is that there are reasonable circumstances where a mailer can be down for four or five days at a minimum. If this strikes you as extreme, consider a small organization where the mail machine dies on the afternoon or evening of a long weekend; take three days for the long weekend itself, and then they could easily lose a day or two to getting a new machine and configuring it.
(The extreme case around here is a small department who has their mail machine die just as Christmas vacations start. If they have no spare machines, getting a new server delivered might not happen until two or more weeks later, after Christmas vacations are over and the buildings are open again.)
So if you need a simple number that is not too large, my answer would be 'at least six days'. As it happens, our current mailer configuration times out email to outside domains after six days (more or less), although I don't think we did this sort of thinking about the issue before going with that number.