The case against backup MXes

October 19, 2009

The usual case against backup MXes is that they cause backscatter and they have fewer or no anti-spam precautions. But these aren't inherent problems with the idea of backup MXes, just bad implementations; it's not particularly difficult to do better, and anything can be implemented badly. I think there's a more fundamental case against them.

Pretty much by definition, a backup MX exists in order to avoid losing incoming email if your primary MX is down for an extended period of time. Outside machines trying to send you email will deliver it to the backup MX (which will sit on it for you) instead of timing out and bouncing the mail.

If you lose your primary MX, you're going to set up another MX (either recreating your primary MX or building a temporary backup MX of some sort). If you can do this faster than the amount of time it takes for outside machines to start timing out your email, having a backup MX doesn't get you anything; you won't lose email either way, and having one just changes which machine pending email sits on (your backup MX instead of the various outside machines).

Now, I have to admit that I don't have good numbers on how fast common mail systems will time out email, but my strong impression is that no one sane uses timeouts of less than several days. And my opinion is that if it takes you several days to build a new MX, you have serious problems in your overall systems environment.

(Note that this is not necessarily fully restoring your primary MX, just getting a machine to a state where it can start accepting your email.)
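To make the comparison concrete, here is a minimal sketch of the argument, not a measurement of anything. All of the numbers are hypothetical; the only outside reference point I'd lean on is that RFC 5321 suggests senders retry for at least 4-5 days before giving up. The function name and the sample timeouts are mine, purely for illustration.

```python
from datetime import timedelta


def senders_that_give_up(rebuild_time, sender_timeouts):
    """Return the sender queue timeouts that expire before a replacement MX is up.

    Email from these senders would bounce without a backup MX; email from
    everyone else just waits in the sender's queue until the new MX appears.
    """
    return [t for t in sender_timeouts if t < rebuild_time]


# Hypothetical queue timeouts for a handful of outside mail systems.
sender_timeouts = [
    timedelta(days=5),
    timedelta(days=4),
    timedelta(hours=48),
    timedelta(hours=24),
]

# Assumption: a stand-in MX can be accepting mail within six hours.
quick_rebuild = timedelta(hours=6)
# Assumption: a badly prepared environment that needs three days.
slow_rebuild = timedelta(days=3)

print(len(senders_that_give_up(quick_rebuild, sender_timeouts)))
print(len(senders_that_give_up(slow_rebuild, sender_timeouts)))
```

With the quick rebuild, no sender in this made-up list gives up, so the backup MX would only have changed where the mail waited. With the slow rebuild, the senders with short timeouts bounce mail, which is the one case where a backup MX earns its keep.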

So my conclusion is that for most people, a backup MX is a waste: it consumes a machine in order to insure against a low-likelihood event (losing your primary MX and then not being able to recover it for several days).

Sidebar: on total network connectivity failures

This logic applies to major network connectivity failures too; we just adopt a somewhat expansive definition of 'losing your primary MX'. If your office (or machine room) loses its network for a good amount of time (hello backhoe), you get to pop out to the local wifi hotspot, rent at least one server somewhere (virtual or otherwise), set up your emergency backup DNS and backup MX from scratch, and start pointing your domains at them. If you have a cooperative off-site secondary DNS, you can skip the emergency backup DNS portions of this.

(The widespread availability of highly capable rentable servers makes this much easier than it used to be.)
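The 'start pointing your domains at them' step amounts to publishing new MX records. As a rough sketch, assuming you keep a list of your domains somewhere you can still reach, something like the following could generate the zone file lines to hand to your emergency or off-site DNS. The host name `emergency-mx.example.net`, the domain list, and the short TTL are all placeholders I made up, not recommendations.

```python
def mx_record(domain, mx_host, preference=10, ttl=300):
    """Format one MX resource record in standard zone file syntax."""

    def fqdn(name):
        # Zone files treat names without a trailing dot as relative,
        # so make every name fully qualified.
        return name if name.endswith(".") else name + "."

    return f"{fqdn(domain)} {ttl} IN MX {preference} {fqdn(mx_host)}"


# Hypothetical domains and emergency MX host, for illustration only.
domains = ["example.org", "example.com"]
emergency_mx = "emergency-mx.example.net"

for domain in domains:
    print(mx_record(domain, emergency_mx))
```

A short TTL is used here on the assumption that you will want to point things back at your real MX once it returns.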

Comments on this page:

From at 2009-10-19 11:46:33:

From personal experience, shocking numbers of people run mail servers with 24-48 hour timeouts. Which is stupid.

The more I work with anything internet facing the more I realise you just can't depend on anyone to operate their service in any dependable way, which is a pain in the arse as it often means loads of extra work.

From at 2009-10-20 03:27:49:

Don't forget that if you have a backup MX you get to control when the queue is flushed. Without one, if your primary MX dies and you rebuild it, you have to give people a hand-wavey "well, sometime in the next 24-48hrs, but it depends on the ISP" response to questions about when overdue mail arrives.

Written on 19 October 2009.


Last modified: Mon Oct 19 00:49:33 2009