Backup MXes versus redundant MXes

October 18, 2009

There's potential for confusion when people talk about 'backup MXes', so I'm going to define some terminology:

If an additional MX machine can accept inbound email and get it delivered all the way to your users' inboxes even when all of your other MXes are down, you have a redundant MX.

If an additional MX machine can only accept inbound email but cannot actually get it delivered to your users without the help of another MX, you have a backup MX.

Backup MXes have a bad reputation because the common and easiest way to implement them produces accept-then-bounce backscatter. The backup MX doesn't know what the valid local usernames are, so it accepts everything, and then messages for nonexistent users bounce when the backup MX tries to pass them on to the main MX. But you don't have to implement a backup MX this way; if you have a list of valid local usernames, you can use it on your backup MX just as readily as on your main MX.

(You really should have such a list, but that's another topic.)
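As an illustration, here is a minimal sketch of a recipient check that both the main MX and the backup MX could share, so that the backup rejects unknown users at SMTP RCPT TO time instead of accepting and bouncing later. It assumes a hypothetical domain (`example.org`), a hypothetical hard-coded set of valid local usernames, and addresses with the angle brackets already stripped; a real setup would load the list from wherever your user database lives.

```python
# Hypothetical list of valid local usernames and local domain (assumptions).
VALID_LOCAL_USERS = {"alice", "bob", "postmaster"}
LOCAL_DOMAIN = "example.org"


def rcpt_response(address: str) -> tuple[int, str]:
    """Return an SMTP (code, text) reply for a RCPT TO address.

    The same check runs on every MX, primary or backup, so nothing is
    accepted that would later have to be bounced.
    """
    local, sep, domain = address.rpartition("@")
    # No '@' at all, or a domain we don't handle: refuse to relay.
    if not sep or domain.lower() != LOCAL_DOMAIN:
        return (550, "5.7.1 Relaying denied")
    # Assumption: this site treats local parts case-insensitively.
    if local.lower() not in VALID_LOCAL_USERS:
        return (550, "5.1.1 No such user here")
    return (250, "2.1.5 Recipient OK")
```

Because the rejection happens during the SMTP conversation, the sending machine (not your backup MX) is responsible for telling the sender, so no backscatter is generated.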

A primary MX is any machine among your MX targets with the lowest preference value (senders try lower values first, so these are the most preferred). A secondary MX is anything with a higher preference value. Secondary MXes can be either redundant MXes (presumably on smaller, less capable hardware; otherwise you might as well make them primary MXes) or backup MXes.
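To make the primary/secondary split concrete, here is a small sketch that classifies a set of MX records by preference value. The hostnames and preference numbers are hypothetical; the records are assumed to have already been fetched from DNS.

```python
from typing import NamedTuple


class MX(NamedTuple):
    preference: int  # lower values are tried first by senders
    host: str


def classify_mxes(records: list[MX]) -> tuple[list[str], list[str]]:
    """Split MX records into primaries (lowest preference value) and secondaries."""
    if not records:
        return [], []
    best = min(r.preference for r in records)
    primaries = sorted(r.host for r in records if r.preference == best)
    # Secondaries are listed in the order a sender would fall back to them.
    secondaries = [r.host for r in sorted(records) if r.preference != best]
    return primaries, secondaries


# Hypothetical example: two equal-preference primaries and one secondary.
records = [
    MX(10, "mx1.example.org"),
    MX(10, "mx2.example.org"),
    MX(20, "backup.example.net"),
]
primaries, secondaries = classify_mxes(records)
```

Note that this classification says nothing about whether `backup.example.net` is a redundant MX or a backup MX; that depends on whether it can deliver to users on its own.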

Spammers like to pick on secondary MXes in the hope that they have fewer anti-spam precautions than the primary MX(es). While it's tempting to make your secondary MX hard-fail email if the primary MX(es) happen to be up at the time, you'll wind up losing real email if you do this. There are any number of network failure modes that can cause a legitimate sending machine to fail to talk to your primary MX and fall back to your secondary one, even though the primary MX is alive from your perspective.

(For example, the sending machine's network connection could have been down while it was trying to talk to your primary MX and then come back just in time to let it talk to your secondary. Any number of things can interrupt Internet connectivity for 30 seconds or so: transient DoS attacks, router or firewall restarts, upstream ISP connectivity issues, and so on. And if your secondary MX is on a different network than your primary MXes, all of this applies in spades; in today's Internet there are any number of ways for a machine to be able to reach network A but not network B.)
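The failure mode above can be sketched as a toy simulation. It assumes a hypothetical pair of MX hosts and models the sender simply trying MXes in ascending preference order and falling back on connection failure (real mailers also randomize among equal-preference hosts and retry later, which this sketch omits). The `naive_secondary_reply` policy is the hypothetical "hard-fail if the primary is up" rule the text warns against.

```python
from typing import Callable, Iterable, Optional, Tuple

PRIMARY = "mx1.example.org"    # hypothetical primary MX on network A
SECONDARY = "mx2.example.net"  # hypothetical secondary MX on network B
RECORDS = [(10, PRIMARY), (20, SECONDARY)]


def pick_mx(records: Iterable[Tuple[int, str]],
            reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first MX (by ascending preference) the sender can reach."""
    for _pref, host in sorted(records):
        if reachable(host):
            return host
    return None


def naive_secondary_reply(primary_up_from_secondary: bool) -> int:
    """The policy to avoid: permanently reject (550) whenever the
    secondary itself can see the primary as up, else accept (250)."""
    return 550 if primary_up_from_secondary else 250


def simulate(sender_can_reach: Callable[[str], bool],
             primary_up_from_secondary: bool) -> Tuple[Optional[str], Optional[int]]:
    host = pick_mx(RECORDS, sender_can_reach)
    if host is None:
        return None, None  # nothing reachable: sender queues and retries later
    if host == SECONDARY:
        return host, naive_secondary_reply(primary_up_from_secondary)
    return host, 250  # primary accepts (assuming a valid recipient)


# The sender's path to network A is briefly broken, but the primary is
# perfectly healthy as seen from the secondary.
host, reply = simulate(lambda h: h == SECONDARY, primary_up_from_secondary=True)
```

Here `host` is the secondary and `reply` is 550. Since a 5xx reply is a permanent failure, the sender bounces the legitimate message instead of retrying it, which is exactly the lost email described above.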

Last modified: Sun Oct 18 00:10:13 2009