Wandering Thoughts archives

2009-10-21

Why you should be able to get a list of your local email addresses

There is a decided tendency to create what I will call 'black-box' mailer configurations: you have a single mailer machine, and it is the only thing that knows what is and isn't a valid local email address (and sometimes, what is and isn't a local host or domain name). In such a configuration, the only way you can find out if a local address is good is to feed it to your mailer and see if your mailer rejects it (hopefully at SMTP time).
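
To make this concrete, here's roughly what 'feed it to your mailer and see' looks like in code. This is a minimal Python sketch, not anything authoritative; the mailer host name and the probe sender address are made-up assumptions.

    import smtplib

    def mailer_accepts(address, mailhost="mailhost.example.com"):
        # Ask the mailer whether it will take 'RCPT TO:<address>'.
        # Both mailhost and the probe sender are hypothetical names.
        with smtplib.SMTP(mailhost) as conn:
            conn.helo()
            conn.mail("probe@example.com")
            code, message = conn.rcpt(address)
        # A 2xx reply code means the mailer accepted the recipient.
        return 200 <= code < 300

Note that this only tells you anything if the mailer actually rejects bad addresses at SMTP time; a mailer that accepts everything and bounces later is opaque even to probing.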

(It's very easy to create such a system, temptingly easy, even; you just start writing rules for what your mail system should do with addresses, as clever or as odd as you want. And let me tell you, with some mailers you can create pretty crazy things.)

I've come around to feeling that these black-box setups are a mistake, one that will eventually bite you on the rear. The core problem is that you can't reuse the knowledge of what local email addresses are valid, because that knowledge exists only implicitly in the mailer configuration; there is no explicit knowledge of it that other things can use.

As I have found out (almost the hard way), many of the ways that you'll want to expand and scale up your mail setup require such knowledge, in a simple, easily usable form. Want a good backup MX or a redundant MX? You'll need to know this. Want to move inbound email processing onto a separate machine in order to lower the load on your main mailer and do clever anti-spam filtering? You'll need to know this. Want to create a mail gateway for your local users that rejects invalid local usernames at mail submission time? You're getting the idea.

Thus, what you really want is a 'white-box' mailer configuration, one where it's feasible to generate a list of all of your valid local email addresses (and valid local domains). Having such a list is pretty much required to expand beyond a single mailer machine (possibly with some dumb satellites), and sooner or later you're all but certain to want to do this.

(Your actual central mailer configuration doesn't have to be based on the lists that you generate, although my experience is that making it partly so keeps you honest. In theory it's enough that you generate the lists and never let your mailer configuration get to a point where they're either inaccurate or incomplete, but remember the aphorism: if you're not using it, it's inaccurate.)

There are probably lots of ways of storing and using this information. I like plain flat text files, because they're easy to handle and pretty much anything that you might want to involve in your mail handling can look things up in flat files. (And if you have such a huge volume of data that it's a problem, you can always convert them to more efficient lookup formats.)
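
As an illustration of how simple this can be, here is a small Python sketch; the file location and its format (one address per line, '#' for comments) are assumptions I'm making up for the example, not any sort of standard.

    import dbm

    def load_valid_addresses(path="/etc/mail/valid-addresses"):
        # Read the flat file into a set for quick membership checks.
        valid = set()
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    valid.add(line.lower())
        return valid

    def to_dbm(valid, path="/etc/mail/valid-addresses.db"):
        # If the flat file ever gets too big, convert it to an
        # on-disk lookup format; here, a simple dbm database.
        with dbm.open(path, "n") as db:
            for addr in valid:
                db[addr] = ""

Anything from your backup MX to your submission gateway can do the same trivial parsing, which is the whole point.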

WhiteBoxMailers written at 00:38:37

2009-10-20

Simple mailing lists: an illustration of Exim's flexibility

In PostfixVsExim I wrote that Exim is the better mailer if you want to do complex (even crazy) things, because it is more a mailer construction kit than a mailer (with fixed, pre-existing features). For the benefit of people who haven't been exposed to Exim, I thought I'd illustrate this with one of the things we did in our local Exim configuration.

We have a long-standing system of simple mailing lists. These lists are nothing more than user-owned files in a particular directory; to set up a mailing list, all you do is create a file by that name in the mailing lists directory (which is world-writeable but sticky-bitted, so you can't remove other people's mailing lists). The file's contents are what the mailing list will expand out to, with anything that you could put into a .forward allowed.

These simple mailing lists work the way mailing lists should: if you mail to the list, the envelope sender is rewritten to <list>-owner, and the mailer magically materializes both <list>-owner and <list>-request addresses, sending them to the owner of the list's file.

Exim has no particular built-in mailing list handling features, and certainly nothing on this level (and it shouldn't; there are a lot of policy decisions buried in this system). Instead we were able to build it ourselves out of some relatively simple building blocks that Exim does supply, particularly the ability to rewrite addresses and to expand addresses. It was even relatively straightforward.

(I won't claim that it's simple, because it's not really; we are doing some moderately twisted things once you peek under the hood.)

Shorn of a certain amount of extra complexity that we added later, there are three essential pieces of address processing that we set up (there's a code sketch of them after the list):

  • if an address has a -request or a -owner suffix, and the suffix-less part is a file in the lists directory, the address is rewritten to be the owner of the file.

  • if an address is a file in the lists directory and the file is not writeable by anyone except the owner, the envelope sender is rewritten to be <list>-owner and the file is expanded to obtain new addresses (with the permissions of the user who owns the file).

    (Exim has a general expand-addresses-in-file facility, which is normally used to implement .forwards, and since it is used for .forwards you can tell Exim to limit its permissions.)

  • if the address is a file in the lists directory but the file is writeable by someone besides the owner, we expand the list as above but we set an Exim option so that none of the addresses in the file can be pipes or files. Pipes or files are normally allowed in file expansions, but in this case allowing them would be a security risk.
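
To give non-Exim people a concrete feel for these rules, here is a rough Python sketch of the logic. This is an illustration, not our actual Exim configuration; the lists directory, the file format, and the handling of expansion permissions are all simplified assumptions.

    import os
    import pwd
    import stat

    LISTS_DIR = "/var/mail-lists"    # a hypothetical location

    def process(local_part):
        # Returns (addresses, new envelope sender); (None, None) means
        # this isn't a list address and other routers should handle it.

        # Rule 1: <list>-owner and <list>-request go to the file's owner.
        for suffix in ("-owner", "-request"):
            if local_part.endswith(suffix):
                path = os.path.join(LISTS_DIR, local_part[:-len(suffix)])
                try:
                    st = os.lstat(path)    # lstat doesn't follow symlinks
                except FileNotFoundError:
                    continue
                return [pwd.getpwuid(st.st_uid).pw_name], None

        # Rules 2 and 3: a plain list address expands the file's contents.
        path = os.path.join(LISTS_DIR, local_part)
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            return None, None

        # If the file is writeable by anyone besides its owner, pipe and
        # file deliveries in it are forbidden (Exim's forbid_pipe and
        # forbid_file options); the real system also runs allowed ones
        # with the file owner's permissions, which this sketch can't show.
        allow_pipes_files = not (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))

        addresses = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                if line.startswith(("|", "/")) and not allow_pipes_files:
                    continue    # in Exim this would be a hard error
                addresses.append(line)

        # Either way, the envelope sender becomes <list>-owner.
        return addresses, local_part + "-owner"

(The real Exim version naturally doesn't do the expansion itself; it hands the file to Exim's redirect machinery with the appropriate options set.)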

Exim has basic operations that will do a lot of this, but not quite all, so for some things we use basic Unix programs (particularly stat, because we can get stat to not follow symlinks). Exim also has features that let us use external programs as sources of data and so on, which is what makes this possible.

SimpleEximMailingLists written at 01:08:17

2009-10-19

The case against backup MXes

The usual case against backup MXes is that they cause backscatter and have fewer or no anti-spam precautions. But these aren't inherent problems with the idea of backup MXes, just bad implementations; it's not particularly difficult to do better, and anything can be implemented badly. I think there's a more fundamental case against them.

Pretty much by definition, a backup MX exists in order to avoid losing incoming email if your primary MX is down for an extended period of time. Outside machines trying to send you email will deliver it to the backup MX (which will sit on it for you) instead of timing out and bouncing the mail.

If you lose your primary MX, you're going to set up another MX (either recreating your primary MX or building a temporary backup MX of some sort). If you can do this faster than the amount of time it takes for outside machines to start timing out your email, having a backup MX doesn't get you anything; you won't lose email either way, and having one just changes which machine pending email sits on (your backup MX instead of the various outside machines).

Now, I have to admit that I don't have good numbers on how fast common mail systems will time out email, but my strong impression is that no one sane uses timeouts of less than several days. And my opinion is that if it takes you several days to build a new MX, you have serious problems in your overall systems environment.

(Note that this is not necessarily fully restoring your primary MX, just getting a machine to a state where it can start accepting your email.)

So my conclusion is that for most people, a backup MX is a waste: it consumes a machine in order to insure against a low-likelihood event (losing your primary MX and then not being able to recover it for several days).

Sidebar: on total network connectivity failures

This logic applies to major network connectivity failures too; we just adopt a somewhat expansive definition of 'losing your primary MX'. If your office (or machine room) loses its network for a good amount of time (hello backhoe), you get to pop out to the local wifi hotspot, rent at least one server somewhere (virtual or otherwise), set up your emergency backup DNS and backup MX from scratch, and start pointing your domains at them. If you have a cooperative off-site secondary DNS, you can skip the emergency backup DNS portions of this.

(The widespread availability of highly capable rentable servers makes this much easier than it used to be.)

AgainstBackupMXes written at 00:49:33

2009-10-18

Backup MXes versus redundant MXes

There's a potential confusion when one talks about 'backup MXes', so I'm going to throw some terminology around:

If an additional MX machine can accept inbound email and get it delivered all the way to the user's inbox even when all of your other MXes are down, you have a redundant MX.

If an additional MX machine can only accept inbound email but not actually get it delivered to your users (without the help of another MX), you have a backup MX.

Backup MXes have a bad reputation because the common and easiest way to implement them involves 'accept then bounce' backscatter, because the backup MX doesn't know what the valid local usernames are and so just accepts everything (and then things bounce when the backup MX tries to pass the email to the main MX). But you don't have to implement a backup MX this way; if you have a list of valid local usernames, you can use it on your backup MX just as readily as you can on your main MX.

(You really should have such a list, but that's another topic.)

A primary MX is any machine that is among your lowest-preference MX targets. A secondary MX is any machine with a higher MX preference value (and thus less preferred). Secondary MXes can be either redundant MXes (presumably they're on smaller, less capable hardware, or you might as well make them primary MXes), or backup MXes.
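
To be completely concrete about the preference terminology, here is a trivial Python illustration; the MX records in it are made-up examples.

    def classify_mxes(mx_records):
        # mx_records is a list of (preference, hostname) pairs; the
        # lowest preference value is the most preferred.
        lowest = min(pref for pref, _ in mx_records)
        primaries = [host for pref, host in mx_records if pref == lowest]
        secondaries = [host for pref, host in mx_records if pref != lowest]
        return primaries, secondaries

    mxes = [(10, "mx1.example.com"), (10, "mx2.example.com"),
            (20, "backup-mx.example.com")]
    print(classify_mxes(mxes))
    # (['mx1.example.com', 'mx2.example.com'], ['backup-mx.example.com'])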

Spammers like to pick on secondary MXes in the hope that they have fewer anti-spam precautions than the primary MX(es). While it's tempting to make your secondary MX hard-fail email if the primary MX(es) happen to be up at the time, you'll wind up losing real email if you do this; there are any number of network failure modes that can cause a legitimate sending machine to fail to talk to your primary MX and fall back to your secondary one, even though the primary MX is alive from your perspective.

(For example, the sending machine's network connection could have been down while it was trying to talk to your primary MX and then returned just in time to let it talk to your secondary. There are any number of things that can interrupt Internet connectivity for 30 seconds or so: transient DoSes, router or firewall restarts, upstream ISP connectivity issues, etc etc. And if your secondary MX is on a different network than your primary MXes, all of this applies in spades; in today's Internet there are any number of ways for someone to be able to reach network A but not network B.)

BackupMXvsRedundantMX written at 00:10:13

2009-10-16

A tale of network horror, or at least excitement

(This story comes from my co-worker John Calvin, who told it to me some years ago; I was reminded of it by some recent local events, so it seems like a good time to put it here.)

One of the things that the central computing people here can do for departments is run their basic networking infrastructure, the switches and wiring and so on. Once upon a time, such a managed departmental switch started lighting up the monitoring system with repeated, frequent contact failures; when the monitoring system went to poll the switch, it often wouldn't respond.

(It also often didn't respond on the telnet-based management console.)

Normally this means a failing switch. But this switch didn't seem to be dying; the department wasn't reporting any network issues, and when you could talk to the switch, it would report no errors or problems. It was just that fairly often, it wouldn't talk to the monitoring system. Various people got pulled in to try to figure out what was wrong, and what could be done about it, and finally they found it.

The switch was configured with two VLANs, an 'inside' and an 'outside', because the department had been planning to introduce a firewall. However, they hadn't gotten around to doing so, and in the mean time they'd simply used a network cable to directly connect what would have been the firewall's inside and outside network ports. Let us call these ports A (on the outside VLAN) and B (on the inside VLAN).

Switches need to maintain a mapping between Ethernet addresses and ports that they're reached on (otherwise they turn into hubs). As it happens, this switch only had a single global mapping table, not a per-VLAN mapping table, and the mapping was maintained by the switch's management processor, not its core switching engine.

(Roughly speaking, switches are divided into a high-speed switching engine and a slower management processor. The switching engine directly handles simple things and defers more complicated situations to the management processor, which is also responsible for answering SNMP queries and so on.)

So imagine what happens when a packet from the network router flows through the switch to an internal port. First, the switch sees a packet from the router's MAC on the router port, so it learns that MAC/port association. The packet then goes out port A so it can hop between the outside and inside VLANs, and suddenly the switch sees a packet from the router's MAC on port B. Since the switch only has a single global mapping table, it must now remove the old association of that MAC with the router port and add a new one associating it with port B. This entire port association flip-flop repeats for every packet from the outside world to a local machine, and it also happens in reverse for every packet from a local machine to the outside world (as first the switch sees the machine's MAC on its actual port, and then on port A). And every flip-flop has to be handled by the management processor.
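
If you want to see just how pathological this is, here is a toy Python simulation of the single global MAC table; the port names and the packet mix are made-up, and all it does is count how often the table has to be updated.

    def count_relearns(packets):
        mac_table = {}    # MAC -> port; one global table for all VLANs
        relearns = 0
        for mac, port in packets:
            if mac_table.get(mac) not in (None, port):
                relearns += 1    # work the management processor must do
            mac_table[mac] = port
        return relearns

    # Each inbound packet is seen twice: once from the router port and
    # once on port B after crossing the VLAN-bridging cable.
    inbound = [("router-mac", "router-port"), ("router-mac", "port-B")]
    print(count_relearns(inbound * 1000))   # 1999: nearly every sighting

Every packet in either direction forces a table update, so the management processor's load scales directly with the switch's total traffic.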

As it happened, the management processor was basically melting down under the load of handling all of these flip-flops. When this happened, the management processor did the sensible thing and devoted all of its CPU power to the high-priority task of mapping table maintenance, and dropped lower-priority jobs on the floor, jobs such as responding to SNMP queries or to the management console.

(I should note that this was not a cheap switch; this was just quite a while ago, back when gigabit was an expensive novelty, 100 Mbits was pretty fast, and mammoths had just stopped roaming the earth.)

SwitchedHorror written at 02:20:12

2009-10-10

You should delete obsolete data files

Around here, our systems have layers of accreted history; scripts that are generating files that are used by, well, we're sometimes not entirely sure any more. Every so often we reach the point where we turn another one of the file-generating scripts off (we take them out of crontabs, we remove invocations of them from other scripts and so on).

Having done this for a while now, I have a suggestion: when a data file becomes obsolete and is no longer updated, you should immediately delete it. Don't keep it around just in case something still refers to it, because if there are any remaining users, you want things to break right away, while you still remember what you just did.

(If you are lucky, things will break with error messages about 'cannot read file X' and it will be obvious why. But sometimes things will just malfunction, and then it really helps to have a recent change to blame.)

The problem with leaving such files around just in case is that you still get breakage, but it is much more subtle breakage. What happens is that the file slowly slips further and further out of correspondence with reality (as reality keeps changing but it doesn't), and sooner or later this divergence starts producing odd results. Things work for old accounts (or old bits of data in general) but not for new enough ones; things go to the wrong place; deleted things mysteriously resurface or still partly work. Straightforward, immediate breakage is more painful (and perhaps more embarrassing if you overlooked something important) but is much better in the long run.

I admit that this is hard for me to do; I'm a packrat by nature, and even in an environment with version control systems and backups my instinct is to keep old data files around just in case. But 'just in case' almost never shows up, so I need to wield rm more often.

DeleteObsoleteFiles written at 01:19:57

