2007-10-24
The format of PTR records in Bind irritates me
How often have you seen a reverse DNS entry of
host.dom.ain.10.11.12.in-addr.arpa.? I've seen it too often, and
I've even created them too often. Such incorrect reverse DNS entries
exist only because Bind makes it all too easy to shoot your foot off by
insisting on perfect consistency; for PTR records, as for all other
records, a name without a terminating dot is taken to be relative to the zone's name. This is despite the fact that this makes no sense for PTR records; the only valid use for a PTR record whose target is a name inside the reverse zone itself is excessively clever.
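To make the mistake concrete, here is a minimal sketch of a fragment of a reverse zone file for 10.11.12.in-addr.arpa (the host name is the made-up example from above, and the '5' owner name is an arbitrary last octet):

  $ORIGIN 10.11.12.in-addr.arpa.
  ; wrong: without a trailing dot, Bind appends the zone name and you
  ; publish a PTR to host.dom.ain.10.11.12.in-addr.arpa.
  5    IN  PTR  host.dom.ain
  ; right: the trailing dot makes the target an absolute name
  5    IN  PTR  host.dom.ain.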
(Yes, this is not the only error you can make in zonefiles. But it's one of the few that is syntactically valid but semantically wrong in a way that Bind could trivially detect.)
Given that in-zone PTR records make no sense, Bind could have saved a lot of people a lot of problems over the years if it had simply not accepted them, either by making a missing dot an error or by silently adding it if necessary. It could even have made the choice a global option; error out, fix up, or accept as is. But instead it stuck with a format that almost invites this error, and so people keep making it all the time.
(Note that I am not fond of going to the other extreme, as djbdns does, where all names have to be written out in full. There are a lot of convenient uses for partial names in DNS zone files, although we have a skewed perspective since we're in two top-level domains.)
2007-10-16
Our old mail system's configuration
Before I can talk about more interesting mailer things, I have to explain how our old mail system was configured.
Our old mail system makes perfect sense once you realize that it was
more or less designed around the idea that nothing should ever have
to be done over NFS. In order to manage this, each different sort of
processing had to be done on the machine that held the relevant files;
deliveries to /var/mail were done on the postbox machine, which had
/var/mail on local disks, and mail for real users was handled on the
user's fileserver, because only that machine could even check for a
.forward (much less expand it) without NFS being involved.
So the mail flow for incoming mail went like this:
- mail came in to the central mail machine, which expanded aliases and
local mailing lists (using data files that it held on its local
disk). The central mail machine was the MX target for our domains.
- mail for users was then sent to the user's fileserver. If the user had a .forward it was expanded; otherwise the fileserver sent it to the postbox machine (and sent a copy to the 'oldmail' machine, which kept a read-only copy of the past 14 days or so of email for each user). Users were encouraged to have procmail and so on deliver their mail not by directly writing it to /var/mail (which would have involved NFS) but by forwarding it to the postbox machine; there is a rough sketch of this just after the list.
- finally, the postbox and oldmail machines actually delivered the mail to the appropriate mailspool.
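As a rough illustration of that encouragement, a user's procmail setup (run on their fileserver) might have looked vaguely like the following sketch; the folder name, the List-Id match, and the 'postbox.our.dom' host name are all invented for the example:

  # file one mailing list into a folder in the user's home directory,
  # which is local disk on the user's fileserver
  :0:
  * ^List-Id:.*some-list
  Mail/some-list

  # forward everything else to the postbox machine for /var/mail
  # delivery, instead of writing to /var/mail over NFS
  :0
  ! $LOGNAME@postbox.our.dom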
Mail routing was done by rewriting the destination addresses. Partly as a result, the postbox and oldmail machines only did their delivery for addresses in magic forms; if you sent other mail to them, they passed it back to the central mail machine.
(The fileservers passed email for the outside world back to the central machine instead of trying to deliver it directly. This had both good and bad effects.)
One consequence of this design is that all of the machines involved had to NFS export things to our login and compute servers, because each of them held its particular data on local storage. The postbox machine had to export /var/mail,
the oldmail machine had to export /var/oldmail, and the central mail
machine had to export the data area for local mailing lists so people
could change them.
(Conveniently, the postbox machine was also the IMAP/POP server, so that bit didn't have to worry about mailbox locking over NFS.)
There was a separate mail submission machine for outgoing user email, whether from our servers or from user PCs. It forwarded mail for local destinations to the central mail machine and otherwise sent the mail directly to the outside world.
2007-10-12
Getting your networks to your racks
I'm in the process of getting a new test server installed in a rack in our machine room. This means we needed to set up some network connections for it, which involved someone dragging yet more cable around our machine room and finding switch ports to plug into to get the necessary networks (which turned out not to be an entirely trivial thing).
As a result, the whole thing got me thinking about the issue of the best way to get all of your networks to where they need to be in your racks. The good approaches that I can think of right now are:
- run each machine's network cables out of the rack to central switching points. But if you do this, you wind up with more and more cables snaking under the floor, and part of having neat racks is running as few cables as possible out of each rack.
- one ordinary gigabit switch per rack, with all the servers connected
to it and as many networks as you need brought into the switch as
VLANs. But then the entire rack is sharing a 1 gigabit uplink into
your core fabric, which may not be enough.
(I don't think that trunking is a good workaround, because it will consume switch ports in your core fabric at a ferocious rate.)
- one switch with a 10 Gb uplink per rack. This is costly, especially at your central switching points (switches with more than two 10 Gb connections are apparently really expensive).
The first approach can make the most efficient use of your network bandwidth, because you can clump machines that all need to talk to each other together on one switch, regardless of where they are physically located. You can't count on putting such machines together in a single rack, for at least two reasons: first, you may have more than a rack's worth of them, and second, a machine may need bandwidth to more than one such clump.
Our current approach in practice is mostly the first option, partly because almost all of our new racks are only one row over from our network core. We have one new rack that is all the way on the other side of the room; fortunately it doesn't currently need more than a gigabit in total and its machines are all on one network. (And the rack is nearly full of modern machines, so the latter is unlikely to change for several years.)
2007-10-08
I have new system enthusiasm
I get it every so often: a new system that consumes my attention, my thoughts, and my interest. I find myself thinking about it all the time and then working on it all the time; it doesn't matter that I'm not at work, I want to work on it anyways so I do. And I'm impatient to see what I've built go into production, to actually get used, to give me feedback.
The current new system enthusiasm is for our new mail environment, which we have finally constructed and are now starting to deploy. Slowly. Conservatively. Bit by bit, which leaves me chomping at the bit to go faster (and deeper, with more changes to the old environment than I thought of initially).
Whenever this happens, I try to remember that what I am feeling is an irrational enthusiasm. However much I feel that the new mail system is ready for full deployment right now, it's not an entirely rational feeling. And thus, however much it makes me do an impatient dance, we should do a cautious staged deployment, and not accelerate the timetable (or change it) without good, solid reasons.
(I also have to remember that counting on my co-workers to restrain my enthusiasm is not necessarily wise. I know the most about the new system, so I may well be able to do a good enough sales job to talk them into something despite it being a bad idea, whether or not I realize it. After all, I am in a great position to talk about all of the benefits of the new system and enthusiastic enough to talk down the risks, because with my enthusiasm comes confidence.)
I'm still looking forward to tomorrow, when I get to throw the switch
to make more of the new mail system live. (Right now all it is doing
is handling final delivery of mail to /var/mail; tomorrow it starts
actually routing some email.)
2007-10-05
Why we don't use cable management arms
One of the things I discovered when I came here was that while cable management arms came standard with many of the servers we bought, we weren't using them; in fact, when we moved servers that had them, we tended to remove them. This surprised me a bit, since they're generally a part of a well organized rack, and I recently got around to asking my co-workers about this.
It turns out that there is a simple reason: heat dissipation. Our racks are all open racks, so our servers are cooled with plain front-to-back airflow, and it turns out that a populated cable management arm effectively blocks this. In the days when we used them, my co-workers noticed that the rears of those servers were real hot spots, significantly warmer than the surrounding air. However nice the organization and neatness are, we'd rather have cool servers.
(It seems that cable management arms work best in enclosed racks where you're using vertical cooling, with cold air forced in at the bottom and exhaust fans at the top to pull out the hot air.)
In general, the longer I've been here the more I've come to feel that, however much I like admiring them, the conventional wisdom represented by nicely organized racks is the result of being able to plan your racks, build them mostly at one time, and not have things change all the time. This is not how things are here; even when we have a plan for a rack we usually don't get all the hardware at once, and things change all the time (sometimes rapidly). In this environment, things like those neatly and tightly tied cable bundles are a disaster waiting to happen the next time you need to change things, or at least an annoyance; they look good only so long as you don't need to touch them.
(The same thing holds true for cable labels; much like comments in computer programs, their presence may delude you into believing them.)
You can probably maintain such neatness even in an environment of uncertainty and change, but it has a definite cost in time (and thus in money), one that we cannot justify.
(Having said that, I should probably neaten up a server or two of mine, or at least figure out what the best way to run their cables is.)
2007-10-01
How Exim determines the retry time for local deliveries
The Exim documentation is a little silent on how Exim determines the retry time for local deliveries. Since I spent today looking into this, I might as well write up what I've learned.
Exim's retry rules are based on matching patterns against 'the failing host or address' (as the documentation puts it); call this the retry key. For local deliveries, it turns out that the retry key is the full address (including any local part prefixes or suffixes) that is handed to the transport by a router.
For a router that directly invokes an appendfile-based transport (such as a typical local delivery router and transport), this is just the email address involved, and it can be matched with all of the usual retry rule patterns. This does mean that it can get confused with regular SMTP host retry patterns, unless you first rewrite addresses to a unique form and then do local delivery based on that unique form. Fortunately we are already doing this for other reasons.
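For example, if local deliveries are routed to addresses rewritten into a unique internal form, a retry rule can be scoped to just them. The following is only a sketch; the 'userdelivery.internal' domain is invented and the retry parameters are simply the usual ones from Exim's example configuration:

  # in the retry section of the Exim configuration
  *@userdelivery.internal    *    F,2h,15m; G,16h,1h,1.5; F,4d,6h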
For redirect-based routers that directly generate general file or pipe
destinations (such as one that processes .forwards), the address
that the transport sees has a domain of '|....' or '/...' and
no particular local part; I believe that the only way to match these
is with regular expression patterns. To save people the effort, the
patterns I have tested are:
  ^\N\@\|.*$\N     for pipe destinations
  ^\N\@/.*$\N      for file destinations
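In the retry section these get used like any other rule; for instance (the error field and the retry parameters here are just illustrative placeholders, not our real settings):

  ^\N\@\|.*$\N    *    F,1h,10m; F,6h,1h
  ^\N\@/.*$\N     *    F,1h,10m; F,6h,1h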
(You don't need to worry about pipe destination retries unless you set
timeout_defer on the relevant transport. We do set it because we've
seen pipe-based deliveries get hung up due to things like full disks
and we don't want to bounce email in this situation, plus it means we
can lower the pipe delivery timeout from an hour.)
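For what it's worth, this is a small addition to the transport. A minimal sketch of a pipe transport with it set (the transport name and the lowered timeout value are just examples, not our actual configuration):

  # a pipe transport that defers instead of bouncing when a delivery
  # times out; the command comes from the pipe address itself
  user_pipe:
    driver = pipe
    timeout = 10m
    timeout_defer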