2009-12-22
How not to set up your DNS (part 20)
I call this one the case of the non-redundant redundant MX; it's much like the first time except more thorough:
    ; sdig mx mumble.utoronto.ca.
    0 mail.mumble.utoronto.ca.
    0 jackson.mumble.utoronto.ca.
    5 mail.mumble.utoronto.ca.
    ; sdig a mail.mumble.utoronto.ca.
    128.100.X.Y
    ; sdig a jackson.mumble.utoronto.ca.
    128.100.X.Y
(mumble is not the real subdomain name; I just decline to identify them here because, well.)
So that's three MX records, two of which are literally redundant with each other, and all of them are pointing at the same machine. I'm not sure what happened here; perhaps the DNS zone file is organized such that it wasn't immediately obvious to people that they already had MX entries when they added more MX entries, or something.
(Or perhaps someone took the advice that one should have redundant MX entries a little bit too literally, similarly to what some people have done with NS entries.)
One of the interesting consequences of triply redundant non-redundant MX entries is that some mailers will probably take two or three times as long as usual to time out on delivery attempts should your mail server ever be down. Other mailers are smart enough to notice that everything is pointing to one IP address and only do one delivery attempt. And either way, it's probably doing odd things to mailer retry timers.
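As a rough illustration of what 'noticing that everything points to one IP address' amounts to, here is a minimal Python sketch. It assumes the MX and A answers have already been looked up and are handed over as plain data; the function name and the data layout are hypothetical, and this is not how any particular mailer implements it.

```python
from collections import defaultdict


def distinct_mx_targets(mx_records, addresses):
    """Group MX hostnames by the IP address they resolve to.

    mx_records: list of (preference, hostname) pairs.
    addresses:  dict mapping hostname -> list of IP address strings.
    Both are assumed to come from DNS lookups done elsewhere.
    """
    by_ip = defaultdict(set)
    for _preference, host in mx_records:
        for ip in addresses.get(host, ()):
            by_ip[ip].add(host)
    return dict(by_ip)


# The records from the example above, with the address left elided as X.Y.
mx = [
    (0, "mail.mumble.utoronto.ca."),
    (0, "jackson.mumble.utoronto.ca."),
    (5, "mail.mumble.utoronto.ca."),
]
addrs = {
    "mail.mumble.utoronto.ca.": ["128.100.X.Y"],
    "jackson.mumble.utoronto.ca.": ["128.100.X.Y"],
}

targets = distinct_mx_targets(mx, addrs)
print(len(mx), "MX records,", len(targets), "distinct address(es)")
# prints: 3 MX records, 1 distinct address(es)
```

If the number of distinct addresses is smaller than the number of MX records, the extra records are buying no actual redundancy.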
Do you have a network layout diagram?
Here's a not entirely hypothetical question: suppose that your machines are coming up after a building-wide power outage, except that a scattering of them (on various different networks) are either not up or not reachable. Could you look at what machines are failing to appear and identify whether there's likely to be a switch that's failed, and if so, where that switch would be?
(This happened to us yesterday. There was some confusion, because guess what we don't have.)
That sort of question is why you want to have a network layout diagram, something that tells you how your logical networks flow through your physical infrastructure and reach your various physical locations. Without a network layout diagram, you're relying on a combination of human memory and tracing connections around; even in the best of times, this is going to be slower and more error-prone than looking it up.
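To make the 'look it up' part concrete, here is a small Python sketch of the sort of question a machine-readable layout could answer. Everything in it is hypothetical: the machine and switch names are invented, and it assumes the layout is recorded as the chain of switches between the core and each machine.

```python
def suspect_switches(paths, down):
    """Return switches where every machine behind them is unreachable.

    paths: dict mapping machine -> list of switches on its path,
           ordered from the core outward (a hypothetical layout format).
    down:  iterable of machines that are not up or not reachable.
    """
    behind = {}
    for machine, switches in paths.items():
        for sw in switches:
            behind.setdefault(sw, set()).add(machine)
    down = set(down)
    # A switch is suspect only if nothing behind it is reachable.
    return sorted(sw for sw, machines in behind.items() if machines <= down)


# Invented example layout: two floor switches, one lab switch below floor 3.
paths = {
    "web1": ["core", "floor2-sw"],
    "db1": ["core", "floor2-sw"],
    "mail1": ["core", "floor3-sw"],
    "nfs1": ["core", "floor3-sw", "lab-sw"],
    "ws1": ["core", "floor3-sw", "lab-sw"],
}

print(suspect_switches(paths, down={"nfs1", "ws1"}))
# prints: ['lab-sw']
```

This reports every switch with nothing reachable behind it, so if a switch and the switches below it all show up, the one closest to the core is the likeliest place to start looking.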
Another advantage of having things in an actual diagram is that it's generally easier to reason about them when you have the diagram in front of you. Without one, everyone involved has to more or less reconstruct it in their mind in order to see the relationships; with a diagram, well, you just look at it, and you can just point at various bits to explain things.
For some people, having an actual diagram for this will sound silly. To those people I say that you haven't gotten big enough yet.
(If you're relatively big yet you're still keeping all of this in your head and consider it good enough, ask yourself how you're going to bring a new person up to speed on the network structure.)
For some people, this will sound like a motherhood and apple pie issue; of course you have a network layout diagram. The problem with this view is that old issue: documentation is not free, and network layout diagrams are a form of documentation. If you have a non-trivial network that is under constant evolution (some would say 'churn'), keeping your network layout diagram up to date is going to take a commitment of time, and everyone concerned (management included) has to accept that this is going to slow down work.
(How much time it takes depends on what format you keep your network layout diagram in and how easy the format is to update. My personal choice would be graphviz format, because it's plain text, but I haven't actually tried to do this; it's possible that real network layouts are too complex for graphviz's automatic routing and layout.)
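For a sense of what the plain-text graphviz approach might look like, here is a minimal Python sketch that turns a list of links into DOT source for an undirected graph. The link list and device names are invented for illustration, and this says nothing about whether graphviz's automatic layout copes with a real network.

```python
def to_dot(links, name="network"):
    """Render (device, device) link pairs as graphviz DOT source.

    links: iterable of (a, b) name pairs, one per physical connection.
    The names here are hypothetical; a real list would come from
    whatever records the site keeps.
    """
    lines = [f"graph {name} {{"]
    for a, b in links:
        # Quote the names so that hyphens are accepted in node IDs.
        lines.append(f'    "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


links = [
    ("core", "floor2-sw"),
    ("core", "floor3-sw"),
    ("floor3-sw", "lab-sw"),
    ("lab-sw", "nfs1"),
]

print(to_dot(links))
```

The output can be saved to a file and rendered with something like 'dot -Tsvg network.dot -o network.svg'. Because the source is plain text, adding a switch is a one-line change that shows up cleanly in version control.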