Wandering Thoughts archives

2007-11-28

The problem the automounter was trying to solve

The automounter was more or less created to solve one problem: trying to avoid having your machine hang when any of the huge list of pokey machines that exported NFS filesystems that you needed once in a blue moon went down. Again.

This problem is really an artifact of a much earlier age, of a time when disk space was so expensive that any machine with any amount of surplus disk space was pressed into general service. This created a massive web of NFS crossmounts and in the end made everyone's machine depend on all the machines. But those days are long gone. (At least I hope so. They weren't pleasant days even with automounters.)

I've always felt that the automounter solution to this was more of a hack workaround than a real solution. It worked, as long as you were lucky, but the needs of the solution created their own set of problems, and in modern environments the cure can now be worse than the disease.

(Locally, we got so peeved at the various problems the automounter was causing us that we've replaced it with something that does more or less what I wanted.)

AutomounterReason written at 23:57:28

Another aphorism of system administration

Here is a principle of practical system administration:

Later never comes.

Specifically, don't defer something until 'later', because it's never going to happen. Unless you are exceptionally well disciplined and lucky, either you will have lost track of things by the time you have the free time or you'll discover that any number of other bits now depend on the state of the world as it is, and it is no longer at all simple to change it.

(A surprising amount of practical system administration is about not losing track of things. Unfortunately this is not my strongest suit.)

SysadminAphorismV written at 23:29:37

2007-11-16

Improving your life with checklists

I recently migrated a fairly important chunk of our mail service from an old mailer on an old server to a new mailer on a new server. It wasn't a simple process, as it involved moving data from the old server to our SAN, updating all sorts of configurations, changing scripts, and various other things. But I managed to do it smoothly, with only one or two moments of panic, and one of the big reasons for this is that I made a detailed checklist.

Well before I started the migration, I set out to write down all of the things I'd need to do. I didn't let myself get away with vague handwaves or high level overviews; I made myself write down precise details just to make sure I really did understand exactly what I'd need to do when the time came and all of the implications and consequences.

(As you might guess from the moments of panic, I can't claim that I got this completely right. But I did manage to mostly foresee things.)

At the same time, I didn't try to use the checklist as a frozen thing that had to be fully complete before I did anything and then followed mechanically. I reordered items, sometimes on the fly, added new things, changed entries, and so on, and actually wound up using the checklist file as a kind of action log to keep track of what steps I'd done and any notes I had.

Once I'd written down enough of the checklist it became clear that some things could be done in advance, and some more or less in parallel, so at times I was hopping back and forth between preparatory items depending on what I felt like working on next. A few times I realized it made sense to move entire chunks of preparation earlier or later, or split up something that had seemed monolithic into preparation and execution chunks. And as things went on it became clear that some items were simply not as crucial and so could be deferred.

None of this should really be news to me; I've seen detailed checklists work before, and I've seen a lack of them cause problems. In part I'm writing this experience down now as reinforcement for myself, so that next time I'm more likely to do it again.

UseAChecklist written at 23:10:25

2007-11-14

What NAT is useful for

I can think of at least three things that NAT is good for:

  • it compacts address space; many machines can be behind a single IP address.
  • it makes a decent outgoing-traffic-only firewall, which provides a significant amount of protection to machines.
  • to some degree it denies outsiders what I'll call 'traffic intelligence'; how many machines you have, how they're grouped, and what machine or group is responsible for what traffic.

(Sometimes the lack of traffic intelligence can be a problem, such as when the campus network people report that our primary NAT gateway machine is doing an awful lot of suspicious traffic.)

The second thing is certainly important to us, and I suspect that the third thing is important to many companies. There are ways of working around both of these additional benefits if an attacker is determined and skilled, and of course NAT is not the only way of providing either. But it's certainly useful that both benefits come along for free when you're already using NAT for the first reason, and especially that they happen automatically, without the need for any special configuration.
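To illustrate why the second and third benefits fall out of NAT automatically, here is a minimal sketch of a port-translating NAT table in Python. Everything in it (the `ToyNat` class, its method names, the starting port of 40000) is made up for illustration, and it ignores things a real NAT tracks, such as the protocol, the remote endpoint, and timeouts. The point is only that inbound traffic with no matching table entry has nowhere to go, and that outsiders see one address regardless of which inside machine is talking.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Hypothetical, heavily simplified port-translating NAT table."""
    public_ip: str
    next_port: int = 40000  # arbitrary starting port, an assumption
    out_map: dict = field(default_factory=dict)  # (inside_ip, inside_port) -> public_port
    in_map: dict = field(default_factory=dict)   # public_port -> (inside_ip, inside_port)

    def outbound(self, inside_ip, inside_port):
        """Translate an outgoing connection; create a mapping if needed."""
        key = (inside_ip, inside_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        # Outsiders only ever see the public address and a translated port.
        return (self.public_ip, self.out_map[key])

    def inbound(self, public_port):
        """Return the inside endpoint for a reply, or None if unsolicited."""
        return self.in_map.get(public_port)


nat = ToyNat("203.0.113.1")
first = nat.outbound("10.0.0.5", 5000)
second = nat.outbound("10.0.0.6", 5000)
reply_target = nat.inbound(first[1])
unsolicited = nat.inbound(12345)  # no mapping exists, so this is dropped
```

Both inside machines appear to the outside as the same IP address, and the unsolicited inbound packet gets `None` without anyone having written a firewall rule for it.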

IPv6 eliminates the need for address space compaction but does nothing in particular to deal with the other two things NAT is good for (and if IPsec really does become pervasive, IPv6 may complicate the second significantly). This can make sysadmins unhappy, especially when well intentioned people tell them that IPv6 has made NAT unnecessary.

(Note that denying traffic intelligence is very important in some consumer environments, where your ISP is attempting a revenue grab by charging extra for the privilege of letting you connect multiple machines.)

WhyNAT written at 23:17:05

2007-11-02

Note to self: check for gigabit Ethernet

One thing I should remember is that before I start measuring things like iSCSI performance, I should check to make sure that I'm actually using a gigabit Ethernet connection. Conversely, if I get unexpectedly terrible high-level performance, perhaps I should make a point of immediately checking the network speed. Especially if the terrible performance is around 10 megabytes a second.

(This is especially embarrassing when I forget this for the second time.)
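The '10 megabytes a second' warning sign is just arithmetic: 100 megabits a second is 12.5 megabytes a second before any protocol overhead, so a measured throughput hovering a bit under that smells like a 100 megabit link. A rough sketch of both checks in Python follows. The function names and the 75% threshold are my own assumptions, and the `/sys/class/net/<iface>/speed` file is Linux-specific (other systems will need a different way to ask).

```python
from pathlib import Path


def raw_ceiling_mbytes(link_mbit):
    """Raw link speed in megabytes/sec, before any protocol overhead."""
    return link_mbit / 8


def looks_like_slow_link(measured_mbytes, slow_mbit=100, floor=0.75):
    """True if throughput sits just under a slow link's raw ceiling.

    The 0.75 floor is an arbitrary assumption, not a measured figure.
    """
    ceiling = raw_ceiling_mbytes(slow_mbit)
    return floor * ceiling <= measured_mbytes <= ceiling


def link_speed_mbit(iface):
    """Negotiated speed in Mbit/s from Linux sysfs, or None if unavailable."""
    try:
        text = Path(f"/sys/class/net/{iface}/speed").read_text()
        return int(text.strip())
    except (OSError, ValueError):
        # No such interface, link down, or not running on Linux.
        return None


suspicious = looks_like_slow_link(10.0)   # about 10 MB/s: check the cable
healthy = looks_like_slow_link(60.0)      # well above what 100 Mbit can do
```

Calling something like `link_speed_mbit("eth0")` before starting a benchmark (the interface name is only an example) would report 100 instead of 1000 on a mis-cabled machine.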

This can happen to us for a combination of two reasons: first, we don't have a switch in each rack, and second, we still have a mix of gigabit and 100 megabit switches in our machine room. This makes it easy to have accidents and omissions when one drags cable around for new machines, including the ever-popular game of 'which of these two identical cables is gigabit and which is only 100-mbit?'

(One of the morals of that game is to label the cables before you start running them under the floor.)

I think we may now have enough gigabit switches that the sensible thing to do might be to rip out all of the 100 mbit switches that are on (currently) mixed networks and replace them with gigabit switches. At the very least we could do this for all of the networks that we use in the machine room.

(There are some networks that are 100 mbit only and are almost certainly going to stay that way, generally because they have a high wiring density.)

Sidebar: on the naming of speeds

On a side note, I have to say that '100 megabit' networking really could do with a less awkward label. Gigabit Ethernet has, well, 'gigabit' and 10 gigabit Ethernet is commonly '10G', but all of '100 megabit', '100-mbit', '100 Mbps', and '100M' read oddly to me, and '100TX' feels too obscure.

(Plus, if we are being really obscure, Wikipedia tells me that there are several 100 megabit over copper standards, not just 100BASE-TX, although the alternatives have pretty much died out.)

CheckEthernetSpeed written at 23:38:12


