The role of superstition and folklore in system administration

December 21, 2008

Just as users have folklore, system administration does too. Our sort of superstition is a bit different, though (well, usually): it is the kind of thing where you say 'I don't know why that's there, but let's not remove it just in case'. When our system environments reach a certain level of fragile complexity and we start losing track of the fine details, our informed actions naturally start descending into rote procedures.

(This really accelerates when new people come on board; they weren't around to build the systems, so they don't have a picture in their heads of how everything fits together, and I think that even good documentation will never really build it.)

Once you lose track of exactly why something is done, the principles of change control come into play. When something works as it is and you're not certain why, you have only two real choices: you can take the time to see if it's still correct and necessary, or you can just keep on doing it until something explodes. It should be no surprise which choice busy sysadmins usually make, and thus you get the superstitions, all of those things that once had a reason (we hope) that we no longer know.

(This growth of superstition shows up in any area of system administration where you can lose track of things, like expensive names.)

Descents into superstition are not fatal, but they are expensive to reverse; you have to actively make the time to reverse-engineer how your system really works, and do it thoroughly enough that you're confident you didn't miss anything. Sometimes you can only successfully get rid of the superstition when you replace the entire system (so you haven't so much fixed it as rendered it irrelevant).

Recognizing when a system is sliding into superstition is important, because it's a serious warning sign both that your system is too complex and that you do not understand it well enough. Continuing with things as they are is likely to result in more and bigger superstitions taking hold, with the attendant loss of real understanding and control of your system.

Written on 21 December 2008.

Last modified: Sun Dec 21 01:48:30 2008