Wandering Thoughts archives

2010-02-03

Outdated documentation is especially risky for sysadmins

The obvious traditional risk of outdated documentation in all its forms is that you rely on it and go wrong somehow; for example, you trust the comments in the source code and write your new code accordingly, and your changes don't work. I think that this risk is especially acute for sysadmins, for two strongly related reasons.

First, much of our documentation tends to be about procedures, not simple information. Following what is actually a wrong or incomplete procedure is a great way to create spectacular failures on the spot. Worse, sysadmins inevitably wind up dealing directly with live systems and live data.

(Yes, you can test procedures just as you test the code that you write, but at some point you have to use them on your live system and this is always somewhat different from the test environment, unless you have a spectacularly complete test environment.)

Second, some of our least used documentation (and thus our most risky) is our emergency procedures. When we need to use them, we're in one of the most tense situations possible, under a great deal of pressure to get things fixed now and thus least able to go slowly and carefully and to stop if something, anything, seems off. This is exactly the sort of situation where incorrect procedure documentation can do the most damage, because people don't stop before they compound a small problem into a huge one.

(Imagine, for example, an off-by-one error in documentation about how to map disk bay slots to device names. Now add a 'get things back up right away' crisis where you need to replace a disk.)
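One way to catch this sort of error before the crisis is to periodically compare the documented mapping against what the hardware actually does. Here is a minimal sketch in Python, assuming you already have both mappings as dictionaries; the function name, the bay numbers, and the device names are all made up for illustration, and how you obtain the observed mapping depends entirely on your hardware.

```python
def find_mapping_errors(documented, observed):
    """Return {slot: (documented_name, observed_name)} for every slot
    where the documentation and the live system disagree."""
    errors = {}
    for slot in sorted(set(documented) | set(observed)):
        doc_name = documented.get(slot)    # None if the slot is undocumented
        live_name = observed.get(slot)     # None if the slot does not exist
        if doc_name != live_name:
            errors[slot] = (doc_name, live_name)
    return errors


# Hypothetical data: the documentation numbers bays from 1,
# but the chassis actually numbers them from 0.
documented = {1: "sda", 2: "sdb", 3: "sdc"}
observed = {0: "sda", 1: "sdb", 2: "sdc"}

errors = find_mapping_errors(documented, observed)
for slot, (doc_name, live_name) in errors.items():
    print(f"bay {slot}: documentation says {doc_name}, system says {live_name}")
```

With this example data every bay is reported, which is the signature of an off-by-one error: anyone following the documentation would pull the wrong disk for every bay it lists.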

sysadmin/OutdatedDocumentationRiskII written at 23:20:41

Link: Pollution in 1.0.0.0/8

IANA has recently allocated 1.0.0.0/8 to APNIC, which has caused a certain amount of concern that it is 'polluted' by people already using it for various reasons. Pollution in 1/8 is a report from RIPE Labs on what happened when they announced routing for some bits of it as part of their debogonising work.

This is clearly going to be what they call 'interesting'.

(via Hacker News.)

links/Net1Bogons written at 12:02:33

How to destroy people's interest in updating documentation

Here is one of the less obvious perils of outdated documentation:

Suppose that you have some documentation that is out of date, but not in an obvious way; for example, you have an out of date network layout diagram. Since it's not obvious, you don't realize this right away, so you keep on updating the network layout diagram when you make changes to your actual network.

Except that faithfully updating an inaccurate network layout diagram is relatively pointless. When you realize that it is incorrect, you are going to have to re-check most of it anyways, or at least spend a bunch of effort to reconstruct what sections are trustworthy.

This peril of outdated documentation is that updating bad documentation is wasted effort. (Fixing bad documentation is not, but that's a different thing.)

Since updating documentation takes time that you could be using for other things, and it's generally not fun, it does not take too much time being wasted this way before people stop updating documentation entirely. Why do annoying wasted effort, when you could be doing something that's actually productive and useful? (Especially if you did the work thinking that it wasn't wasted effort, only to find out later that what you thought was productive work, well, wasn't. People really don't like that.)

At first, this effect will probably be limited to documentation that is highly suspect. But I don't think it takes much bad documentation before people more or less give up totally, because it is too heartbreaking to waste time this way and they can't stand the idea of it any more; you will lose the culture of documentation. At that point, you can stop talking about updating documentation and start talking about reconstructing it from scratch.

(This is where local wikis are perhaps less than ideal, because at this stage what you really need to do is pave everything so that there is a clear line between 'done recently, can be trusted' and 'is old, do not trust until it has been redone'.)
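To make that line concrete, here is a minimal sketch in Python of splitting pages by a cutoff date. It assumes you record a 'last verified' date for each page, which is not the same thing as the last edit date (faithfully editing an inaccurate page does not make it trustworthy). The page names, dates, and the function itself are hypothetical.

```python
from datetime import date


def split_by_cutoff(pages, cutoff):
    """Split page names into (trusted, suspect) lists.

    pages maps a page name to the date it was last verified against
    reality, or None if it has never been verified."""
    trusted = sorted(
        name for name, verified in pages.items()
        if verified is not None and verified >= cutoff
    )
    suspect = sorted(name for name in pages if name not in trusted)
    return trusted, suspect


# Hypothetical wiki pages and verification dates.
pages = {
    "NetworkLayout": date(2008, 5, 1),
    "DiskBayMap": date(2010, 1, 20),
    "EmergencyRestore": None,
}

trusted, suspect = split_by_cutoff(pages, date(2010, 1, 1))
print("trusted:", trusted)
print("do not trust until redone:", suspect)
```

Anything never verified lands on the 'do not trust' side by default, which matches the idea of paving everything and only trusting what has been redone.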

sysadmin/OutdatedDocumentationRisk written at 01:58:54


