Outdated documentation is especially risky for sysadmins
The obvious traditional risk of outdated documentation in all its forms is that you rely on it and go wrong somehow; you trust the comments in the source code and write your new code accordingly, and your changes don't work. I think that this risk is especially acute for sysadmins, for two strongly related reasons.
First, much of our documentation tends to be about procedures, not simple information. Following what is actually a wrong or incomplete procedure is a great way to create spectacular failures on the spot. Worse, sysadmins inevitably wind up dealing directly with live systems and live data.
(Yes, you can test procedures just as you test the code that you write, but at some point you have to use them on your live system and this is always somewhat different from the test environment, unless you have a spectacularly complete test environment.)
Second, some of our least used documentation (and thus some of our riskiest) is our emergency procedures. When we need to use them, we're in one of the most tense situations possible, under a great deal of pressure to get things fixed now and thus least able to go slowly and carefully and stop if something, anything, seems off. This is exactly the sort of situation where incorrect procedure documentation can do the most damage, because people don't stop before they compound a small problem into a huge one.
(Imagine, for example, an off-by-one error in documentation about how to map disk bay slots to device names. Now add a 'get things back up right away' crisis where you need to replace a disk.)
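
To make the disk bay scenario concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: a hypothetical eight-bay chassis whose front-panel labels run from 1 to 8, disks that the OS happens to name sda through sdh in the same physical order, and made-up function names. Real device naming is rarely this tidy, which is part of why such documentation goes stale.

```python
import string

# Hypothetical chassis: bays are labeled 1..8 on the front panel, and
# (by assumption) the OS names the disks sda..sdh in the same order.
BAY_COUNT = 8


def device_for_bay(bay: int) -> str:
    """Correct mapping for this made-up chassis: bay 1 is sda, bay 2 is sdb."""
    if not 1 <= bay <= BAY_COUNT:
        raise ValueError(f"no such bay: {bay}")
    return "sd" + string.ascii_lowercase[bay - 1]


def device_for_bay_as_documented(bay: int) -> str:
    """What outdated documentation might say: it treats bay labels as 0-based."""
    return "sd" + string.ascii_lowercase[bay]


failed_bay = 3
print(device_for_bay(failed_bay))                # sdc
print(device_for_bay_as_documented(failed_bay))  # sdd
```

Both answers look entirely plausible, and nothing about the wrong one will make someone in a hurry stop and double check. Following the documented version here means acting on the healthy disk next to the failed one.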