One reason we install machines from checklists instead of via automation
August 12, 2011
I've been revising our install instructions for some OpenBSD servers recently, which has given me an opportunity to reflect on how we set up machines here. Our general approach is to use a checklist of what are essentially cut-and-paste commands; when I do a test run, I often literally cut and paste from the instructions into the install session. Given that we have the literal commands to run in the instructions, why not automate the install process by putting them all in a script?
Well, turn this question around. What would I have to do in order to transform the commands into an install script? At one level, basically nothing; I'd turn all of the commentary in the install instructions into script comments and we'd be pretty much done. And then one day something would go wrong during the install process and the script would explode spectacularly.
The drawback of automation is that nothing is really checking for things going wrong. Oh, you can check for obvious errors (sometimes), like commands exiting with a failed status, but not all problems cause such obvious failures. Any number of failure modes will cause your commands to exit with a success status but either do nothing useful or badly mangle the system state.
(For instance, a
You can make automation more robust, of course. But that takes both work and anticipating how things may fail; a reliable, cautious automated install process is much more work than simply sticking all of the commands from the checklist in a shell script (and it's very hard to be completely safe against problems). If we stay with a checklist that's performed by humans, we get many of the benefits of automation without having to do that work. Rather than try to code error checks, we can count on people to use their brains to notice when something's wrong.
(In our environment checklists are guides and aids for sysadmins, not things to be carried out by mindless rote.)
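As a rough illustration of the gap between checking exit statuses and checking results, here is a minimal sketch in shell. The scratch config file, the `PermitRootLogin` setting, and the function names `naive_step` and `checked_step` are all assumptions made up for this example; they are not taken from the real install instructions.

```sh
#!/bin/sh
# Hypothetical install step: turn off root logins in an sshd-style config file.
# The setting and the file are illustrative assumptions only.

# Naive version: sed exits 0 even when the pattern matches nothing,
# so the exit status says nothing about whether the edit happened.
naive_step() {
    sed 's/^PermitRootLogin yes$/PermitRootLogin no/' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# More cautious version: after the edit, verify the state we actually wanted.
checked_step() {
    naive_step "$1" || return 1
    if ! grep -q '^PermitRootLogin no$' "$1"; then
        echo "step failed: $1 does not contain 'PermitRootLogin no'" >&2
        return 1
    fi
}

# Demo on a scratch file where the setting is commented out,
# so the sed pattern matches nothing and the file is left unchanged.
conf=$(mktemp) || exit 1
printf '%s\n' '#PermitRootLogin yes' > "$conf"

if naive_step "$conf"; then
    echo "naive step: reported success"
fi
if ! checked_step "$conf"; then
    echo "checked step: reported failure"
fi
rm -f "$conf"
```

Here the naive step reports success even though the file is unchanged, while the checked step reports failure. The check only catches the one failure someone thought to anticipate, and every other step in the install would need its own equivalent.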
PS: there are, of course, situations where automation still makes sense despite this. But that's something for another entry.