One reason we install machines from checklists instead of via automation

August 12, 2011

I've been revising our install instructions for some OpenBSD servers recently, giving me an opportunity to reflect on how we set up machines here. Our general approach is to use a checklist of essentially cut and paste commands; I often go through a test run directly cutting and pasting back and forth. Given that we have the literal commands to run in the instructions, why not automate the install process by putting them all in a script?

Well, turn this question around. What would I have to do in order to transform the commands into an install script? At one level, basically nothing; I'd turn all of the commentary in the install instructions into script comments and we'd be pretty much done. And then one day something would go wrong during the install process and the script would explode spectacularly.

The drawback of automation is that there is nothing that's really checking for things going wrong. Oh, you can check for obvious errors (sometimes), like commands exiting with a failed status, but not all problems cause such obvious failures. Any number of failure modes will cause your commands to exit with a success status but either do nothing useful or badly mangle the system state.

(For instance, a ./configure 'succeeds' but fails to find all of the dependencies you expected so it builds a version of the program without features that you need.)
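To make the gap concrete, here is a minimal Python sketch of what an install script has to do to catch this kind of failure: check the exit status, then separately check that the step achieved what you needed. This is an illustration and not our actual tooling. The `run_step` and `has_feature` helpers and the "SSL support: yes" summary line are hypothetical, and a small Python one-liner stands in for a real `./configure`.

```python
import subprocess
import sys


def run_step(cmd, verify):
    """Run one checklist command, then apply a post-condition check.

    cmd is an argument list. verify is a callable that receives the
    command's standard output and returns True only if the step did
    what we actually needed (hypothetical helper, for illustration).
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # The obvious failure: the command itself reported an error.
        return (False, "command exited with status %d" % result.returncode)
    if not verify(result.stdout):
        # The quiet failure: exit status 0, but the outcome is wrong.
        return (False, "command exited 0 but post-condition failed")
    return (True, "ok")


def has_feature(output, feature="ssl support: yes"):
    """Assumed check: look for a feature line in a configure-style summary."""
    return feature in output.lower()


# Stand-in for a ./configure run that 'succeeds' without a needed dependency.
fake_configure = [sys.executable, "-c", "print('SSL support: no')"]
status = run_step(fake_configure, has_feature)
print(status)
```

The exit status check alone would have passed this step. Only the second check catches it, and you have to write one of those for every step whose failure modes you managed to anticipate.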

You can make automation more robust, of course. But that takes both work and foresight about how things may fail; a reliable, cautious automated install process is much more work than simply sticking all of the commands from the checklist in a shell script (and it's very hard to be completely safe against problems). If we stay with a checklist that's performed by humans, we get many of the benefits of automation without having to do that work. Rather than trying to code error checks, we can count on people to use their brains to notice when something's wrong.

(In our environment checklists are guides and aids for sysadmins, not things to be carried out by mindless rote.)

PS: there are of course situations where automation still makes sense despite this. But that's something for another entry.


Comments on this page:

From 195.26.247.141 at 2011-08-12 04:11:17:

Ouch. No really.

How does this possibly scale?

Do you use checklists whenever you have updates to apply (rather than just at first install)?

What happens when a change is made to the install checklist? Do you go around all the machines one by one and make those changes?

From 71.183.242.227 at 2011-08-12 11:16:15:

I currently use checklists as well, but I'm always hoping to find time to get to automation. I think some of the keys here are making sure you do the right thing in the right place. It's been a while since I ran OpenBSD, but your example of running 'configure' is an indication to me that you're not doing it in the right place. Compilation and packaging should be done in a separate process, and the automated install should only be applying the packages using the built-in package management system.

By cks at 2011-08-12 12:31:15:

If the goal is to be able to recreate an identical machine to go with your current ones (or an identical duplicate of a specific machine) then yes, you do have to update existing machines when you change the install checklist. And of course you also have to update the checklist when you change the current machines.

This doesn't scale if you have lots of machines or if you reinstall them regularly. But if you have a relatively small number of machines and you reinstall them very rarely it works fine.

In our situation, the overhead of building OS packages is not justified. We install a handful of programs on a handful of machines and do so very infrequently; we would spend more time learning how to build packages and then building them than it takes to do the process by hand.

(By the way, of course we only do this for things that are not already available as packages.)

From 195.26.247.141 at 2011-08-15 03:34:16:

I find version control and checkouts to somewhere like /opt/<app> useful for avoiding making packages, but this is still painful if you have lots of very different machines (very different OS and package versions, or different architectures).

Written on 12 August 2011.

Last modified: Fri Aug 12 00:10:24 2011