How I (once) did change management with scripts

March 28, 2012

When I read Philip Hollenback's latest entry and it mentioned someone doing (change/system) management through shell scripts (instead of, say, Puppet), my first thought was 'hey, I've done that'. So I might as well write up how I did it, either for someone to use or in case people want to marvel at the crazy person.

(A disclaimer: this was more than half a decade ago, and some of my memories of the fine details have undoubtedly faded (ie, are now wrong).)

The basic environment this happened in was a lab environment with (at its height) on the order of a hundred essentially identical PC machines running Linux (this is the same environment where we needed efficient update distribution). Most of the system management was handled through packages and automatic package updates, but every so often there was something that was best handled in a shell script.

Each separate change was a separate little shell script, all of which lived in a common directory (actually one directory for each OS release). Script filenames started with a sequence number (eg they had names like '01-fix-something'), and scripts were run in sequence. The driver system kept track of which scripts had already succeeded and did not re-run them; a script that exited with a failed status would be retried the next time the driver system ran. The driver system ran once a day or (I believe) immediately after system boot, and processed scripts after applying package updates. Scripts were expected to check if they were applicable before doing anything and exit if they weren't (with status 0 if they were definitely not applicable to this system or with status 1 if they should be retried the next time).
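To make the script conventions concrete, here is a minimal sketch of what one such change script might look like. It is an illustration only, not one of the original scripts: the function name `fix_something`, the config file, and the `option=on` setting are all made up.

```shell
#!/bin/sh
# Hypothetical change script, e.g. saved as 01-fix-something.
# The config file and the setting are invented for illustration.

# fix_something CONF
# Returns 0 if the change is done or does not apply to this machine,
# and 1 if it should be retried the next time the driver runs.
fix_something() {
    conf=$1

    # Definitely not applicable here: the file does not exist at all.
    [ -e "$conf" ] || return 0

    # Already applied (for example by an earlier, interrupted run).
    grep -qx 'option=on' "$conf" && return 0

    # Applicable but not possible right now: ask to be retried.
    [ -w "$conf" ] || return 1

    echo 'option=on' >> "$conf" || return 1
    return 0
}

# A real script would end by calling the function and exiting with
# its status, for example:
#   fix_something /etc/example.conf; exit $?
```

Checking for "already applied" before making the change keeps the script safe to run again if an earlier attempt was interrupted partway through.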

(If I were doing this again I think I would make the driver script not run further scripts if an earlier one failed. In our case all of the scripts were basically independent, so it didn't matter.)
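The driver itself can be sketched in a few lines of shell. This is a minimal sketch under stated assumptions, not the original driver: the function name `run_changes`, the use of empty marker files in a state directory to remember successes, and the optional `stop` argument (which gives the stop-at-first-failure variant) are all invented for illustration.

```shell
#!/bin/sh
# Hypothetical driver sketch; names and layout are assumptions.

# run_changes SCRIPT_DIR STATE_DIR [stop]
# Runs each executable numbered script in SCRIPT_DIR in sequence,
# skipping any that already have a success marker in STATE_DIR.
# With "stop" as the third argument, it returns 1 at the first failure.
run_changes() {
    script_dir=$1
    state_dir=$2
    on_failure=${3:-continue}
    mkdir -p "$state_dir" || return 1

    # Globs expand in sorted order, so 01-* runs before 02-*.
    for script in "$script_dir"/[0-9]*; do
        [ -f "$script" ] && [ -x "$script" ] || continue
        name=${script##*/}

        # Already succeeded on this machine: do not run it again.
        [ -e "$state_dir/$name" ] && continue

        if "$script"; then
            # Status 0: record it so the script is never re-run.
            : > "$state_dir/$name"
        else
            # Non-zero: leave no marker, so the next run retries it.
            echo "change script $name failed; will retry next run" >&2
            [ "$on_failure" = stop ] && return 1
        fi
    done
    return 0
}
```

A daily cron job or boot script could then call something like `run_changes /path/to/scripts /path/to/state` after package updates have been applied (both paths are placeholders). Because the marker is keyed on the script's filename, giving a changed script a new sequence number is what causes it to be run again.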

There was no mechanism to rerun a script if it changed; if I changed a script and wanted to have it rerun, I needed to give it a new sequence number. If a script became unnecessary for some reason, it was just removed.

All of this is actually quite short and simple to implement, and it worked quite well within its modest goals. It was not particularly difficult to write scripts, they were automatically executed for you, all machines were kept in sync, and a newly (re)installed machine would automatically pick up all of the current customizations. These days, you would put the entire directory of scripts into a VCS (and you might distribute it by having the workstations check out a copy from the central repo).

Comments on this page:

From at 2012-04-02 14:38:09:

Very similar to how I do mine, except that each of my scripts is run with either an "apply", "revert", "status", or "desc" argument. The "desc" argument just prints out a one sentence description. The "status" argument runs a function that tests whether the change has been applied, and returns 0 if yes, 1 if no, or >1 for an error. The "apply" function makes the change, all the while writing data to an undo file. The "revert" function, as you may have guessed, reads the undo file and accurately rolls back the changes, then deletes the undo file if it was successful. All the scripts actually source several functions that have been abstracted into a script library that I wrote for this purpose. One feature is that the undo files operate like a stack -- if a script has already been "apply"ed and you re-apply it, an additional, new undo file is written. The revert function always uses the most recent undo file, thereby popping the changes from the stack when reverting.

Written on 28 March 2012.


Last modified: Wed Mar 28 23:08:46 2012