A rule of thumb: Automate where you can make mistakes

April 30, 2010

One of my sysadmin rules of thumb for deciding what to automate in scripts and programs is this: automate where you can make mistakes. In particular, automate where you are specifying redundant information.

This will make more sense with an example, so let's talk about configuring iSCSI targets. We use static target configuration, which means that you tell the system a target name and the IP address that it can be found on; each iSCSI server has two IPs, so we configure each of its targets twice, once for each IP address.

(An iSCSI server machine can have several targets, each of which has several LUNs. In our environment, each iSCSI target represents a single physical disk, with the target names divided into a per-host and a per-disk portion.)

There's an obvious redundancy here; we know for sure that server A's targets are never going to be found on any IPs besides those that belong to server A. But when we specify this redundant information by hand, we allow errors to creep in; we could accidentally configure one of A's targets with an IP address for server B. (And indeed we did this once, and it turned out to be surprisingly difficult to sort out.)

So we automated the process of adding iSCSI targets to make sure that we couldn't make this mistake, or other related ones (failing to configure each target for both IP addresses or failing to configure all of a server's targets). Our program for this now just takes the server's name; from that it determines all of the server's targets and both of its IP addresses, then adds all 24 separate static target entries (twelve targets, each on two IPs) for us.
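
To make the idea concrete, here is a minimal sketch of what such a program might look like. It is not our actual script: the server names, the IP addresses, the target naming scheme and the function names are all made up for illustration, and the count of twelve disks per server is simply 24 entries divided by two IPs. The point is only that the server's name is the sole input and everything else is derived from it.

```python
# Hypothetical site conventions; none of these names or addresses are real.
DISKS_PER_SERVER = 12  # 12 targets x 2 IPs = 24 static entries

# Hypothetical table of each iSCSI server's two IP addresses.
SERVER_IPS = {
    "serverA": ("192.0.2.10", "198.51.100.10"),
    "serverB": ("192.0.2.11", "198.51.100.11"),
}


def target_names(server):
    """Build the per-disk target names for one server (made-up naming scheme)."""
    return [
        f"iqn.2010-04.com.example:{server}:disk{n:02d}"
        for n in range(DISKS_PER_SERVER)
    ]


def static_entries(server):
    """Return every (target, ip) pair for a server, derived only from its name."""
    try:
        ips = SERVER_IPS[server]
    except KeyError:
        raise ValueError(f"unknown iSCSI server: {server!r}") from None
    # Each target is paired only with its own server's IPs, so a target
    # from server A can never end up configured with an IP of server B.
    return [(target, ip) for target in target_names(server) for ip in ips]


if __name__ == "__main__":
    for target, ip in static_entries("serverA"):
        print(target, ip)
```

A real version would hand each pair to whatever command the system uses to add a static target entry; that part is left out here because it depends on the platform.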

Such automation is clearly not general; instead it relies on very specific knowledge of how we always set up our systems. But since we do have strong conventions about how we set up iSCSI targets, we might as well exploit them (in a script) to avoid errors and to make our lives easier.

(And if we ever have to break our conventions, well, we can always use the underlying system commands directly. Escape hatches are important too.)

