The concept of error distance in sysadmin commands

August 25, 2008

I have recently started thinking about the concept of what I will call the 'error distance' of sysadmin commands: how much do you have to change a perfectly normal command in order to do something undesirable or disastrous (instead of just failing with an error)?

(As an example, consider the ZFS command to expand a ZFS pool with a new pair of mirrored disks, which is 'zpool add POOL mirror DEV1 DEV2'. If you accidentally omit the 'mirror', you will add two unmirrored disks to the ZFS pool, and you can't shrink ZFS pools to remove devices. So the error distance here is one omitted word.)
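To make the idea a bit more concrete, here is a minimal Python sketch that lists the word-level differences between the command you meant and the command you typed, using the standard library's difflib. The helper `word_changes` is hypothetical, not an existing tool, and splitting on whitespace is a simplifying assumption that ignores shell quoting. For the ZFS example, the only difference is a single deleted word:

```python
import difflib

def word_changes(intended: str, typed: str) -> list[tuple[str, list[str], list[str]]]:
    """List word-level differences between an intended and a typed command."""
    a, b = intended.split(), typed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'delete' = omitted words, 'replace' = changed words, 'insert' = added words
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes

print(word_changes("zpool add POOL mirror DEV1 DEV2", "zpool add POOL DEV1 DEV2"))
# [('delete', ['mirror'], [])]
```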

You want the error distance for commands to be as large as possible, because this avoids accidents when people make their inevitable errors. Low error distance is also more dangerous in commonly used commands than uncommonly used ones, because you are less likely to carefully check a command that you use routinely (especially if you don't consider it inherently dangerous).

When considering the error distance, I believe that certain sorts of changes are more likely than others (and thus count as a smaller error distance). My gut says:

  • omitting words is more likely than changing words (using 'cat' when you mean 'dog'), which in turn is more likely than adding words.

    (I am not sure where transposing words should fit in, where you write 'cat dog' instead of 'dog cat'.)

  • commonly used things are more likely than uncommon things; for example, if you commonly add an option to one command, you are more likely to add it to another command.

(I suspect that this has been studied formally at some point, probably by the HCI/Human Factors people.)
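One way to encode this ordering is a weighted, word-level variant of the classic edit (Levenshtein) distance. This is a sketch under assumptions, not an established metric: the function name and the specific weights (1 for an omitted word, 2 for a changed word, 3 for an added word) are hypothetical and only express omission < change < addition. A lower total means a more likely slip, i.e. a closer and therefore more dangerous error distance.

```python
def error_distance(intended: str, typed: str,
                   omit: float = 1.0, change: float = 2.0, add: float = 3.0) -> float:
    """Weighted word-level edit distance from the intended command to the typed one.

    The default weights are hypothetical; they only encode omission < change < addition.
    """
    a, b = intended.split(), typed.split()
    # dist[i][j] = cheapest way to turn the first i intended words into the first j typed words
    dist = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dist[i][0] = i * omit
    for j in range(1, len(b) + 1):
        dist[0][j] = j * add
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + omit,                           # left out an intended word
                dist[i][j - 1] + add,                            # typed an extra word
                dist[i - 1][j - 1] + (0.0 if same else change),  # kept or changed a word
            )
    return dist[len(a)][len(b)]

print(error_distance("zpool add POOL mirror DEV1 DEV2", "zpool add POOL DEV1 DEV2"))  # 1.0
```

With these weights the ZFS slip costs 1.0, a single omission. Transpositions get no special treatment here: swapping two words costs as much as two changes, which leaves open the question of where they really belong.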


Comments on this page:

From 71.120.102.36 at 2008-08-25 01:17:03:

rm has an error distance of either a space ('rm *.bak' vs. 'rm * .bak') or, worse, a Tab (with filename completion: 'rm xxx<Tab> *': if there's only one file matching 'xxx' in the directory, the shell's completion adds a space after it, the '*' becomes a separate argument, and here comes the whack).

Lev
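The stray-space case can be simulated in a few lines of Python. This is a rough model under stated assumptions: `rm_targets` and the file list are hypothetical, `shlex.split` stands in for the shell's word splitting, and `fnmatch` stands in for its globbing (the two differ in details such as dotfile handling). One extra space turns one pattern into two, and the first of them matches everything:

```python
import fnmatch
import shlex

# Hypothetical directory contents. There are no dotfiles here because
# fnmatch's '*' also matches names starting with '.', unlike the shell's default.
FILES = ["notes.txt", "notes.bak", "report.doc", "report.bak"]

def rm_targets(command: str) -> list[str]:
    """Return which of FILES an `rm PATTERN...` command line would delete."""
    words = shlex.split(command)  # stand-in for the shell's word splitting
    if not words or words[0] != "rm":
        raise ValueError("expected an rm command")
    doomed: list[str] = []
    for pattern in words[1:]:
        # A pattern that matches nothing (like a literal '.bak') deletes nothing;
        # real rm would just complain about it after removing everything else.
        for name in fnmatch.filter(FILES, pattern):
            if name not in doomed:
                doomed.append(name)
    return doomed

print(rm_targets("rm *.bak"))   # ['notes.bak', 'report.bak']
print(rm_targets("rm * .bak"))  # ['notes.txt', 'notes.bak', 'report.doc', 'report.bak']
```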

From 71.65.56.124 at 2008-08-25 07:50:12:

RE: Transposing commands

For the longest time, when I was still a fledgling Linux user, I would transpose the arguments to the 'ln' command. I don't know why. I finally stopped doing it when I realized that they were in the exact same order as the 'cp' command.

My favorite shortcut to disaster is rsync. For anyone who might not know, the syntax is "rsync [options] source destination". The issue arises with the source, and whether or not you add a trailing /. Here's why:

Suppose you want to rsync user directories from one machine to another. You might say

rsync -e ssh -raugvz root@machine1:/home root@machine2:/home

thinking it would make machine1 and machine2 have an identical /home. What you have actually done is taken the /home directory from machine1 and copied it into /home on machine2. In other words, machine2 now has /home/home, which in all likelihood has doubled the amount of space taken up on the partition AND not synced the information anyway. The correct command was:

rsync -e ssh -raugvz root@machine1:/home/ root@machine2:/home

See the difference? The trailing slash on the source means "copy the contents of this directory to the target", not "recreate this directory structure in the target".

Not that I've ever done this, mind you ;-)

Matt Simmons
http://standalone-sysadmin.blogspot.com
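The trailing-slash rule described in the comment above can be modeled with a small Python helper. `rsync_dest_path` is hypothetical and deliberately simplified: it ignores the 'user@host:' part, assumes the destination is a directory, and only computes where one file ends up.

```python
import posixpath

def rsync_dest_path(source: str, dest: str, relpath: str) -> str:
    """Where the file at `relpath` under `source` lands, per the trailing-slash rule.

    Simplified model: 'src/' copies the contents of src into dest, while
    'src' (no trailing slash) recreates the src directory itself inside dest.
    """
    if source.endswith("/"):
        return posixpath.join(dest, relpath)
    return posixpath.join(dest, posixpath.basename(source), relpath)

# /home/alice/.profile on machine1, synced without and with the trailing slash:
print(rsync_dest_path("/home", "/home", "alice/.profile"))   # /home/home/alice/.profile
print(rsync_dest_path("/home/", "/home", "alice/.profile"))  # /home/alice/.profile
```

The two calls differ by one character, which is exactly the error distance being discussed.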

By cks at 2008-08-25 22:08:39:

I suspect that rm is less prone to mistakes than its raw error distance would suggest, because people already feel nervous around it, so there's a natural inclination to be extra-careful anyway. (And people use ways to make rm less dangerous, such as 'rm -i'.)

From 71.120.102.36 at 2008-08-26 00:43:58:

True, people would be more careful with rm. But, on the other hand, a typical sysadmin (let alone a typical user) runs rm a lot more often than they perform a ZFS pool addition ;-). So here comes the law of large numbers. My own sorry butt serves as an example: I personally had a Tab-completion accident after more than a dozen years of shell experience ;-).

Lev
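The law-of-large-numbers argument can be made concrete with a back-of-the-envelope calculation, assuming (purely for illustration) that every invocation independently carries the same small probability p of a damaging slip. Over n invocations, the chance of at least one accident is:

```latex
P(\text{at least one accident in } n \text{ runs}) = 1 - (1 - p)^{n} \approx 1 - e^{-np}
```

With the made-up figures p = 0.001 and n = 5000, this is 1 - 0.999^5000, roughly 1 - e^(-5), or about 0.99: a command that is rarely mistyped but run constantly will almost surely bite eventually, while for a rarely run command (small n) the probability stays low unless p is large.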

From 74.74.184.197 at 2008-10-01 00:03:32:

an interesting property... you've chosen an apt name for it :]

a (default? [1]) zsh(1) installation adds a few feet to rm(1)'s error distance by first asking for confirmation of 'rm *' and 'rm path/*' invocations [2]

-tyler http://unsyncopated.com

[1] grep for "RM_STAR_{SILENT,WAIT}" at http://man.sourcentral.org/ubuntu804/1+zshoptions

[2] http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2005-01/0614.html

ps: this homegrown wiki syntax is an interesting animal!
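The zsh option documentation referenced in [1] describes the query as applying to 'rm *' or 'rm path/*'. The Python sketch below is a rough approximation of that rule, not how zsh implements it: `zsh_would_query` is a hypothetical helper that simply inspects the words as typed. It shows why such a check would catch the stray-space slip from the first comment while leaving the intended command alone:

```python
def zsh_would_query(words: list[str]) -> bool:
    """Rough approximation of the rm-star query described in zshoptions(1).

    Hypothetical helper: flags an rm command if any word, as typed, is
    exactly '*' or ends in '/*'. This is not zsh's actual code.
    """
    if not words or words[0] != "rm":
        return False
    return any(w == "*" or w.endswith("/*") for w in words[1:])

print(zsh_would_query(["rm", "*.bak"]))          # False
print(zsh_would_query(["rm", "*", ".bak"]))      # True
print(zsh_would_query(["rm", "-f", "build/*"]))  # True
```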

I've often thought about this when using the mv command. For example, when moving two files into a directory, there's a brief moment where hitting return would be a disaster: renaming one file over the other. The command must pass through an error state before reaching the intended state. GNU Coreutils mv has a -t switch that allows listing the destination directory first, avoiding this dangerous intermediate command, and I've used it sometimes just for this reason.
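The point about intermediate states can be checked mechanically: type the command one word at a time and ask what each prefix would do if Return were hit there. The Python sketch below uses a deliberately simplified model of GNU mv's argument handling (no options other than -t) and a hypothetical filesystem containing the files a.txt and b.txt and the directory archive; `mv_outcome` and `prefix_outcomes` are illustrative names, not real tools.

```python
# Hypothetical filesystem state: two files and one directory.
FILES = {"a.txt", "b.txt"}
DIRS = {"archive"}

def mv_outcome(operands: list[str]) -> str:
    """Classify what a GNU-style `mv` would do with these arguments (simplified)."""
    if operands[:1] == ["-t"]:
        # mv -t DIR SOURCE...: every remaining word is a source moved into DIR.
        if len(operands) < 3 or operands[1] not in DIRS:
            return "error"
        return "move into directory"
    if len(operands) < 2:
        return "error"  # missing file operand
    if operands[-1] in DIRS:
        return "move into directory"
    if len(operands) == 2:
        return "overwrite!" if operands[1] in FILES else "rename"
    return "error"  # several sources, but the target is not a directory

def prefix_outcomes(command: str) -> list[tuple[str, str]]:
    """Outcome of hitting Return after each word of `command` has been typed."""
    words = command.split()[1:]  # drop the leading 'mv'
    return [(" ".join(["mv", *words[:k]]), mv_outcome(words[:k]))
            for k in range(1, len(words) + 1)]

for cmd in ("mv a.txt b.txt archive", "mv -t archive a.txt b.txt"):
    for prefix, outcome in prefix_outcomes(cmd):
        print(prefix, "->", outcome)
```

With the usual ordering, the prefix 'mv a.txt b.txt' would rename a.txt over b.txt; with -t first, every prefix either fails with an error or does what was intended.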

Written on 25 August 2008.


Last modified: Mon Aug 25 00:36:04 2008