The concept of error distance in sysadmin commands
I have recently started thinking about the concept of what I will call the 'error distance' of sysadmin commands: how much do you have to change a perfectly normal command in order to do something undesirable or disastrous (instead of just failing with an error)?
(As an example, consider the ZFS command to expand a ZFS pool with a new pair of mirrored disks, which is 'zpool add POOL mirror DEV1 DEV2'. If you accidentally omit the 'mirror', you will add two unmirrored disks to the ZFS pool, and you can't shrink ZFS pools to remove devices. So the error distance here is one omitted word.)
You want the error distance for commands to be as large as possible, because this avoids accidents when people make their inevitable errors. Low error distance is also more dangerous in commonly used commands than uncommonly used ones, because you are less likely to carefully check a command that you use routinely (especially if you don't consider it inherently dangerous).
When considering the error distance, my belief is that certain sorts of changes are more likely than others (and thus make the error distance smaller). My gut says:
- omitting words is more likely than changing words (using 'cat' when you mean 'dog'), which in turn is more likely than adding words.
(I am not sure where transposing words should fit in, where you write 'cat dog' instead of 'dog cat'.)
- commonly used things are more likely than uncommon things; for example, if you commonly add an option to one command, you are more likely to add it to another command.
(I suspect that this has been studied formally at some point, probably by the HCI/Human Factors people.)
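To make the idea a bit more concrete, here is a minimal sketch that treats error distance as a word-level edit distance between the command you meant and the command you typed. This is only my illustration of the gut feeling above, not a studied model: the function name `error_distance` and the three cost constants are invented for the example, and the numbers simply encode "omitting is more likely than changing, which is more likely than adding". A smaller result means the mistake is easier to make.

```python
# Hypothetical weights, chosen only to reflect the ordering in the text
# (omission is the easiest slip, addition the hardest). Not measured data.
OMIT_COST = 1.0    # a word from the intended command was left out
CHANGE_COST = 2.0  # one word was replaced by another
ADD_COST = 3.0     # an extra word was typed


def error_distance(intended: str, typed: str) -> float:
    """Weighted word-level edit distance from `intended` to `typed`."""
    a = intended.split()
    b = typed.split()

    # dist[i][j] is the cheapest way to turn the first i words of the
    # intended command into the first j words of the typed command.
    dist = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dist[i][0] = i * OMIT_COST
    for j in range(1, len(b) + 1):
        dist[0][j] = j * ADD_COST

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + OMIT_COST,
                dist[i][j - 1] + ADD_COST,
                dist[i - 1][j - 1] + (0.0 if same else CHANGE_COST),
            )
    return dist[len(a)][len(b)]


if __name__ == "__main__":
    good = "zpool add POOL mirror DEV1 DEV2"
    bad = "zpool add POOL DEV1 DEV2"
    print(error_distance(good, bad))
```

With these made-up weights, the ZFS example comes out as a single omission, the cheapest possible mistake. This plain edit distance has no special case for transposition, so 'cat dog' versus 'dog cat' is scored as two changed words; whether that is the right place for it is exactly the open question above. The sketch also ignores how commonly each word or option is used, which would need some kind of per-word weighting.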