2009-07-15
A Bourne shell gotcha with ( ... ) command grouping
Here is a mistake that I spent part of today discovering that I'd made.
Consider the following Bourne shell script fragment:
(
for i in $SOMETHING; do
    if ! some-command $i; then
        echo $0: failed on $i 1>&2
        exit 1
    fi
done
) | sort | ....
Tragically, this shell script fragment is broken. The exit is not
doing what you think it is doing.
(If it actually is doing what you think it is doing, you need to stop
being so clever in your Bourne shell scripts. Use 'break' instead,
so that people can understand you later.)
When I wrote this shell script, I clearly thought that this exit would
exit from the entire shell script, aborting it with a failure status
so that various other things could notice that something had gone
wrong. But this is incorrect; commands in a ( ... ) command group
run in a subshell, so the exit only terminated the subshell, which
here had exactly the same effect as a break statement. The overall
script continued to run and indeed exited with a success status,
despite things having blown up.
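You can see both effects in a tiny demonstration (the echo commands
just stand in for real work):

( echo before; exit 1; echo after ) | cat
echo "script still running, pipeline status: $?"

Running this prints 'before' and then reports a pipeline status of 0:
the exit stopped only the subshell, and the pipeline's overall exit
status came from cat, the last command in it.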
(Since this involved a pipeline, the same thing would have happened if
I had written the for loop without the ( ... ) around it, because each
component of a pipeline runs in its own subshell anyway. Although a
bare for loop is legal here, I habitually add the parentheses for
clarity.)
For this particular script, I got around the problem by having the
failure case echo a magic marker into the for loop's output, and then
having the main portion of the script look for the magic marker. You
could also do something like capture standard error in a file and check
in the main portion to make sure that the file was empty.
(I don't like capturing stderr in scripts, so I go out of my way to avoid it.)
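As a concrete illustration, here is a minimal sketch of the magic
marker approach; the marker string, the temporary file name, and
some-command are all hypothetical stand-ins:

TMPFILE=/tmp/output.$$
(
for i in $SOMETHING; do
    if ! some-command $i; then
        echo $0: failed on $i 1>&2
        # Also send a marker line into the pipeline's own output.
        echo '@@FAILED@@'
        exit 1
    fi
done
) | sort > $TMPFILE

# The main portion of the script looks for the marker and aborts.
if grep '@@FAILED@@' $TMPFILE >/dev/null; then
    rm -f $TMPFILE
    exit 1
fi

Since the marker travels through the pipeline itself, it survives no
matter how sort and the rest of the pipeline rearrange the output.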
2009-07-13
Shell scripts should not use absolute paths for programs
There is a certain habit in shell scripts of referring to uncommon
programs by their absolute path; for example, if you need to run lsof,
people will write '/usr/sbin/lsof ....' in their shell script. We
do a certain amount of that here, and then recently one of our shell
scripts started reporting:
netwatch: line 15: /usr/sbin/lsof: No such file or directory
You see, you shouldn't do this, because every so often Unix vendors change where they put commands (or, in a multi-vendor environment, different vendors disagree about where a command lives). When that happens, your hard-coded path just broke your script.
(In this case, Ubuntu 6.06 put lsof in /usr/sbin and Ubuntu 8.04
moved it to /usr/bin, probably on the sensible grounds that it's
useful for ordinary users too.)
The right way to do this is to add the directory you expect the command
to be in to your script's $PATH and then just invoke the command
without the absolute path. If the command gets moved, well, hopefully it
will be to somewhere else on $PATH (as it was in this case), and your
script will happily keep working. Better yet, this way your script can
work transparently across different Unix environments without having to
be changed or conditionalized; just add all of the possible directories
to your script's $PATH and be done with it.
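For example (the directories here are just plausible candidates, not
a definitive list for any particular system):

#!/bin/sh
# Cover the various places lsof has lived across Unixes and releases.
PATH=$PATH:/usr/sbin:/usr/bin:/usr/local/sbin
export PATH

# Now a plain 'lsof' works regardless of which directory it is in.
lsof -i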
(This does point out that the Bourne shell could do with a simple way of
adding something to your $PATH if it isn't already there.)
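In the meantime you can roll your own; this addpath function is a
hypothetical helper, not anything standard:

# Append a directory to $PATH only if it is not already present.
addpath() {
    case ":$PATH:" in
        *:"$1":*) ;;
        *) PATH=$PATH:$1; export PATH ;;
    esac
}

addpath /usr/sbin
addpath /usr/local/sbin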
2009-07-04
A side note on the cost of operations
A side note on my previous entry on the cost of operation versus the cost of development:
It is popular in some quarters to characterize the growing realization of the (potential) costs of operation as sloppy developers finally having to grow up and live in the real world (where real men program in C and are proud of it), instead of getting to paper over their sloppiness with Moore's Law and endless hardware budgets.
This characterization is not merely uncharitable; it is wrong.
There is a rule in optimization: you optimize where the program spends its time, not where it doesn't. We can rephrase this to 'optimize what matters', and then observe that a significant part of development is figuring out what matters and what doesn't. You can never optimize everything, not on any real program, because you never have enough time (programs are not finished, they are released), so you must pick and choose.
When Moore's Law was handing people 'free' performance increases every year, performance that was less than ideal was not something that mattered. Well, generally; there have always been environments operating at such scale or with such thin margins that the cost of operation really did matter. But they were rare (and when you have a rare need, you pay extra to have it met).
For much of the past decade or two, the truth (however unpleasant to people who hate 'wasteful' programs) was that optimizing for the cost of operation was in general a mistake, something that a good, rational programmer would avoid. Writing (not too) inefficient code in a 'sloppy' high-level language was the right choice; writing highly efficient code in C or assembler, just because, was the wrong one.
(And that things may be changing now does not change that; it just means that you should make different development decisions now.)