2009-07-15
A Bourne shell gotcha with ( ... ) command grouping
Here is a mistake that I spent part of today discovering that I'd made.
Consider the following Bourne shell script fragment:
(
for i in $SOMETHING; do
if ! some-command $i; then
echo $0: failed on $i 1>&2
exit 1
fi
done
) | sort | ....
Tragically, this shell script fragment is broken. The exit is not
doing what you think it is doing.
(If it actually is doing what you think it is doing, you need to stop
being so clever in your Bourne shell scripts. Use 'break' instead,
so that people can understand you later.)
When I wrote this shell script, I clearly thought that this exit would
exit from the entire shell script, aborting it with a false status
so that various other things could notice that something had gone
wrong. But this is incorrect; commands in a ( ... ) command group
run in a separate context, so the exit just stopped the for loop,
exactly as if it was a break statement. The overall script continued
to run and indeed exited with a success status, despite things having
blown up.
(Since this involved a pipeline, the same thing would have happened if
I wrote the for loop without the ( ... ) around it. Although a bare
for loop is legal here, I habitually add the parentheses for clarity.)
For this particular script, I got around the problem by having the
failure case echo a magic marker into the for loop's output, and then
having the main portion of the script look for the magic marker. You
could also do something like capture standard error in a file and check
in the main portion to make sure that the file was empty.
(I don't like capturing stderr in scripts if I can help it, so I go out of my way to avoid it.)