Wandering Thoughts archives

2017-07-07

Programming Bourne shell scripts is tricky, with dim corners

There have been a bunch of good comments on my entry about my views on Shellcheck, and I have a bit too much to say in response to fit in a comment of my own. I'll mostly be talking about this script, where I thought the unquoted '$1' was harmless:

#!/bin/sh
echo Error: $1

Leah Neukirchen immediately pointed out something I had completely forgotten, which is that unquoted Bourne shell variable expansion also does globbing. The following two lines give you the same output:

echo *
G="*"; echo $G

This is surprising (to me), but after looking at things I'll admit that it's also useful. It gives you a straightforward way to dynamically construct glob patterns in your shell script and then expand them, without having to resort to hacks that may result in too much expansion or interpretation of special characters.
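As a minimal sketch of building a glob pattern at runtime and then expanding it: the temporary directory, the file names, and the variables ext and pattern are all made up for the demonstration, and it assumes the mktemp -d path contains no whitespace or glob characters of its own.

```shell
#!/bin/sh
# Sketch only: demo_dir, ext and pattern are hypothetical names.
demo_dir=$(mktemp -d) || exit 1
touch "$demo_dir/a.log" "$demo_dir/b.log" "$demo_dir/c.txt"

ext=log                      # pretend this was decided at runtime
pattern="$demo_dir/*.$ext"   # assignment does no globbing; the * stays literal

# Unquoted expansion: the shell applies pathname expansion to the value.
set -- $pattern
expanded_count=$#

# Quoted expansion: the value is passed through untouched, * and all.
set -- "$pattern"
quoted_count=$#
quoted_value=$1

echo "unquoted expansion gave $expanded_count words"
echo "quoted expansion gave $quoted_count word: $quoted_value"

rm -r "$demo_dir"
```

The unquoted form matches the two .log files, while the quoted form stays a single word that still contains the literal *.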

Then Vidar, the author of Shellcheck, left a comment with an interesting PS suggesting some things that my unquoted use of $1 was leaving me open to:

./testit '-e \x1b[33mTest' # Ascii input results in binary output (when sh is bash)
./testit '-n foo'          # -n disappears

This is a nice illustration of how tricky shell programming can be, because these probably don't happen, but I can't say that they definitely don't happen in all Unix environments (and maybe no one can). As bugs, both of these rely on the shell splitting $1 into multiple arguments to the actual command and then echo interpreting the first word (now split into a separate argument) as a -n or -e option, changing its behavior. However, I deliberately wrote testit's use of echo so that this shouldn't happen, as $1 is only used after a non-option argument (the Error: portion).
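The splitting half of this can be seen without involving echo's option handling at all. The following is a small sketch, assuming the default IFS; count_args is a hypothetical helper that only reports how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical helper, not part of the testit script.
count_args() {
    echo "$#"
}

arg='-n foo'    # stands in for the "$1" passed to testit

# Unquoted: the value is split into two words, so the command
# sees three arguments: Error:  -n  foo
unquoted=$(count_args Error: $arg)

# Quoted: the value stays one word, so the command sees two
# arguments: Error:  "-n foo"
quoted=$(count_args Error: "$arg")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

Whether the now-separate -n is then treated as an option is up to the command that receives it, which is the part that varies between environments.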

With almost all commands in a traditional Unix, the first regular argument turns off all further option processing; everything after it will be considered an argument, no matter if it could be a valid option. Using an explicit '--' separator is only necessary if you want your first regular argument to be something that would otherwise be interpreted as an option. However, at least some modern commands on some Unixes have started accepting options anywhere on the command line, not just up to the first regular argument. If echo behaves this way, Vidar's examples do malfunction, with the -n and -e seen as actual options by echo. Having echo behave this way in your shell is probably not POSIX compatible, but am I totally sure that no Unix will ever do this? Of course not; Unixes have done some crazy shell-related things before.
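One way to avoid depending on where a particular echo stops looking for options is to use printf with a fixed format string. This is a sketch of that alternative and not what the original testit does; report_error is a hypothetical name.

```shell
#!/bin/sh
# report_error is a hypothetical stand-in for testit's single echo line.
report_error() {
    # The format string is fixed, so "$1" is only ever data for %s;
    # it is never examined for options or backslash escapes.
    printf 'Error: %s\n' "$1"
}

report_error '-n foo'
report_error '-e \x1b[33mTest'
```

Both of Vidar's example arguments come out verbatim after the "Error: " prefix, since %s copies its argument without interpreting it.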

Finally, Aristotle Pagaltzis mentioned, about his reflexive quoting of Bourne shell variables when he uses them:

I’m just too aware that uninvited control and meta characters happen and that word splitting is very complex semantically. [...]

This is very true, as I hope this entry helps illustrate. But for me at least there are three situations in my shell scripts. If I'm processing unrestricted input in a non-friendly environment, yes, absolutely, I had better put all variable usage in quotes for safety, because sooner or later something is going to go wrong. Generally I do, and if I haven't, I'd actually like something to tell me about it (and Shellcheck would be producing a useful message here for such scripts).

(At the same time, truly safe defensive programming in the Bourne shell is surprisingly hard. Whitespace and glob characters are the easy case; newlines often cause much more heartburn, partly because of how other commands may react to them.)
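To make the newline problem concrete, here is a sketch in which a single file with a newline in its name is miscounted by a line-oriented pipeline. The directory and file name are invented for the demonstration, and it assumes a filesystem that allows newlines in file names (typical Unix filesystems do).

```shell
#!/bin/sh
# Sketch only: demo_dir and the file name are hypothetical.
demo_dir=$(mktemp -d) || exit 1

# A variable holding exactly one newline character.
nl='
'
touch "$demo_dir/one${nl}two"    # ONE file whose name contains a newline

# Line-oriented counting: find prints the name across two lines,
# so wc -l reports two "files". (tr strips the padding some wc add.)
by_lines=$(find "$demo_dir" -type f | wc -l | tr -d ' ')

# Counting via the shell's own glob expansion: one word per file,
# no matter what characters the name contains.
set -- "$demo_dir"/*
by_glob=$#

echo "counted by lines: $by_lines"
echo "counted by glob:  $by_glob"

rm -r "$demo_dir"
```

Quoting every variable correctly does not help here, because the damage happens in the text passed between find and wc rather than in the shell's word splitting.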

If I'm writing a script for a friendly environment (for example, I'm probably the only person who'll run it) and it doesn't have to handle arbitrary input, well, I'm lazy. If the only proper way to run my script is with well-formed arguments that don't have whitespace in them, the only question is how the script is going to fail; is it going to give an explicit complaint, or is it just going to produce weird messages or errors? For instance, perhaps the only proper arguments to a script are the names of filesystems or login names, neither of which has whitespace or funny characters in it in our environment.
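If you do want the explicit complaint, a cheap check up front is usually enough. This is a sketch under assumptions: valid_name is a hypothetical helper, and the allowed character set (letters, digits, dot, underscore, dash) is a guess at what "well-formed" means and would need adjusting for a real environment.

```shell
#!/bin/sh
# valid_name is a hypothetical helper; the allowed character set
# is an assumption, not something any particular site guarantees.
valid_name() {
    case $1 in
        '' | *[!A-Za-z0-9._-]*) return 1 ;;   # empty, or has a character outside the set
        *) return 0 ;;
    esac
}

for candidate in cks 'two words' '*'; do
    if valid_name "$candidate"; then
        printf 'ok:  %s\n' "$candidate"
    else
        printf 'bad: %s\n' "$candidate" >&2
    fi
done
```

A script could call such a check on each argument and exit with a usage message, so that a malformed argument fails loudly instead of producing odd output later.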

Finally, sometimes the code in my semi-casual script is running in a context where I know for sure that something doesn't have whitespace or other problem characters. The usual way for this to happen is for the value to come from a source that cannot (in our environment) contain such values. For a hypothetical example, consider shell code like this:

login=$(awk -F: '$3 == NNN {print $1}' /etc/passwd | sed 1q)
....
echo whatever $login whatever

This is never going to have problematic characters in $login, for a suitable value of 'never'; in theory our /etc/passwd could be terribly corrupted or there could be a RAM glitch. And yes, if I were going to (say) rm files as root based on this, $login would be quoted just in case.

This last issue points out one of the hard challenges of a Bourne shell linter that wants to only complain about probable or possible errors. To do a good job, you want to recognize as many of these 'cannot be an error' situations as possible, and that requires some fairly sophisticated understanding not just of shell scripting but of what output other commands can produce and how data flows through the script.

By the way, Shellcheck impressed me by doing some of this sort of analysis. For example, it doesn't complain about the following script:

#!/bin/sh
ADIR=/some/directory/path
#ADIR="$1"
if [ ! -d $ADIR ]; then
   echo does not exist: $ADIR
fi

(If you uncomment the line that sets ADIR from $1, Shellcheck does report problems.)

programming/BourneShellTrickyAndDim written at 00:00:22

