The roots of an obscure Bourne shell error message

September 10, 2023

Suppose that you're writing Bourne shell code that uses command substitution to capture the output of some commands into a shell variable, 'AVAR=$(....)', but you accidentally write it with a space after the '='. Then you will get something like this:

$ AVAR= $(... | wc -l)
sh: 107: command not found

So, why is this an error at all, and why do we get this weird and obscure error message? In the traditional Unix and Bourne shell way, this arises from a series of decisions that were each sensible in isolation.

To start with, we can set shell variables and their grown-up friends, environment variables, with 'AVAR=value' (note the lack of spaces). You can erase the value of a shell variable (but not unset it) by leaving the value out, 'AVAR='. Let's illustrate:

$ export FRED=value
$ printenv | grep -F FRED
FRED=value
$ FRED=
$ printenv | grep -F FRED
FRED=
$ unset FRED
$ printenv | grep -F FRED
$ # i.e., no output from printenv
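
(If you want to see the difference between an empty variable and an unset one without going through printenv, here is a minimal sketch using the standard '${VAR+word}' expansion, which produces 'word' only if the variable is set at all. The names 'before', 'blank', 'full', 'after' and the marker word 'is-set' are mine, purely for illustration.)

```shell
unset FRED
before=${FRED+is-set}   # expands to nothing: FRED is unset
FRED=
blank=${FRED+is-set}    # "is-set": FRED exists, but with an empty value
FRED=value
full=${FRED+is-set}     # "is-set": FRED exists and has a value
unset FRED
after=${FRED+is-set}    # nothing again: unset really removes the variable
echo "before=[$before] blank=[$blank] full=[$full] after=[$after]"
```

This should print 'before=[] blank=[is-set] full=[is-set] after=[]'.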

Long ago, the Bourne shell recognized that you might want to only temporarily set the value of an environment variable for a single command. It was decided that this was a common enough thing that there should be a special syntax for it:

$ PATH=/special/bin:$PATH FRED=value acommand

This runs 'acommand' with $PATH changed and $FRED set to a value, without changing (or setting) either of them for anything else. We have now armed one side of our obscure error, because if we write 'AVAR= ....' (with the space), the Bourne shell will assume that we're temporarily erasing the value of $AVAR (or setting it to a blank value) for a single command.
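
(Here is a small sketch of the temporary assignment in action. It assumes that 'sh' is on your $PATH; 'during' and 'after' are just illustrative names.)

```shell
unset FRED
# The assignment prefix only affects the environment of this one command.
# The single quotes matter: they make the child sh expand $FRED, not us.
during=$(FRED=value sh -c 'echo "$FRED"')
# Back in the current shell, FRED was never set; ${FRED-unset} yields
# the word "unset" only when FRED does not exist.
after=${FRED-unset}
echo "during=$during after=$after"
```

This should print 'during=value after=unset'.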

The second part is that the Bourne shell allows the command to be run to be named indirectly, instead of having to be written out literally. In Bourne shell, you can do this:

$ cmd=echo; $cmd hello world
hello world
$ cmd="echo hi there"; $cmd
hi there

The Bourne shell doesn't restrict this indirection to direct expansion of shell variables; any and all expansion operations can be used to generate the command to be run and some or all of its arguments. This includes command substitution, which is written either as $(...) in the modern way or as `...` in the old way (those are backticks, which may be hard to see in some fonts). Doing this even for '$(...)' is reasonably sensible, probably sometimes useful, and definitely avoids making $(...) a special case here.
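
(As a sketch of the same indirection going through '$(...)', here the inner printf stands in for whatever command would actually produce a command name; the variable names are illustrative.)

```shell
# The command name comes from a variable...
cmd=echo
from_var=$($cmd hello world)
# ...or from the output of another command; printf emits the word "echo",
# which then becomes the command that gets run.
from_subst=$($(printf 'echo') hello world)
# An unquoted expansion is split into words: the first word is the command
# and the rest become its arguments.
cmd="echo hi there"
split=$($cmd)
echo "$from_var | $from_subst | $split"
```

This should print 'hello world | hello world | hi there'.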

So now we have our perfect storm. If you write 'AVAR= $(....)', the Bourne shell first sees 'AVAR= ' (with the space) and interprets it as you running some command with $AVAR set to a blank value. Then it takes the '$(...)' and uses its output to generate the command to run (and its arguments). When the commands inside '$(...)' print their results, for example the number of lines reported by 'wc -l', the Bourne shell will try to use that as a command and fail, resulting in our weird and obscure error message. What you've accidentally written is similar to:

$ cmd=$(... | wc -l)
$ AVAR= $cmd

(Assuming that the $(...) subshell doesn't do anything different based on $AVAR, which it probably doesn't.)
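
(If you want to reproduce this safely, here is a minimal sketch. The printf is a stand-in for the elided pipeline and just emits '107', and I'm assuming that you have no command called '107' on your $PATH. The error message is sent to /dev/null so that only the exit status is looked at.)

```shell
status=0
# The mistake: with the space, the shell tries to run "107" as a command.
# A command that can't be found gives exit status 127.
AVAR= $(printf '107\n') 2>/dev/null || status=$?
# The intended form: no space, so this is a plain variable assignment.
AVAR=$(printf '107\n')
echo "status=$status AVAR=$AVAR"
```

This should print 'status=127 AVAR=107'.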

It's hard to see any simple change in the Bourne shell that could avoid this error, because each of the individual parts is sensible in isolation. It's only when they combine like this that a simple mistake compounds into a weird error message.

(The good news is that shellcheck warns about both parts of this, in SC1007 and SC2091.)
