2016-03-09
A sensible surprise (to me) in the Bourne shell's expansion of "$@"
I generally like to think that I'm pretty well up on the odd corners of the Bourne shell due to having been around Unix for a fair while. Every so often I stumble over something that shows me that I'm wrong.
So let's start with the following, taken from something Jed Davis discovered about Bash:
$ set -- one two three
$ for i in "front $@ back"; do echo $i; done
front one
two
three back
$
When I saw this, my first reaction was basically 'what?', because it didn't seem to make any sense. After I mumbled a bit on Twitter, Jed Davis found the explanation in the Single Unix Specification here:
When the expansion occurs within double-quotes, and where field splitting [...] is performed, each positional parameter shall expand as a separate field, with the provision that the expansion of the first parameter shall still be joined with the beginning part of the original word (assuming that the expanded parameter was embedded within a word), and the expansion of the last parameter shall still be joined with the last part of the original word.
The purpose of "$@" is to preserve arguments that originally have
spaces in them as single arguments. So, for example:
$ set -- "one argument" "two argument"
$ for i in "$@"; do echo $i; done
one argument
two argument
$ for i in "$*"; do echo $i; done
one argument two argument
$
This is what the first part of the SuS specification describes (up
to 'shall expand as a separate field'). But this definition opens
up a question; what is the result of expansion if you have not a simple
"$@" but instead something with additional text inside the double
quotes? One answer would be to completely turn off the special
splitting and argument preserving behavior of "$@" (making it
identical to "$*" here), but that probably wouldn't be very
satisfying. Traditional Unix and thus SuS instead says that you
should continue field splitting but pretend that any front text is
attached to the first argument and any back text is attached to the
last one.
(Since it's still text inside a "...", the front and rear text
is not subject to any word splitting; it's attached untouched as
a single unit.)
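One way to see this joining without echo blurring the boundaries is to print each resulting field in brackets. This is only a small sketch; show_fields is a helper name I made up for the illustration, and the doubled spaces are there to show that nothing inside the quotes gets split or squeezed.

```shell
#!/bin/sh
# show_fields is a hypothetical helper for this illustration: it prints
# each argument it receives on its own line, wrapped in brackets.
show_fields() {
    for field in "$@"; do
        printf '[%s]\n' "$field"
    done
}

set -- "one  arg" two "three  arg"

# Front text sticks to the first parameter and back text to the last one;
# the doubled spaces survive because nothing inside the quotes is split.
show_fields "front  $@  back"
```

Under a POSIX shell this should print three lines: '[front  one  arg]', '[two]', and '[three  arg  back]'.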
When I saw this, my first and not well thought out expectation was that any leading and trailing text would be subject to regular word splitting and thus be taken as separate, additional arguments. Of course this doesn't actually make sense if I think about it for real, because there is normally no word splitting inside double quotes. Thus, the traditional Unix and SuS behavior is perfectly reasonable here and makes sense from an algorithmic perspective.
Given all this, the result of the following is not really surprising:
$ set -- one two three
$ for i in "$@ $@"; do echo $i; done
one
two
three one
two
three
$
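Counting the fields makes the same point more compactly. A sketch, with count_fields being a hypothetical helper that just reports how many arguments it was handed:

```shell
#!/bin/sh
# count_fields is a hypothetical helper: it reports how many arguments
# the shell handed to it after expansion.
count_fields() {
    echo "$#"
}

set -- one two three

count_fields "$@"             # three separate fields
count_fields "$*"             # one field, parameters joined with spaces
count_fields "$@ $@"          # five fields; 'three one' is the joined middle
count_fields "front $@ back"  # still three; the extra text rides on the ends
```

With the default IFS, this should report 3, 1, 5, and 3 in that order.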
(Writing this entry has been useful in forcing me to confront some
of my own fuzzy thinking around the whole area of "$@", as you
can tell from the story of my first reaction to this.)
2016-03-07
Why it makes sense for true and false to ignore their arguments
It's standard when writing Unix command line programs to make them check their arguments and complain if the usage is incorrect. It's reasonably common to do this even for programs that don't take options or positional arguments. After all, if your command is supposed to take no arguments, it's really an error if someone runs it and gives it arguments.
(Not all scripts, programs, and so on actually check this, because you usually have to go at least a little bit out of your way to look at the argument count. But it's the kind of minor nit you might get code review comments about, or an issue report.)
true and false are an exception to this, in that they more or
less completely ignore any arguments given to them. Part of this
behavior is historical; the V7 /bin/true and /bin/false were
extremely minimal, and when you're being minimal it's easiest to
not even look at the arguments. But beyond the history, I think
that this is perfectly sensible behavior for true and false
because it makes them universal substitutes for other commands,
for when you want to null out a command so that it does nothing.
Want to make a command do nothing but always succeed? Simple: 'mv
command command.real; ln -s /bin/true command'. Want to do the
same thing but have the command always fail? Use false instead
of true. Sure, you can do the same thing with shell scripts that
deliberately ignore the arguments and just do 'exit 0' or 'exit
1', but this is a little bit simpler and matches the historical
behavior.
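As a rough sketch of the swap, here it is done in a scratch directory. The command name 'mycmd' is made up for the illustration, and this assumes that /bin/true (or /usr/bin/true) is a standalone program; on systems where true is a link to a multi-call binary such as BusyBox, the symlink trick may not behave this way.

```shell
#!/bin/sh
# Sketch: replace a command with a symlink to true so that it does
# nothing but still succeeds. 'mycmd' and the scratch directory are
# made up for this illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A stand-in for the real command, which does visible work.
printf '%s\n' '#!/bin/sh' 'echo "mycmd: really doing work on $*"' > mycmd
chmod +x mycmd

# Assumption: true lives at /bin/true, with /usr/bin/true as a fallback.
truebin=/bin/true
[ -x "$truebin" ] || truebin=/usr/bin/true

mv mycmd mycmd.real
ln -s "$truebin" mycmd

# Whatever arguments callers pass, the nulled-out command ignores them.
./mycmd --force some-file another-file
echo "exit status: $?"
```

Pointing the symlink at false instead gives you the always-failing version, and 'mv mycmd.real mycmd' over the symlink undoes the whole thing.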
(You can also do this in shell scripts as a way of creating a 'don't actually do anything' mode, but there are probably better patterns there.)
On that note, it's interesting that although GNU true and
false have command line options that will cause them to produce
output, there is no way to get them to return the wrong exit status.
And while they respond to --help and --version, they silently
ignore other options (as opposed to, say, reporting a syntax error).
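A quick way to check this on your own system is sketched below. It goes through env so that the external programs on your PATH get run instead of the shell's builtin true and false; the variable names are just for the illustration.

```shell
#!/bin/sh
# 'env' runs the external program found on PATH instead of the shell's
# builtin true and false.
true_status=0
env true --no-such-option extra arguments || true_status=$?

# With GNU coreutils, --help prints usage text, but false still fails.
false_status=0
env false --help >/dev/null 2>&1 || false_status=$?

echo "true: $true_status  false: $false_status"
```

I would expect true to report 0 and false to report a non-zero status here regardless of the arguments.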
(This entry was sparked by Zev Weiss's mention of true in his
comment on this entry.)
Sidebar: true and false in V7
In V7 Unix, true is an empty file and false is a file that is
literally just 'exit 1'. Neither has a #! line at the start of
the file, because that came in later. That
true is empty instead of 'exit 0' saves V7 a disk block, which
probably mattered back then.
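You can approximate the V7 versions on a modern system. This is only a sketch; the scratch directory and variable names are made up, and the files are run explicitly through sh, which is roughly what a shell falls back to doing for an executable file that isn't a binary and has no #! line.

```shell
#!/bin/sh
# Recreate V7-style true and false in a scratch directory.
v7dir=$(mktemp -d)
: > "$v7dir/true"                # an empty file
echo 'exit 1' > "$v7dir/false"   # a one-line script with no #! line

# An empty script runs no commands, so sh exits with status 0.
v7_true_status=0
sh "$v7dir/true" ignored arguments || v7_true_status=$?

# The one-line script never looks at its arguments either.
v7_false_status=0
sh "$v7dir/false" ignored arguments || v7_false_status=$?

echo "true: $v7_true_status  false: $v7_false_status"
rm -rf "$v7dir"
```

This should print 'true: 0  false: 1'.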