2024-04-25
The importance of an ordinary space in a Unix shell command line
In the sidebar to yesterday's entry I (originally) made a Unix command line mistake by unthinkingly leaving out an ordinary, innocent looking space (it's corrected in the current version of the entry after it was noted by Emilio in a comment). This innocent looking mistake and its consequences are an illustration of something in Unix shell command lines, although I'm not sure of just what, so I'm going to write it up.
The story starts with the general arguments of Bash's 'read' builtin:
read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name …]
The 'read' builtin follows the general standard behavior of Unix
commands where '-d delim' and other options that take an argument
can be shortened to omit the space, so '-ddelim'. So you can
write, for example:
echo "a:b:c" | while IFS= read -r -d':' l; do echo "$l"; done
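As a quick check that the two spellings really are the same thing to 'read', here is a small sketch that runs the example both ways and compares what comes out. It assumes Bash; the variable names 'attached' and 'separate' are just ones I made up for this illustration.

```shell
#!/usr/bin/env bash
# Same loop twice: once with the argument attached to -d, once with a space.
attached=$(echo "a:b:c" | while IFS= read -r -d':' l; do echo "$l"; done)
separate=$(echo "a:b:c" | while IFS= read -r -d ':' l; do echo "$l"; done)

printf 'attached form printed:\n%s\n' "$attached"
printf 'separate form printed:\n%s\n' "$separate"

# Both forms hand read the option -d with the argument ':'.
if [ "$attached" = "$separate" ]; then
    echo "identical"
fi
```

Note that only 'a' and 'b' get printed here. The final 'c' is followed by a newline instead of a ':', so 'read' hits end of file before it sees the delimiter, returns a failure status, and the loop ends.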
Bash also has a special feature for -d. Normally the first character
of delim is taken as the 'line' terminator, but if delim is an
empty string, read will terminate the line when it reads a NUL character
(0 byte), which is just what you want to handle the output of, for
example, 'find ... -print0'.
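A minimal sketch of that use, assuming Bash and a find that supports -print0. The scratch directory, the file names, and the 'count' and 'path' variables are all invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory holding some awkward file names.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain" "$tmpdir/with space" "$tmpdir/with
newline"

count=0
# -d '' is a separate, empty argument, so read stops at each NUL byte
# instead of at newlines. IFS= and -r keep the names unmodified.
while IFS= read -r -d '' path; do
    count=$((count + 1))
    printf 'found: %q\n' "$path"
done < <(find "$tmpdir" -type f -print0)

printf 'total files: %d\n' "$count"
rm -rf "$tmpdir"
```

The loop reads from a process substitution instead of a pipe so that 'count' is still set after the loop; with a pipe, the loop would run in a subshell and the count would be lost.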
The way you create an empty string argument in a Bash command line is to use an empty pair of quotes:
read -r -d '' line
So when I was writing the original command line in yesterday's entry, I absently mashed these two things together in my mind and wrote:
read -r -d'' line
I've used '' to create an empty argument and then I've done the standard thing of removing the space between -d and its argument. So clearly I've given '-d' an empty argument, right? Nope.
In Bash and other conventional shells, '' is nothingness. It only
means an argument that is an empty string if it occurs on its own;
this is a special interpretation added by the shell, and programs
don't actually see the ''s. If you put a '' next to other non-whitespace
characters, it disappears in the command line that the program will
see. So writing -d'' was the same as writing -d with no argument,
and the command line as 'read' would see it was actually:
read -r -d line
This would have caused 'read' to take 'line' as the argument of -d
and so to use 'l', its first character, as the line terminator.
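You can watch the '' disappear with a small sketch like the following. It assumes Bash, and 'show_args' is a made-up helper (not anything standard) that reports how many arguments it was given and what they were.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints the argument count,
# then each argument inside brackets.
show_args() {
    printf '%d:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

with_space=$(show_args -r -d '' line)
without_space=$(show_args -r -d'' line)
echo "$with_space"      # 4: [-r] [-d] [] [line]
echo "$without_space"   # 3: [-r] [-d] [line]

# The effect on read itself: 'line' is taken as -d's argument, so input
# ends at the first 'l'. With no variable name left, read uses REPLY.
result=$(printf 'hello world\0' | { read -r -d'' line; printf '%s' "$REPLY"; })
echo "read stored: $result"
```

With the space, the empty string survives as an argument of its own; without it, the quotes are removed and nothing is left behind.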
In the process of writing this entry, I realized that there's a more
interesting way to make what is fundamentally the same mistake, although
it goes deeper into Unix arcana and doesn't look half as obvious.
In many modern shells, Bash included, you can write a
NUL character (0 byte) as $'\0'. So you will see people write
a 'read with NUL terminated lines' command line as:
IFS= read -r -d $'\0' line
This works fine, and unlike the '' case we obviously have a real argument here, not just an empty argument, so clearly we can shorten this to:
IFS= read -r -d$'\0' line
If you try this you will discover it doesn't work. The fundamental
problem is that Unix command line arguments can't include NUL
characters, because the Unix command line API passes the arguments
as an array of NUL-terminated (C) strings. No matter how you invoke a
program, the first NUL character in an argument is the end of that
argument from the program's perspective. So although it looked very
different as typed, from read's perspective what we did was the
same as:
IFS= read -r -d line
(And then it would have the same effect as my mistake.)
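Here is a rough sketch of this, assuming Bash specifically (other shells with $'...' may behave differently). The 'count_args' helper and the variable names are invented for the illustration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

# In Bash, the string ends at the NUL, so $'\0' is just an empty string.
nul=$'\0'
printf 'length of the NUL string: %d\n' "${#nul}"

separate=$(count_args -d $'\0' line)   # -d, an empty argument, line
attached=$(count_args -d$'\0' line)    # -d, line
printf 'separate: %s arguments, attached: %s arguments\n' "$separate" "$attached"

# The separate form works for NUL terminated input.
first=$(printf 'one\0two\0' | { IFS= read -r -d $'\0' item; printf '%s' "$item"; })
printf 'first item: %s\n' "$first"
```

So in Bash, the form with the space only works because $'\0' on its own ends up as an empty argument, which is the same thing '' gives you.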
PS: This is a little tangled because 'read' is a Bash builtin so
in theory Bash doesn't have to stick to the limits of the kernel
API, but in practice I think Bash does do so.