The importance of an ordinary space in a Unix shell command line

April 25, 2024

In the sidebar to yesterday's entry I (originally) made a Unix command line mistake by unthinkingly leaving out an ordinary, innocent looking space (it's corrected in the current version of the entry after it was noted by Emilio in a comment). This innocent looking mistake and its consequences are an illustration of something in Unix shell command lines, although I'm not sure of just what, so I'm going to write it up.

The story starts with the general arguments of Bash's 'read' builtin:

read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name …]

The 'read' builtin follows the standard behavior of Unix commands where '-d delim' and other options that take an argument can be written without the space, as '-ddelim'. So you can write, for example:

printf 'a:b:c:' | while IFS= read -r -d':' l; do echo "$l"; done
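One way to convince yourself that '-d:' and '-d :' reach 'read' identically is to run both spellings over the same input and compare. This is a minimal sketch that assumes Bash. The function names are hypothetical. The input ends with a ':' because 'read' only returns success when it actually finds the delimiter, so a last field without one would be read but would also end the loop before it was printed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: split the same colon-separated input two ways.
# The trailing ':' makes every field end with the delimiter; without it
# the last field would make read return non-zero and end the loop early.

split_attached() {
    # Option argument glued to the option: a single word, "-d:".
    printf 'a:b:c:' | while IFS= read -r -d':' l; do echo "$l"; done
}

split_separate() {
    # Option and argument as two separate words: "-d" and ":".
    printf 'a:b:c:' | while IFS= read -r -d ':' l; do echo "$l"; done
}

split_attached   # prints a, b and c on separate lines
split_separate   # prints exactly the same
```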

Bash also has a special feature for -d. Normally the first character of delim is taken as the 'line' terminator, but if delim is the empty string, read will terminate the line when it reads a NUL character (0 byte), which is just what you want for handling the output of, for example, 'find ... -print0'.
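Here is a minimal sketch of that idiom, assuming Bash. To keep the input predictable, 'printf' stands in for 'find ... -print0'. The second name contains a space and the third a newline, and either would break ordinary newline-delimited reading. The array name 'files' is just for illustration. The loop reads from process substitution rather than a pipe, so the array is still set after the loop.

```shell
#!/usr/bin/env bash
# Sketch: collect NUL-terminated records, as 'find ... -print0' emits them.
# printf '%s\0' repeats its format for each argument, ending each with NUL.

files=()
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(printf '%s\0' 'plain' 'with space' $'with\nnewline')

printf 'read %d names\n' "${#files[@]}"   # prints: read 3 names
for f in "${files[@]}"; do
    printf '[%s]\n' "$f"
done
```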

The way you create an empty string argument in a Bash command line is to use an empty pair of quotes:

read -r -d '' line
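You can see what the shell actually passes along by handing the same words to a function that prints each argument it gets. This is a minimal sketch assuming Bash. 'show_args' is a hypothetical helper, not part of Bash, and it brackets every argument so that an empty one stays visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the argument count and each argument in <...>,
# so that an empty-string argument shows up as <>.
show_args() {
    printf '%d args:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

show_args -d '' line   # prints: 3 args: <-d> <> <line>
show_args -d "" line   # double quotes give the same empty argument
```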

So when I was writing the original command line in yesterday's entry, I absently mashed these two things together in my mind and wrote:

read -r -d'' line

I've used '' to create an empty argument and then I've done the standard thing of removing the space between -d and its argument. So clearly I've given '-d' an empty argument, right? Nope.

In Bash and other conventional shells, '' is nothingness. It only means an empty-string argument if it makes up a whole word on its own; this is a special interpretation by the shell, and programs never see the ''s themselves. If you put '' next to other non-whitespace characters, it simply disappears from the command line that the program sees. So writing -d'' was the same as writing -d with no argument, and the command line as 'read' would see it was actually:

read -r -d line

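Which would have caused 'read' to use 'l' as the line terminator and, since 'line' was consumed as the argument to -d, to put what it read into $REPLY instead of $line.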
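The mistake is easy to reproduce. This is a minimal sketch that assumes Bash, with 'show_args' again a hypothetical helper that prints each argument in brackets. The first part shows the words each spelling produces. The second part runs the mistaken form on some NUL-terminated input: 'read' stops at the first 'l', leaves $line unset, and puts what it read into $REPLY.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the argument count and each argument in <...>.
show_args() {
    printf '%d args:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

show_args read -r -d '' line   # 5 args: <read> <-r> <-d> <> <line>
show_args read -r -d'' line    # 4 args: <read> <-r> <-d> <line>

# The mistaken form in action. -d takes "line" as its argument, so 'l'
# becomes the delimiter, and with no variable name left read uses REPLY.
result=$(printf 'hello world\0' | {
    unset line
    read -r -d'' line
    printf '%s|%s' "$REPLY" "${line-unset}"
})
echo "$result"   # prints: he|unset
```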

In the process of writing this entry, I realized that there's a more interesting way to make what is fundamentally the same mistake, although it goes deeper into Unix arcana and doesn't look half as obvious. In many modern shells, Bash included, you can write a NUL character (0 byte) as $'\0'. So you will see people write a 'read with NUL terminated lines' command line as:

IFS= read -r -d $'\0' line

This works fine, and unlike the '' case we obviously have a real argument here, not just an empty argument, so clearly we can shorten this to:

IFS= read -r -d$'\0' line

If you try this you will discover it doesn't work. The fundamental problem is that Unix command line arguments can't include NUL characters, because the Unix command line API passes the arguments as an array of NUL-terminated (C) strings. No matter how you invoke a program, the first NUL character in an argument is the end of that argument from the program's perspective. So although it looked very different as typed, from read's perspective what we did was the same as:

IFS= read -r -d line

(And then it would have the same effect as my mistake.)
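Printing the arguments shows the collapse directly. This is a minimal sketch assuming Bash, where 'show_args' is a hypothetical helper that prints each argument in brackets. The last line starts an external 'sh' to show that a separate program, which gets its arguments through exec, also receives just two arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the argument count and each argument in <...>.
show_args() {
    printf '%d args:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

show_args -d $'\0' line   # 3 args: <-d> <> <line>
show_args -d$'\0' line    # 2 args: <-d> <line>, the same as -d line

# An external program sees the same two arguments ($0 here is "sh").
sh -c 'echo "$# args"' sh -d$'\0' line   # prints: 2 args
```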

PS: This is a little tangled because 'read' is a Bash builtin so in theory Bash doesn't have to stick to the limits of the kernel API, but in practice I think Bash does do so.
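In practice the question is partly moot, because Bash's own strings cannot hold a NUL byte either. This is a minimal sketch assuming Bash 4 or later, with illustrative variable names. $'\0' expands to the empty string, and $'a\0b' is cut off at the NUL. That is also why -d $'\0' works: 'read' receives the same empty argument it would get from -d ''.

```shell
#!/usr/bin/env bash
# Sketch: what $'\0' actually expands to in Bash.
nul=$'\0'
x=$'a\0b'
printf 'len(nul)=%d len(x)=%d\n' "${#nul}" "${#x}"   # len(nul)=0 len(x)=1

# -d $'\0' therefore behaves exactly like -d '' (NUL-terminated records).
records=()
while IFS= read -r -d $'\0' r; do
    records+=("$r")
done < <(printf 'one\0two\0')
printf '%s\n' "${records[@]}"   # prints one and two on separate lines
```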


Comments on this page:

Yikes! shellcheck does not complain about it!

Also, the reason why, if possible, I always put the space between option and argument.

I think $' is a bashism: it isn’t in POSIX nor in OpenBSD’s ksh, but it seems to have been added to FreeBSD’s version of the Almquist shell.

By Emilio at 2024-04-26 11:37:12:

"This innocent looking mistake and its consequences are an illustration of something in Unix shell command lines, although I'm not sure of just what"

The manner in which they're parsed and passed to programs.

In shell quoting, one can go "in and out" of quotes arbitrarily within one word; for example, a'b'cd"e" is one word: abcde. (I've occasionally found this useful, like when I need literal backslashes or double-quotes in one part of a word and variable-expansion or apostrophes in another.) By the same logic, -d'' and -d are the same. A zero-length word, however, can only be encoded by quoting it; -d '' is two words, the second being blank. Incidentally, the hyphen isn't special in any way, except maybe to the utility reading its argument list.

As you note, arguments are passed to programs as null-terminated strings (POSIX-2018 volume 1 §3.92 "Character String", but never mind 3.87's reference to the "Portable Character Set"; all systems I'm aware of pass byte-strings unchanged, up to the first '\0'). So, an embedded '\0' can't be present in an argument of an external program—or an environment variable, for similar reasons. I'm doubtful that bash supports such strings at all; for example, if I run x=$'a\0b' and then echo ${#x} to print the length of x, I get 1, not 3. Your -d $'\0' example probably isn't working how you think it is.

"read" isn't actually a "program" per se; it couldn't work as one, because it needs to set environment variables in its "parent process"—the shell—and there's no API to do that. Oddly, POSIX classifies it as a normal utility, rather than grouping it with the "Special Built-In Utilities"; but it does say "The read utility shall conform to XBD Utility Syntax Guidelines", and the bash developers evidently chose not to special-case anything. That's probably a good choice; to do otherwise would've just added to the confusion. Strictly speaking, though, the behaviour is not a "consequence" of argument-passing or null-terminated strings, since neither is required in this context.

By root at 2024-04-28 01:14:11:

i sometimes write "-d " to remind myself that it is all one argument. but i agree, sh(1) sucks, exec() sucks, UNIX sucks.

Written on 25 April 2024.


Last modified: Thu Apr 25 23:17:18 2024