Wandering Thoughts archives

2024-04-25

The importance of an ordinary space in a Unix shell command line

In the sidebar to yesterday's entry I (originally) made a Unix command line mistake by unthinkingly leaving out an ordinary, innocent looking space (it's corrected in the current version of the entry after it was noted by Emilio in a comment). This innocent looking mistake and its consequences are an illustration of something in Unix shell command lines, although I'm not sure of just what, so I'm going to write it up.

The story starts with the general arguments of Bash's 'read' builtin:

read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name …]

The 'read' builtin follows the standard Unix convention that an option taking an argument, such as '-d delim', can be written without the space, as '-ddelim'. So you can write, for example:

echo "a:b:c" | while IFS= read -r -d':' l; do echo "$l"; done

Bash also has a special feature for -d. Normally the first character of delim is taken as the 'line' terminator, but if delim is the empty string, read will terminate the line when it reads a NUL character (0 byte), which is just what you want to handle the output of, for example, 'find ... -print0'.

The way you create an empty string argument in a Bash command line is to use an empty pair of quotes:

read -r -d '' line
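As a minimal sketch of the correct form in use (Bash-specific, since 'read -d' is not in POSIX sh): the function name read_nul_records and the array items are made up for illustration, and printf stands in for something like 'find ... -print0'.

```bash
# Collect NUL-terminated records from standard input into the array 'items'.
# (read_nul_records and items are illustrative names, not from any tool.)
read_nul_records() {
    items=()
    local rec
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes,
    # and -d '' (with the space!) makes NUL the record terminator.
    while IFS= read -r -d '' rec; do
        items+=("$rec")
    done
}

# Three records; the last one contains an embedded newline.
read_nul_records < <(printf 'one\0two words\0three\nlines\0')
printf '%s\n' "${#items[@]}"    # prints 3
```

The process substitution keeps the loop in the current shell, so items is still set after the function returns.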

So when I was writing the original command line in yesterday's entry, I absently mashed these two things together in my mind and wrote:

read -r -d'' line

I've used '' to create an empty argument and then I've done the standard thing of removing the space between -d and its argument. So clearly I've given '-d' an empty argument, right? Nope.

In Bash and other conventional shells, '' is nothingness. It only means an argument that is an empty string if it occurs on its own; this is a special interpretation added by the shell, and programs don't actually see the ''s. If you put a '' next to other non-whitespace characters, it disappears in the command line that the program will see. So writing -d'' was the same as writing -d with no argument, and the command line as 'read' would see it was actually:

read -r -d line

Which would have caused 'read' to use 'l' as the line terminator.
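One way to see this for yourself, as a rough sketch: count_args is a hypothetical helper that just reports how many arguments it was given, and the last two lines show where the input ends up when 'read' is handed the mistaken form.

```bash
# Hypothetical helper: print the number of arguments received.
count_args() { printf '%s\n' "$#"; }

count_args -d '' line    # prints 3: -d, an empty string, and line
count_args -d'' line     # prints 2: -d and line; the '' has vanished

# With the mistaken form, "line" is consumed as the argument to -d, so input
# stops at the first 'l'. No variable name is left, so the text goes to REPLY.
read -r -d'' line <<< 'hello world'
printf '%s\n' "$REPLY"   # prints he
```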

In the process of writing this entry, I realized that there's a more interesting way to make what is fundamentally the same mistake, although it goes deeper into Unix arcana and doesn't look half as obvious. In many modern shells, Bash included, you can write a NUL character (0 byte) as $'\0'. So you will see people write a 'read with NUL terminated lines' command line as:

IFS= read -r -d $'\0' line

This works fine, and unlike the '' case we obviously have a real argument here, not just an empty argument, so clearly we can shorten this to:

IFS= read -r -d$'\0' line

If you try this you will discover it doesn't work. The fundamental problem is that Unix command line arguments can't include NUL characters, because the Unix command line API passes the arguments as an array of NUL-terminated (C) strings. No matter how you invoke a program, the first NUL character in an argument is the end of that argument from the program's perspective. So although it looked very different as typed, from read's perspective what we did was the same as:

IFS= read -r -d line

(And then it would have the same effect as my mistake.)
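A small sketch that makes this visible in Bash (other shells with $'...' quoting may behave differently; nul_arg and count_args are illustrative names only):

```bash
# In Bash, $'\0' yields a string that ends at the NUL, i.e. an empty string.
nul_arg=$'\0'
printf '%s\n' "${#nul_arg}"    # prints 0

# Hypothetical helper: print the number of arguments received.
count_args() { printf '%s\n' "$#"; }

count_args -d $'\0' line       # prints 3: on its own, it is an empty argument
count_args -d$'\0' line        # prints 2: glued to -d, it contributes nothing
```

So the spaced version works only because an empty argument is exactly what 'read -d' treats as "use NUL", not because a NUL byte is ever passed.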

PS: This is a little tangled because 'read' is a Bash builtin so in theory Bash doesn't have to stick to the limits of the kernel API, but in practice I think Bash does do so.

unix/ShellImportanceOfASpace written at 23:17:18;

