Why quoting in the Bourne shell makes me grumpy

October 13, 2006

Imagine for a moment that you have an almost arbitrary string (no nulls, newlines, or other funky control characters) that you want to quote so that it passes through the Bourne shell intact. How do you do it?

The Bourne shell offers you two quoting schemes, strings in single quotes and strings in double quotes.

Strings in double quotes need special characters escaped with backslashes. There are five of them (quick, do you know them all?), or six in some situations in some versions of the Bash manpage. And of course, backslash is not a general escaping character; putting it in front of a non-special character is not harmless, because the backslash then stays in the string.
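As a rough sketch of what this means in practice, the snippet below first shows the backslash being dropped before a special character and kept before an ordinary one, then defines a helper that escapes a string for use inside double quotes. The name quote_dq is made up for illustration, and the helper assumes the input has no newlines and that Bash-style history expansion (the "sixth" character, !) is not in play.

```shell
# Backslash before a special character ($) is removed by the shell.
printf '%s\n' "cost: \$5"    # prints: cost: $5
# Backslash before an ordinary character (q) is kept as-is.
printf '%s\n' "a\qb"         # prints: a\qb

# Hypothetical helper: wrap $1 in double quotes, putting a backslash
# in front of each backslash, double quote, dollar sign, and backquote.
# Assumes no newlines in the input and no history expansion.
quote_dq() {
    printf '"%s"\n' "$(printf '%s' "$1" | sed 's/[\\"$`]/\\&/g')"
}

quote_dq 'say "hi" for $5'
```

Note that the helper must not escape anything outside that short list, for exactly the reason given above: a stray backslash in front of an ordinary character would end up in the result.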

Strings in single quotes can't have anything escaped, which is OK, because nothing has special meaning inside them. Except a single quote. Since there is no way of escaping the single quote, you get to turn a single quote into five characters:

'"'"'

(Okay, I suppose you could turn it into '\'' instead and save yourself one character, at the expense of generating something that looks even more like you stuttered.)

(You may or may not be able to read that in your font, since a fair number of fonts are not so great at distinguishing single quotes, double quotes, and apostrophes. It's almost as bad as the great l vs 1 problem.)
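For what it's worth, the single-quote approach is mechanical enough to script. Here is a minimal sketch, assuming a POSIX shell with sed available and input without newlines; the function name quote_sq is hypothetical. It uses the shorter '\'' form.

```shell
# Hypothetical helper: quote $1 so the Bourne shell passes it through intact.
# Every embedded single quote becomes '\'' (close the quoted string, emit an
# escaped quote, reopen the quoted string), and the result is wrapped in
# single quotes. Assumes the input contains no newlines.
quote_sq() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

quote_sq "it's"

# Round trip: feeding the quoted form back through the shell
# should give the original string as a single argument.
eval "set -- $(quote_sq "don't touch \$HOME")"
printf '%s\n' "$1"
```

The sed expression looks alarming only because it is itself inside double quotes, so each backslash has to be doubled once for the shell and once for sed.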

This leads to a general view of mine: often, the more quoting methods you have the worse off you are. Unless they are very specialized, quantity serves mostly to confuse, annoy, and surprise; you are better off with one method, ideally one as simple as possible.

Sidebar: so how should shells do quoting?

I'm glad you asked. Tom Duff's rc answered this question many years ago:

A quoted word is a sequence of characters surrounded by single quotes ('). A single quote is represented in a quoted word by a pair of quotes ('').

That's it. As a bonus it's dirt simple to write code that quotes a string for rc: double any single quotes, then put the whole thing in single quotes.
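To show just how simple, here is a sketch of that rule written as a Bourne shell function that produces rc-quoted output. The name quote_rc is invented for this example, and it assumes sed is available and the input has no newlines.

```shell
# Hypothetical helper: quote $1 following rc's rule as described above.
# Step 1: double every single quote. Step 2: wrap the result in single quotes.
quote_rc() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

quote_rc "it's"    # prints: 'it''s'
```

There are no special cases and no characters to look up; the one substitution is the entire algorithm.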


Comments on this page:

From 70.231.136.160 at 2006-10-14 16:45:14:

If nothing has special meaning inside single quotes except single quotes, why in the world would '"'"' represent '? It seems like I see: initial single quote (omitted); initial double quote (passed through); second single quote (...?), etc. This would seem to indicate it should either become "'", or more likely, " and then a parsing error.

Is there some extra rule I'm missing, like 'the only special thing inside single-quoted strings are single quotes AND double-quoted strings'? If so, how do you put a " inside a single-quoted string? With \"? Wait, that means there's yet another character that's still special inside single-quoted strings.

-- nothings

By cks at 2006-10-14 22:12:13:

In hindsight I should have explained this more. The transformation is done to single quotes inside strings that you are quoting (with single quotes), so you wind up with things like:

$ echo 'a string with a '"'"' in it'
a string with a ' in it

The first single quote ends the string, then the "'" bit generates a quoted single quote, then the final single quote starts up the string again. (Bourne shell string merging rules then turn it into a single argument.)

Since \' also generates a quoted single quote, it could be used in place of the "'" sequence to save a character.
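A quick way to convince yourself that the two spellings are equivalent is to assign both to variables and compare them. This is only an illustrative sketch; the variable names a and b are arbitrary.

```shell
# Five-character form: close quote, "'" in double quotes, reopen quote.
a='a string with a '"'"' in it'
# Four-character form: close quote, backslash-escaped quote, reopen quote.
b='a string with a '\'' in it'

printf '%s\n' "$a" "$b"

if [ "$a" = "$b" ]; then
    echo "both forms produce the same string"
fi
```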

As a standalone sequence '"'"' is indeed a Bourne shell parsing error (an unterminated double-quoted string).

By DanielMartin at 2006-10-16 00:02:08:

What I appreciate so much about the Bourne shell's single-quoting is its absolute predictability. From the open single quote to the very next single quote, every single character - whether backslash, newline, or other generally meaningful character - gets literally copied into the value of the string. After trying to remember exactly how to quote or escape in the tenth stupid little language I'm working in today, it's a relief to get to the simplest quoting system on the planet: absolutely nothing quotes the next character. Having to write '\'' for a single quote seems like a small price to pay for that complete literal predictability. (Though I'll admit that the rc method makes sense.)

And it's miles better than csh, whose quoting system no one can reliably describe to completion. For example, there's the whole brokenness entailed by the fact that this line doesn't generally print a single exclamation point:

echo '!'

(tcsh, to its credit, makes that line work)

As for the wonderfulness of rc's method, DCL (the command language on VMS) had a similar quoting mechanism. I seem to recall that this could very quickly lead to needing to put eight quotation marks in a row because of some other silliness of DCL that made it hard to re-quote something to pass it to another program.

By cks at 2006-10-16 13:38:05:

Situations with multilevel quoting are where you really, really want some form of quoting that nests. The classical example of this in Unix shells is not actually quoting as such, but command substitutions with backquotes.

Modern Bourne shell versions get around this by supporting a syntax that nests, $(...), in addition to the old-style `....`. (rc deals with the problem in the same way, using `{...} instead.)
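As a small illustration of the difference, here is the same two-level command substitution written both ways. The variable names new and old are arbitrary, and this is a sketch for a POSIX shell rather than anything specific to one implementation.

```shell
# $(...) nests directly; the inner substitution needs no extra escaping.
new=$(echo "outer $(echo "inner")")

# Backquotes do not nest, so the inner pair must be backslash-escaped.
# Each further level of nesting needs yet more backslashes.
old=`echo outer \`echo inner\``

printf '%s\n' "$new" "$old"
```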


Last modified: Fri Oct 13 23:27:55 2006