Wandering Thoughts archives

2011-01-06

An appreciation for the Posix $( ) command substitution syntax

The Posix standards for Unix sometimes get a bad rap (there are some crazy things in there, and some not so great ones), especially for the bits that they invented, but every so often I think that they did something absolutely right. The new Posix $(...) syntax for command substitution is one shining example of this; it is superior to the original Bourne shell syntax of `...` in so many ways. Here are some of the reasons that I like it.

First off, $(...) nests in a straightforward way without needing special quoting. All by itself this is a huge advantage; I've written before that any time you have multiple levels of quoting and processing going on, things go to hell. Nested $(...)s let you avoid all of that: you don't need to quote anything, you just nest them and then read them from the outside in (or the inside out). You can also easily extract an inner $(...) and run it standalone to make sure that it returns the right result, and so on.
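A small sketch of the nesting point, using only the standard basename and dirname utilities on a made-up path. The nested $(...) form is written exactly as its parts would be written on their own, while the backquote form needs its inner pair escaped with backslashes (and every further level of nesting needs more of them).

```sh
#!/bin/sh
# Nested $(...): the inner command substitution is written exactly as it
# would be if you ran it on its own, quotes and all.
dir=$(basename "$(dirname "/usr/local/bin/tool")")
echo "$dir"       # prints: bin

# You can lift the inner part out and run it standalone to check it.
inner=$(dirname "/usr/local/bin/tool")
echo "$inner"     # prints: /usr/local/bin

# The same nesting with backquotes: the inner pair must be escaped.
old=`basename \`dirname /usr/local/bin/tool\``
echo "$old"       # prints: bin
```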

Next, $(...) is (in my opinion) easier to read and scan, because the edges of the command substitution are much easier to spot. The ( and ) characters are much more visually distinctive than `, and they run very little risk of being confused with other characters (unlike `, which is easily mistaken for ' or "). It's also clear whether you're looking at the start or the end of a command substitution, unlike the Bourne shell case, where a stray ` might be either one and you need to carefully watch the surrounding context.

Finally, it's easier to see and remember that $(...) will expand inside "..."-quoted strings, because it looks just like other $ expansions. With the traditional Bourne shell, you had to remember that both $ and `...` expanded in that context.

(This cuts both ways when writing shell scripts; I find that I'm much more likely to use $() inside double quotes than I ever was with `...`, simply because it's easier to remember that it's possible.)
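To illustrate expansion inside double quotes, here is a minimal sketch (the strings are arbitrary examples). It also shows a side benefit of putting $(...) inside double quotes: the substituted text is not word-split, so runs of spaces survive.

```sh
#!/bin/sh
name=world
# $(...) expands inside double quotes, just like $name does.
msg="hello $name, today is $(echo Thursday)"
echo "$msg"       # prints: hello world, today is Thursday

# Backquotes also expand inside double quotes; they just don't look like it.
old="hello `echo there`"
echo "$old"       # prints: hello there

# Inside double quotes the result is not word-split, so the spaces survive.
spaced="$(printf 'a   b')"
echo "$spaced"    # prints: a   b
```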

PS: before I started writing this entry, I had the incorrect impression that $(...) originated in bash. Bash is probably still the most common Bourne shell variant that supports it (and I see that Solaris sh doesn't support it, not that this surprises me any more), but a number of others support it too, such as dash and various versions of Kenneth Almquist's sh implementation (originally called ash).

PosixCommandSubstitution written at 00:24:16

2011-01-02

Why there is a gulf between shells and scripting languages

Recently I saw a stackoverflow question on why scripting languages aren't suitable as Unix shell scripting languages. My answer is that shells are strongly optimized for a different use case than programming languages, and this has significant effects on the design of the languages that they use and their semantics. Above all, shells are optimized for invoking external programs; a successful shell has ruthlessly pruned away everything that makes this awkward. Scripting languages, like other languages, are instead generally optimized for writing expressions, statements, and other internal language features.

The most visible result of this gulf is how shells and scripting languages treat unquoted words in their input. In a scripting language, unquoted words are generally identifiers (variables) and literal text must be quoted (I'm aware that Perl is a little different here); in a shell, unquoted words are literal text and identifiers must be called out explicitly. This makes perfect sense for both sides. In shells, most input is going to be command names and arguments for them (both literal text), and in scripting languages, most input is expressions and other statements using variables, functions, and so on. Each side has optimized its syntax to make its common case easy.
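A minimal illustration of the unquoted-word rule in a POSIX shell (the variable name greeting is just an example). In a scripting language such as Python the situation is reversed: a bare greeting names a variable, and the literal text has to be written as 'greeting'.

```sh
#!/bin/sh
greeting=hi
# Unquoted words are literal text: this prints the word "greeting".
echo greeting     # prints: greeting
# To get a variable's value, it must be called out explicitly with $.
echo $greeting    # prints: hi
```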

Because they are focused on running commands, shells directly expose operators to manipulate the results of running commands (including a wide variety of dataflow operators, as noted in a response to the stackoverflow question). The equivalent in programming languages is their rich vocabulary for writing expressions and accessing data. Since there are only so many special characters to go around, it's quite difficult to support both sorts of operators at once with convenient syntax.
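As a sketch of the kind of dataflow operators meant here, this shows pipes, file redirection, and the && / || exit-status operators in plain sh. The data is made up, and the use of mktemp is an assumption: it is widely available but not specified by POSIX.

```sh
#!/bin/sh
# Pipe: printf's output flows into sort, and sort's output into head.
first=$(printf 'b\na\nc\n' | sort | head -n 1)
echo "$first"     # prints: a

# Redirection: > sends output to a file, < reads a file as input.
tmp=$(mktemp) || exit 1
printf 'b\na\nc\n' | sort > "$tmp"
lines=$(grep -c . < "$tmp")
echo "$lines"     # prints: 3
rm -f "$tmp"

# && and || pick the next command based on the previous exit status.
result=$(printf 'a\nb\n' | grep -q b && echo found || echo missing)
echo "$result"    # prints: found
```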

(Bash tries, but notice that it has to use a special escape sequence to get into expression-writing mode. Now, imagine writing a substantial program where every expression or assignment had to be written inside a '$(( ... ))' stanza; you'd be very angry with the designer of that language in short order.)
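For a concrete sense of that escape sequence, here is a sketch using POSIX arithmetic expansion, $((...)), which bash supports (bash also has the related ((...)) command). Outside the escape, 1+2 is just a literal word.

```sh
#!/bin/sh
# Outside the arithmetic escape, 1+2 is an ordinary literal word.
echo 1+2              # prints: 1+2
# $(( ... )) switches into expression mode and evaluates the contents.
echo $((1 + 2))       # prints: 3

i=5
# Every assignment that computes something needs the same wrapper.
i=$((i * 2 + 1))
echo "$i"             # prints: 11
```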

You can do better than current shells for shell scripts; I outlined some ideas for this back in What makes a good Unix glue language. But I think that it is intrinsic to the gulf that a shell is going to be excessively verbose for writing programs and a scripting language is going to be excessively verbose for running commands. You can't get both at once.

Sidebar: more differences in practice

You also write different sorts of programs in the two sorts of languages. Regardless of the syntax involved (and some languages have nearly syntax-free function invocation), there is also a deep semantic difference between calling a function and running an external command; functions are far more integrated into the rest of the program than an external command can be. Even in a purely functional language, functions can take and return much richer data structures than an external command can. The result is that shell scripting is built around dataflow between external programs, and scripting languages are built around data structure manipulations in functions.
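A rough sketch of that semantic difference, assuming a hypothetical user_record command. Anything that runs as a command (a shell function here, since it presents the same interface as an external program) can hand back only an exit status and a stream of text, so even a two-field record has to be flattened to text and parsed again by the caller.

```sh
#!/bin/sh
# Hypothetical command: emits a two-field record as one tab-separated line.
user_record() {
    printf '%s\t%s\n' alice 1001
}

# The caller gets only text back and must re-parse it to recover the fields.
line=$(user_record)
name=$(printf '%s\n' "$line" | cut -f 1)   # cut splits on tabs by default
uid=$(printf '%s\n' "$line" | cut -f 2)
echo "$name $uid"     # prints: alice 1001
```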

You can do both at once if you try hard, but you have to build bridges back and forth, and in my opinion it is not really a natural way to work.

ShellsVsScriptingLanguages written at 02:12:25



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.