Why there is a gulf between shells and scripting languages
January 2, 2011
Recently I saw a Stack Overflow question on why scripting languages aren't suitable as Unix shell scripting languages. My answer is that shells are strongly optimized for a different use case than programming languages, and this has significant effects on the design and semantics of the languages that they use. Above all, shells are optimized for invoking external programs; a successful shell has ruthlessly pruned away everything that makes this awkward. Scripting languages, like other languages, are instead generally optimized for writing expressions, statements, and other internal language features.
The most visible result of this gulf is how shells and scripting languages treat unquoted words in their input. In a scripting language, unquoted words are generally identifiers (variables) and literal text must be quoted (I'm aware that Perl is a little different here); in a shell, unquoted words are literal text and identifiers must be called out explicitly. This makes perfect sense for both sides. In shells, most input is going to be command names and arguments for them (both literal text), and in scripting languages, most input is expressions and other statements using variables, functions, and so on. Each side has optimized its syntax to make its common case easy.
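To make the quoting difference concrete, here is a minimal Python sketch. The command line being run is invented for illustration, and `sys.executable` stands in for `python3` so that the example does not depend on what is installed. In a shell the whole command is literal text typed as-is; in Python every one of those words has to be quoted, while names such as `answer` are bare.

```python
import shlex
import subprocess
import sys

# In a shell, this whole line is literal text; only the argument that
# contains spaces needs quotes:
#     python3 -c 'print(6 * 7)'
# shlex.split applies POSIX-shell-style word splitting to show the words
# that the shell would pass along.
shell_line = "python3 -c 'print(6 * 7)'"
words = shlex.split(shell_line)

# In Python, bare words are identifiers and expressions.
answer = 6 * 7

# Running the same command from Python means quoting every literal word.
result = subprocess.run(
    [sys.executable, "-c", "print(6 * 7)"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
```

The Python side is perfectly workable, but the ceremony is exactly what a shell prunes away.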
Because they are focused on running commands, shells directly expose operators to manipulate the results of running commands (including a wide variety of dataflow operators, as noted in a response to the Stack Overflow question). The equivalent in programming languages is their rich vocabulary for writing expressions and accessing data. Since there are only so many special characters to go around, it's quite difficult to support both sorts of operators at once with convenient syntax.
(Bash tries, but notice that it has to use a special escape sequence to get into expression writing mode. Now, imagine writing a substantial program where every expression or assignment had to be written inside such an escape.)
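As a rough illustration of the other direction, this is what a single shell dataflow operator, the pipe in `produce | consume`, looks like when spelled out in Python with the standard `subprocess` module. The two stages are hypothetical stand-ins written as Python one-liners, so the sketch does not assume any particular Unix tools are available.

```python
import subprocess
import sys

# Shell equivalent: produce | consume
# Stage one prints three words; stage two sorts the lines it reads.
produce = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]
consume = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.writelines(sorted(sys.stdin))",
]

first = subprocess.Popen(produce, stdout=subprocess.PIPE)
second = subprocess.Popen(
    consume, stdin=first.stdout, stdout=subprocess.PIPE, text=True
)
# Close our copy of the pipe so the first stage is notified if the
# second stage exits early.
first.stdout.close()
pipeline_output, _ = second.communicate()
first.wait()

sorted_words = pipeline_output.split()
```

One character in the shell becomes several lines of explicit plumbing here, which is the cost of a language whose operators are reserved for expressions.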
You can do better than current shells for shell scripts; I outlined some ideas for this back in "What makes a good Unix glue language". But I think that it is intrinsic in the gulf that a shell is going to be excessively verbose for writing programs and a scripting language is going to be excessively verbose for running commands. You can't get both at once.
Sidebar: more differences in practice
You also write different sorts of programs between the two sorts of languages. Regardless of the syntax involved (and some languages have nearly syntax-free function invocation), there is also a deep semantic difference between calling a function and running an external command; functions are far more integrated into the rest of the program than an external command can be. Even in a purely functional language they can take and return much richer data structures than you can do with an external command. The result is that shell scripting is built around dataflow between external programs, and scripting languages are built around data structure manipulations in functions.
You can do both at once if you try hard, but you have to build bridges back and forth, and my opinion is that it is not really a natural way to work.
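The contrast in this sidebar can be sketched in Python. The function `word_lengths` and the child program are made up for this example, and using JSON as the bridge is an assumption of the sketch rather than anything the command interface provides. The function hands back a dictionary directly; the external command can only hand back text and an exit status, which the caller must parse.

```python
import json
import subprocess
import sys


def word_lengths(words):
    """An in-process function: takes a list and returns a dict directly."""
    return {w: len(w) for w in words}


direct = word_lengths(["ls", "grep"])

# The same logic as an external command. Arguments go in as strings, and
# only an output stream and an exit status come back.
child_code = (
    "import json, sys; "
    "print(json.dumps({w: len(w) for w in sys.argv[1:]}))"
)
proc = subprocess.run(
    [sys.executable, "-c", child_code, "ls", "grep"],
    capture_output=True,
    text=True,
)

# The bridge: the caller has to turn the text back into a data structure.
via_command = json.loads(proc.stdout)
```

Both routes arrive at the same dictionary, but only the second one needs a serialization format agreed on by both sides.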
Written on 02 January 2011.