The Bourne shell and Bash aren't the right languages for larger programs

May 14, 2021

In my recent entry on DKMS, I said some negative things about it being an almost 4,000-line Bash script. In comments, a couple of people questioned this; for instance, Arnaud Gomes asked:

I seem to recall a post of yours a few years ago about the main difference between the shell and python being that a shell program is basically just glue between external commands. Isn't it what DKMS is?

(Arnaud Gomes is probably thinking of this entry on the gulf between shells and scripting languages.)

There are three overlapping problems that almost always manifest in large shell scripts. The first problem is that shell scripts are more or less constrained to be all in a single file. 4,000 lines in a single file is hard to keep track of in any language; people do much better when they can break complexity up into smaller units.

The second is that the Bourne shell's oddities, limitations, and outsourced language elements throw unnecessary obstacles in the way of expressing your program's logic. DKMS may run a lot of external programs, but as you can see from its manpage, it has a lot of features and a lot of complex logic to decide what to do to what. Pretty much any large shell script is going to contain a lot of logic, because there are very few situations where you spend hundreds or thousands of lines just running other programs and not doing much yourself. If you write this logic in shell, you must express it within the shell's inherent limitations, and the result is not all that easy to follow, which makes it hard to maintain and expand over time.

(These days the Bourne shell does have arithmetic, at least. But figuring out how to use various random Unix programs to efficiently express and test parts of your logic is still a Turing tarpit.)

The third problem is that the Bourne shell lacks important language features that normally act to make coding errors less likely and contain and manage code complexity. The lack of these makes it harder and more error prone to express what you're doing, harder to keep track of what your code does, and contributes to making your logic harder to follow. Using Bash instead of plain Bourne shell fixes only some of these. One small and typical problem area is that the Bourne shell doesn't have named function arguments; this creates problems when reading the script and enables errors when writing it. A large problem area is that the shell has very limited data types, especially for function arguments. Plain Bourne shell has only strings (and the special list of arguments). Bash adds indexed arrays and 'associative arrays' (maps in Go, dicts in Python), but they can't be nested and passing them as function arguments is at best somewhat unnatural, which strongly limits their usefulness.
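To make this concrete, here is a small Python sketch of the two things Bash makes awkward: named (here, keyword-only) function arguments, and a nested mapping handed to a function as an ordinary argument. Everything in it, including the function names and the module/version/kernel data, is hypothetical and made up for illustration; it is not taken from DKMS.

```python
from __future__ import annotations


def build_module(name: str, version: str, *, kernel: str, force: bool = False) -> str:
    """Describe a hypothetical module build as a string."""
    # In Bash these would arrive positionally as "$1", "$2", "$3", "$4",
    # and swapping two of them at a call site raises no error. The '*'
    # makes 'kernel' and 'force' keyword-only, so callers must name them.
    action = "rebuild" if force else "build"
    return f"{action} {name}/{version} for kernel {kernel}"


def kernels_built_for(status: dict[str, dict[str, list[str]]], module: str) -> list[str]:
    """Return the kernels that any version of 'module' is built for."""
    # 'status' (hypothetical) maps module name -> version -> kernel names.
    # A Bash associative array can't hold arrays as values, and handing
    # one to a function takes workarounds such as 'declare -n' namerefs.
    kernels: set[str] = set()
    for built_for in status.get(module, {}).values():
        kernels.update(built_for)
    # Sorted as plain strings, not as version numbers.
    return sorted(kernels)


if __name__ == "__main__":
    # Hypothetical data, made up for illustration.
    status = {
        "foo": {"1.0": ["5.10.0", "5.4.0"], "1.1": ["5.10.0"]},
        "bar": {"2.3": ["5.4.0"]},
    }
    print(build_module("foo", "1.1", kernel="5.11.0"))  # build foo/1.1 for kernel 5.11.0
    print(kernels_built_for(status, "foo"))  # ['5.10.0', '5.4.0']
```

Calling `build_module("foo", "1.1", "5.11.0")` without naming `kernel` is a `TypeError` at the call site, which is exactly the kind of mistake that a Bash function reading `$3` will silently accept.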

(Lacking data structures does a number of bad things, but one of them is that it makes it harder for shell scripts to gather data and keep track of things.)
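As a sketch of the sort of data gathering this gets in the way of, assume you've already parsed some external command's output into hypothetical (module, kernel, state) records; the record format and the 'installed' state name are assumptions for illustration. In Python you can fold them into a two-level table and then ask questions of it:

```python
from collections import defaultdict


def gather(records):
    """Fold hypothetical (module, kernel, state) records into module -> kernel -> state."""
    table = defaultdict(dict)
    for module, kernel, state in records:
        # A later record for the same (module, kernel) pair overwrites an earlier one.
        table[module][kernel] = state
    return table


def missing_on(table, kernel):
    """Return the sorted modules with no 'installed' state for 'kernel'."""
    return sorted(
        module
        for module, per_kernel in table.items()
        if per_kernel.get(kernel) != "installed"
    )


if __name__ == "__main__":
    records = [
        ("foo", "5.10.0", "installed"),
        ("foo", "5.4.0", "built"),
        ("bar", "5.10.0", "installed"),
        ("baz", "5.4.0", "installed"),
    ]
    table = gather(records)
    print(missing_on(table, "5.10.0"))  # ['baz']
    print(missing_on(table, "5.4.0"))  # ['bar', 'foo']
```

In Bash, one common workaround is a single associative array with composite keys such as "foo|5.10.0", which you then have to split apart again whenever you want to iterate over just one level.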

PS: If you've never looked at a large shell script that's trying its best, it's worthwhile to read (or skim) part of the dkms script. It may be eye-opening about what doing large scale shell script programming forces you into.

Written on 14 May 2021.

Last modified: Fri May 14 00:11:57 2021