Some things that make languages easy (or not) to embed in Unix shell scripts

May 21, 2022

Part of Unix shell scripting is that Unix has a number of little languages (and interpreters for them) that are commonly embedded in shell scripts to do various things. Shell scripts aren't just written in the Bourne shell; they're effectively written in the Bourne shell plus things like sed and awk, and later additions like Perl (the little language used by jq may in time become routine too). However, not all languages come to be used on Unix this way, even if they're interpreted and otherwise used for shell-script-like things. Recently it occurred to me that one factor in this is how embeddable the language is in a shell script.

If you're putting together a shell script, your life is a lot easier if the shell script is self-contained and doesn't need any additional files distributed with it (files that it will probably have to know where to find). If you're going to use an additional little language in your shell script, you really want to be able to provide the program in the little language as part of the shell script. Interpreters and languages can make this more or less easy, in two ways.

First and obviously, the interpreter mostly needs to accept a program as a command line argument, not require it to be in a file that the interpreter reads (and most especially not require the file to have a specific extension). There is a way to embed file contents in shell scripts, but it will make your shell script's life harder. For many people this will probably push them to shipping the program in a separate file, which in turn will probably push them toward using a language that's friendlier to embedding in shell scripts.
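As a rough sketch of the difference, here is the same small awk program supplied both ways. The sample input and the variable names (total, total2, prog) are made up for illustration, and the second half only pretends that awk can't take its program as an argument.

```shell
#!/bin/sh
# Easy case: the interpreter takes its program as a command line argument.
total=$(printf '%s\n' 'a 1' 'b 2' 'c 3' | awk '{ sum += $2 } END { print sum }')

# Harder case: pretend the interpreter only reads its program from a file.
# The script now has to create, use, and clean up a temporary file.
# (mktemp is widely available but not required by POSIX.)
prog=$(mktemp) || exit 1
trap 'rm -f "$prog"' EXIT
cat >"$prog" <<'EOF'
{ sum += $2 }
END { print sum }
EOF
total2=$(printf '%s\n' 'a 1' 'b 2' 'c 3' | awk -f "$prog")

echo "argument: $total, file: $total2"
```

Both versions compute the same sum, but the second one has picked up a temporary file, a trap, and a failure mode that the first never had.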

It's convenient but not essential if the interpreter accepts multiple snippets of program as separate command line arguments. The poster child for this is sed, where you can supply multiple lines of program with multiple -e arguments. Lack of this isn't fatal, as shown by awk, especially if even snippets of the overall program are probably going to be multiple lines in themselves.
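A small illustration of the sed style, with made-up input and a hypothetical variable name (out); each -e argument is one line of the overall sed program.

```shell
#!/bin/sh
# Each -e adds one sed command; sed applies them in order to every line:
# drop comment lines, strip trailing whitespace, tighten ' = ' to '='.
out=$(printf '%s\n' '# comment' 'name = value  ' |
    sed -e '/^#/d' -e 's/[[:space:]]*$//' -e 's/ = /=/')
echo "$out"
```

With awk, the equivalent would be written as one (possibly multi-line) single-quoted argument instead of several separate ones.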

Second, the language has to get along with shell quoting. Generally, the only practical way to quote a long, multi-line command line argument in the Bourne shell is with single quotes (' .... '); quoting with double quotes ("...") can be done, but you will have heartburn with all sorts of characters. This makes it quite important that a language to be embedded use single quotes as little as possible. If you can't naturally write a program without using single quotes, you'll have problems providing the program as an embedded command line argument in the shell script. If your language wants you to use all of single quotes, double quotes, backslashes, and dollar signs ('$'), you're really going to have heartburn.
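To show the sort of heartburn involved, here is one small awk program quoted three ways. This is only a sketch; the input line and the variable names (line, single, double, quoted) are invented for the example.

```shell
#!/bin/sh
line='alice 42'

# Single quotes: the awk program reaches awk exactly as written.
single=$(echo "$line" | awk '{ printf "%s has %d\n", $1, $2 }')

# Double quotes: every ", $, and backslash meant for awk must be escaped
# so that the shell doesn't interpret it first.
double=$(echo "$line" | awk "{ printf \"%s has %d\\n\", \$1, \$2 }")

# A literal single quote inside a single-quoted program needs the clumsy
# '\'' sequence: close the quote, add an escaped ', then reopen the quote.
quoted=$(echo "$line" | awk '{ print "it'\''s " $1 }')

echo "$single / $double / $quoted"
```

The first two forms run the identical awk program; the third shows how quickly things get ugly once the embedded language itself needs a single quote.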

(It also helps if your language isn't picky about formatting and indentation, and lets you squeeze a bunch of statements onto a single physical line.)

There is a way to deal with languages that aren't friendly to shell quoting; you can use a here document to create a shell variable and then supply that variable as the program when you invoke the interpreter. For example:

pyprog="$(cat <<'EOF'
[... Python program ...]
EOF
)"
python -c "$pyprog" ...

However, this is more awkward than doing the equivalent in awk, and that awkwardness acts as friction that pushes people away from using such languages in shell scripts. If they do use them, it's more natural to put the program in a separate file and ship the shell script and the separate file together (with the file going into some known location, and so on).
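To make the here document technique concrete, here is a runnable sketch of it. It uses awk as a stand-in interpreter purely so that it runs without assuming Python is installed; the embedded program deliberately mixes single quotes, double quotes, backslashes, and a dollar sign, and the names prog and msg are hypothetical.

```shell
#!/bin/sh
# Capture the program verbatim. Quoting the delimiter ('EOF') stops the
# shell from expanding $, backslashes, or quotes inside the here document.
prog="$(cat <<'EOF'
BEGIN { q = "'" }
{ printf "%s said \"it's $%d\"\n", q $1 q, $2 }
EOF
)"

# Hand the captured text to the interpreter as a single argument.
msg=$(echo 'bob 5' | awk "$prog")
echo "$msg"
```

Nothing inside the here document had to be escaped for the shell, which is the whole point, but it took a cat, a command substitution, and an extra variable to get there.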

