Unnoticed nonportability in Bourne shell code (and elsewhere)

October 29, 2014

In response to my entry on how Bashisms in #!/bin/sh scripts aren't necessarily bugs, FiL wrote:

If you gonna use bashism in your script why don't you make it clear in the header specifying #!/bin/bash instead [of] #!/bin/sh? [...]

One of the historical hard problems for Unix portability is people writing non-portable code without realizing it, and Bourne shell code is no exception. This is true for even well intentioned people writing code that they want to be portable.

One problem, perhaps the root problem, is that very little you do on Unix will come with explicit (non-)portability warnings and you almost never have to go out of your way to use non-portable features. This makes it very hard to know whether or not you're actually writing portable code without trying to run it on multiple environments. The other problem is that it's often both hard to remember and hard to discover what is non-portable versus what is portable. Bourne shell programming is an especially good example of both issues (partly because Bourne shell scripts often use a lot of external commands), but there have been plenty of others in Unix's past (including 'all the world's a VAX' and all sorts of 64-bit portability issues in C code).

So one answer to FiL's question is that a lot of people are using bashisms in their scripts without realizing it, just as a lot of people have historically written non-portable Unix C code without intending to. They think they're writing portable Bourne shell scripts, but because their /bin/sh is Bash and nothing in Bash warns about things the issues sail right by. Then one day you wind up changing /bin/sh to be Dash and all sorts of bits of the world explode, sometimes in really obscure ways.

All of this sounds abstract, so let me give you two examples of accidentally Bashisms I've committed. The first and probably quite common one is using '==' instead of '=' in '[ ... ]' conditions. Many other languages use == as their string equality check, so at some point I slipped and started using it in 'Bourne' shell scripts. Nothing complained, everything worked, and I thought my shell scripts were fine.

The second I just discovered today. Bourne shell pattern matching allows character classes, using the usual '[...]' notation, and it even has negated characters classes. This means that you can write something like the following to see if an argument has any non-number characters in it:

case "$arg" in
   *[^0-9]*) echo contains non-number; exit 1;;
esac

Actually I lied in that code. Official POSIX Bourne shell doesn't negate character classes with the usual '^' character that Unix regular expressions use; instead it uses '!'. But Bash accepts '^' as well. So I wrote code that used '^', tested it, had it work, and again didn't realize that I was non-portable.

(Since having a '^' in your character class is not an error in a POSIX Bourne shell, the failure mode for this one is not a straightforward error.)

This is also a good example of how hard it is to test for non-portability, because even when you use 'set -o posix' Bash still accepts and matches this character class in its way (with '^' interpreted as class negation). The only way to test or find this non-portability is to run the script under a different shell entirely. In fact, the more theoretically POSIX compatible shells you test on the better.

(In theory you could try to have a perfect memory for what is POSIX compliant and not need any testing at all, or cross-check absolutely everything against POSIX and never make a mistake. In practice humans can't do that any more than they can write or check perfect code all the time.)


Comments on this page:

Occasionally I write shell scripts which are deployed across Debian/OpenBSD/FreeBSD.

The only proven way I've found of being sure the script does what I intend it to do, is to have virtual machines with a base install of each operating system.

Then I'll fire up an interactive shell and I manually try out whatever is going into my script.

It's a little tedious, but I believe it's well worth the time. Better than having to chase some obscure bug well after I've moved on to other things.

This development process also flags any GNU/BSD differences re. external commands such as expr, cut, etc.

By adrien at 2016-09-20 16:22:35:

You should look at http://www.shellcheck.net/ . It's a static analyzer for shell scripts which supports different shells (and follows the shebang by default). Took me a while to start using it but now when it warns about something I triple-check before saying the code is actually safe: it's really good.

Written on 29 October 2014.
« My current somewhat tangled feelings on operator.attrgetter
Quick notes on the Linux iptables 'ipset' extension »

Page tools: View Source, View Normal, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Oct 29 00:43:47 2014
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.