Unnoticed nonportability in Bourne shell code (and elsewhere)
In response to my entry on how Bashisms in #!/bin/sh scripts aren't necessarily bugs, FiL wrote:
If you gonna use bashism in your script why don't you make it clear in the header specifying #!/bin/bash instead [of] #!/bin/sh? [...]
One of the historically hard problems for Unix portability is people writing non-portable code without realizing it, and Bourne shell code is no exception. This is true even for well-intentioned people writing code that they want to be portable.
One problem, perhaps the root problem, is that very little you do on Unix comes with explicit (non-)portability warnings, and you almost never have to go out of your way to use non-portable features. This makes it very hard to know whether or not you're actually writing portable code without trying to run it in multiple environments. The other problem is that it's often both hard to remember and hard to discover what is portable and what isn't. Bourne shell programming is an especially good example of both issues (partly because Bourne shell scripts often use a lot of external commands), but there have been plenty of others in Unix's past (including 'all the world's a VAX' and all sorts of 64-bit portability issues in C code).
So one answer to FiL's question is that a lot of people are using bashisms in their scripts without realizing it, just as a lot of people have historically written non-portable Unix C code without intending to. They think they're writing portable Bourne shell scripts, but because their /bin/sh is Bash and nothing in Bash warns about these things, the issues sail right by. Then one day you wind up changing /bin/sh to be Dash and all sorts of bits of the world explode, sometimes in really obscure ways.
All of this sounds abstract, so let me give you two examples of accidental Bashisms I've committed. The first, and probably quite common, one is using '==' instead of '=' in '[ ... ]' conditions.
Many other languages use '==' as their string equality check, so at some point I slipped and started using it in 'Bourne' shell scripts. Nothing complained, everything worked, and I thought my shell scripts were fine.
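A minimal sketch of the portable spelling follows. The helper name 'is_yes' is hypothetical; the only point is that POSIX '[ ... ]' compares strings with a single '='. Bash's '[' also accepts '==', which is exactly why the mistake goes unnoticed, while Dash's '[' rejects it with an error.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first argument is exactly "yes".
# '=' is the string-equality operator POSIX defines for [ ... ].
# Writing '==' here also works when /bin/sh is Bash, which hides the
# bug until the script runs under a stricter shell such as Dash.
is_yes() {
    [ "$1" = "yes" ]
}

if is_yes "${1-}"; then
    echo "confirmed"
else
    echo "not confirmed"
fi
```

Quoting both operands also keeps the test well-formed when a variable is empty.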
The second I just discovered today. Bourne shell pattern matching allows character classes, using the usual '[...]' notation, and it even has negated character classes. This means that you can write something like the following to see if an argument has any non-number characters in it:
case "$arg" in *[^0-9]*) echo contains non-number; exit 1;; esac
Actually I lied in that code. Official POSIX Bourne shell doesn't negate character classes with the usual '^' character that Unix regular expressions use; instead it uses '!'. But Bash accepts '^' as well. So I wrote code that used '^', tested it, had it work, and again didn't realize that I was being non-portable.
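A sketch of the portable form of that check follows, using '!' for negation. POSIX leaves a bracket expression that starts with an unquoted '^' unspecified, so only '[!0-9]' can be counted on everywhere. The function name 'has_nondigit' is hypothetical, and it returns a status instead of calling 'exit' so that it can be reused.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 contains at least one non-digit.
# POSIX shell patterns negate a bracket expression with '!'; Bash also
# accepts '^', but other shells need not treat it as negation.
has_nondigit() {
    case "$1" in
        *[!0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

for arg in 12345 12a45 ''; do
    if has_nondigit "$arg"; then
        echo "'$arg' contains a non-number"
    else
        echo "'$arg' is all digits (or empty)"
    fi
done
```

An empty string contains no non-digit characters, so it passes this check just as it does in the original one-liner; a real script may want to reject it separately.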
(Since having a '^' in your character class is not an error in a POSIX Bourne shell, the failure mode for this one is not a syntax error; instead the pattern quietly matches something other than what you meant.)
This is also a good example of how hard it is to test for non-portability, because even when you use 'set -o posix' Bash still accepts and matches this character class in its own way (with '^' interpreted as class negation). The only way to test for or find this non-portability is to run the script under a different shell entirely. In fact, the more theoretically POSIX-compatible shells you test on, the better.
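One way to make that multi-shell testing routine is a small driver like the sketch below. It assumes the listed shells are the ones you care about (edit the list for your systems) and skips any that aren't installed. Because a problem like the '^' pattern shows up as different output rather than an error, it compares each shell's output and exit status against the first shell it finds instead of only checking for failures. The name 'try_shells' is hypothetical.

```shell
#!/bin/sh
# Hypothetical driver: run one script under each installed shell from
# an assumed list and flag any whose output or exit status differs
# from the first shell found. Prints one line per shell.
try_shells() {
    ts_script=$1
    ts_first=
    ts_ref=
    ts_status=0
    for ts_sh in dash posh ksh mksh yash bash; do
        # Skip shells that are not installed on this machine.
        command -v "$ts_sh" >/dev/null 2>&1 || continue
        # Capture stdout and stderr together, then record the exit status,
        # so both errors and silent misbehaviour are compared.
        ts_out=$("$ts_sh" "$ts_script" 2>&1)
        ts_rc=$?
        ts_result="$ts_out [exit $ts_rc]"
        if [ -z "$ts_first" ]; then
            ts_first=$ts_sh
            ts_ref=$ts_result
            echo "reference: $ts_sh"
        elif [ "$ts_result" = "$ts_ref" ]; then
            echo "same:      $ts_sh"
        else
            echo "DIFFERENT: $ts_sh (vs $ts_first)"
            ts_status=1
        fi
    done
    return "$ts_status"
}
```

Running 'try_shells ./myscript' (a placeholder path) returns a non-zero status if any shell disagrees. Error messages usually name the shell that printed them, so any script that errors under some shell will show up as DIFFERENT, which is the point. Note that 'bash' here is not run in POSIX mode, which as noted above wouldn't catch the '^' case anyway.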
(In theory you could try to have a perfect memory for what is POSIX compliant and not need any testing at all, or cross-check absolutely everything against POSIX and never make a mistake. In practice humans can't do that any more than they can write or check perfect code all the time.)