Bash is letting locales destroy shell scripting (at least on Linux)

July 2, 2014

Here, let me show you something in illustrated form, on a system where /bin/sh is Bash:

$ cat Demo
#!/bin/sh
for i in "$@"; do
  case "$i" in
    *[A-Z]*) echo "$i has upper case";;
  esac
done
$ env - LANG=en_US.UTF-8 ./Demo a b C y z
b has upper case
C has upper case
y has upper case
z has upper case
$ env - LANG=en_US.UTF-8 /bin/dash ./Demo a b C y z
C has upper case
$ env - ./Demo a b C y z # no locale
C has upper case
$

I challenge you to make sense of either part of Bash's behavior in the en_US.UTF-8 locale.

(Contrary to my initial tweet, this behavior has apparently been in Bash for some time. It's also somewhat system dependent; Bash 4.2.25 on Ubuntu 12.04 behaves this way but 4.2.45 on FreeBSD doesn't.)

There are no two ways to describe this behavior: it is braindamaged. It is at best robot logic on Bash's part to allow [A-Z] to match lower case characters. It is also terribly destructive to Bash's utility for shell scripting. If I cannot count on glob operations behaving sanely even when they have nothing to do with files, why am I using Bash to write shell scripts at all? On many systems, this means eschewing '#!/bin/sh' entirely because (as we're seeing here) /bin/sh can be Bash and Bash will behave this way even when invoked as sh.

(I have to assume that not matching a as upper case is a Bash bug but that the rest of the behavior is intended. It makes more sense than the other way around.)

What Bash has done here is to strew land mines in the way of my scripts working right in what is now a common environment. If I want to continue using shell scripts I have to start trying to defensively defeat Bash. What will do it? Today, probably setting LC_COLLATE=C or better yet LC_ALL=C. In all of my scripts. I might as well switch to Python or Perl even for small things; they are clearly less likely to cause me heartburn in the future by going crazy.
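As a minimal sketch of that defensive approach, here is the Demo script with the locale forced at the top. This assumes that setting LC_ALL=C inside the script is enough to restore plain ASCII ranges, which is what it appears to take today. The function name report_upper is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Assumption: forcing the C locale makes [A-Z] mean the ASCII range A..Z.
LC_ALL=C
export LC_ALL

# report_upper is a hypothetical wrapper around the loop from Demo.
report_upper() {
  for i in "$@"; do
    case "$i" in
      *[A-Z]*) echo "$i has upper case";;
    esac
  done
}

report_upper a b C y z
```

With the C locale forced, only C should be reported, regardless of what LANG was set to when the script was started.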

There's another problem with this behavior, which is that it is not what any other POSIX-compatible shell I could find does (on Ubuntu 14.04). In Dash (the normal /bin/sh on many Linuxes), mksh, ksh, and even zsh, [A-Z] doesn't match lower case letters here. This means that having Bash as /bin/sh creates a serious behavior difference, rather than just adding non-POSIX features that you may accidentally (or deliberately) use in '#!/bin/sh' scripts.

(Yes, yes, I've written about this before. But the examples back then were vaguely sensible things for locales to apply to. What is happening in the Demo script is very, very far over the line. What is next, GNU grep deciding that your '[A-Z]' should match case-independently in some locales? That's just as justified as what Bash is doing here.)

PS: This is actually making me rethink the idea of having /bin/sh be Bash on our Ubuntu machines, which is the case for historical reasons. The pain of rooting out bashisms from our scripts may be less than the pain of dealing with Bash braindamage.

Sidebar: the bug continues

If you change the [A-Z] to [a-z] and try Demo with all upper case letters, it will match A-Y but think Z doesn't match. This is symmetrical in what you could consider a weird way. A quick test suggests that all other letters besides 'a' (in the [A-Z] case) and 'Z' (in the [a-z] case) match 'correctly', if we assume that a case independent match is correct in the first place.
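A quick test along these lines can be sketched as follows. The helper name matching is hypothetical, and the script relies on the shell treating an unquoted variable in a case pattern as a glob. What it prints depends on the shell and the locale it is run under, so no expected output is shown.

```shell
#!/bin/sh
# matching (a hypothetical helper): print each argument after the first
# that matches the glob pattern given as the first argument.
matching() {
  pat=$1
  shift
  for c in "$@"; do
    case "$c" in
      # $pat is deliberately unquoted so it is used as a pattern.
      $pat) echo "$c";;
    esac
  done
}

echo "lower case letters matched by [A-Z]:"
matching '[A-Z]' a b c d e f g h i j k l m n o p q r s t u v w x y z

echo "upper case letters matched by [a-z]:"
matching '[a-z]' A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
```

Running it once under 'env - LANG=en_US.UTF-8' and once under plain 'env -' gives a direct comparison of the two cases.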

Because I was masochistic tonight this has been filed as GNU Bash bug 108609 (tested against bash git tip), although savannah.gnu.org may have eaten the actual text I put in (it sent the text to me in email but I can't read the text through the web). My bug is primarily to report the missing 'a' and 'Z' and only lightly touches on the whole craziness of [A-Z] matching any lower case characters at all, so I encourage other people to file their own bugs about that. I have opted for a low-stress approach myself since I don't expect my bug report to go anywhere.

Written on 02 July 2014.

Last modified: Wed Jul 2 23:02:44 2014