Bash is letting locales destroy shell scripting (at least on Linux)
Here, let me present you something in illustrated form, on a system
where /bin/sh is Bash:
    $ cat Demo
    #!/bin/sh
    for i in "$@"; do
        case "$i" in
            *[A-Z]*) echo "$i has upper case";;
        esac
    done
    $ env - LANG=en_US.UTF-8 ./Demo a b C y z
    b has upper case
    C has upper case
    y has upper case
    z has upper case
    $ env - LANG=en_US.UTF-8 /bin/dash ./Demo a b C y z
    C has upper case
    $ env - ./Demo a b C y z   # no locale
    C has upper case
    $
I challenge you to make sense of either part of Bash's behavior in the en_US.UTF-8 locale.
(Contrary to my initial tweet, this behavior has apparently been in Bash for some time. It's also somewhat system dependent; Bash 4.2.25 on Ubuntu 12.04 behaves this way but 4.2.45 on FreeBSD doesn't.)
There are no two ways to describe this behavior: it is braindamaged.
It is at best robot logic on Bash's part to allow [A-Z] to match
lower case characters. It is also terribly destructive to Bash's
utility for shell scripting. If I cannot even count on glob
operations that are not even in a file context operating sanely,
why am I using Bash to write shell scripts at all? On many systems,
this means eschewing '#!/bin/sh' entirely because (as we're seeing
here) /bin/sh can be Bash and Bash will behave this way even when
invoked as sh.
(I have to assume that not matching 'a' as upper case is a Bash
bug but that the rest of the behavior is intended. It makes more
sense than the other way around.)
What Bash has done here is to strew land mines in the way of my
scripts working right in what is now a common environment. If I
want to continue using shell scripts I have to start trying to
defensively defeat Bash. What will do it? Today, probably setting
LC_COLLATE=C or better yet LC_ALL=C. In all of my scripts.
I might as well switch to Python or Perl even for small things;
they are clearly less likely to cause me heartburn in the future
by going crazy.
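As a minimal sketch of that defensive setting, here is the Demo script with the locale forced to C at the top. The has_upper function is a name I made up for illustration, and this assumes that overriding LC_ALL inside the script is enough for your version of Bash; I have not checked every version.

```shell
#!/bin/sh
# Force the C locale so that [A-Z] should mean only the ASCII upper
# case letters, whatever LANG or LC_* the caller had set.
LC_ALL=C
export LC_ALL

# has_upper is a hypothetical helper name, not anything standard.
# It succeeds if its argument contains at least one of A-Z.
has_upper() {
    case "$1" in
        *[A-Z]*) return 0;;
        *) return 1;;
    esac
}

for i in "$@"; do
    if has_upper "$i"; then
        echo "$i has upper case"
    fi
done
```

With this at the top, running the script as 'env - LANG=en_US.UTF-8 ./Demo a b C y z' should report only C, the same as the no-locale run above.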
There's another problem with this behavior, which is that it is not
what any other POSIX-compatible shell I could find does (on Ubuntu
14.04). Dash (the normal /bin/sh on many Linuxes), mksh, ksh, and
even zsh don't match here. This means that having Bash as /bin/sh
creates a serious behavior difference, instead of just adding
non-POSIX features that you may accidentally (or deliberately) use
in '#!/bin/sh' scripts.
(Yes, yes, I've written about this before.
But the examples back then were vaguely sensible things for locales
to apply to. What is happening in the Demo script is very, very
far over the line. What is next, GNU grep deciding that your
'[A-Z]' should match case-independently in some locales? That's
just as justified as what Bash is doing here.)
PS: This is actually making me rethink the idea of having /bin/sh
be Bash on our Ubuntu machines, which is the case for historical
reasons. The pain of rooting out bashisms from our scripts may be
less than the pain of dealing with Bash braindamage.
Sidebar: the bug continues
If you change the [A-Z] to [a-z] and try Demo with all upper
case letters, it will match A-Y but think Z doesn't match. This
is symmetrical in what you could consider a weird way. A quick test
suggests that all other letters besides 'a' (in the [A-Z] case)
and 'Z' (in the [a-z] case) match 'correctly', if we assume that a
case independent match is correct in the first place.
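For anyone who wants to repeat that sort of quick test, something like the following sketch will do for the [A-Z] case. It is not the exact test I ran, and lower_matching_upper is a made-up name. It prints each lower case letter that the current locale lets [A-Z] match.

```shell
#!/bin/sh
# lower_matching_upper is a hypothetical helper: it prints every
# lower case ASCII letter that the pattern [A-Z] matches in whatever
# locale the shell is currently running in.
lower_matching_upper() {
    for c in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        case "$c" in
            [A-Z]) printf '%s matches [A-Z]\n' "$c";;
        esac
    done
}

lower_matching_upper
```

Run it under 'env - LANG=en_US.UTF-8' and again under plain 'env -' to compare. In the C locale it should print nothing at all. Swapping the letter list to upper case and the pattern to [a-z] gives the other half of the test.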
Because I was masochistic tonight this has been filed as GNU Bash bug 108609 (tested against bash git tip), although savannah.gnu.org may have eaten the actual text I put in (it sent the text to me in email but I can't read the text through the web). My bug is primarily to report the missing 'a' and 'Z' and only lightly touches on the whole craziness of [A-Z] matching any lower case characters at all, so I encourage other people to file their own bugs about that. I have opted for a low-stress approach myself since I don't expect my bug report to go anywhere.