Why I hate $LANG and locales on Unix

; cat foo
be+c 1
be+f 2
bed 3
; look bed foo
bed 3
; sort -o foo foo
; look bed foo || echo failed
failed
; cat foo
be+c 1
bed 3
be+f 2
; echo $LANG
en_US.UTF-8

Locales are intrinsically a user interface issue; you want to present
information to the user in their specific local format. The Unix $LANG
approach is intrinsically broken because commands have no idea whether they are
presenting information to the user, or to other commands; either way
they choose, they cannot win.
But the ways that things lose are different in each option. If programs ignore the locale, they present information to the user in a somewhat less desirable format. But the current locale approach actually breaks things on Unix, as this example neatly shows.
The locale approach is superficially attractive but deeply harmful
for Unix systems, because it does fundamental damage to the idea that
programs can be used as internal building blocks in bigger things. There
once was a day when sort was a useful component; as demonstrated, that
day is now effectively over.
Sidebar: how to work around this
The GNU sort documentation has scary warnings, so it appears that you
need to set both LANG and LC_ALL to 'C', just to be sure.
(It looks like you can't leave them unset unless you unset all of the
LC_* environment variables, but setting these overrides the others.)
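
A minimal sketch of that workaround, assuming a POSIX-style shell. The csort wrapper name is my own invention for illustration, not something that ships with any system; it sets both variables for the one sort invocation only, so the rest of the session keeps the user's locale.

```shell
# Hypothetical helper: run sort in the C locale no matter what
# LANG and the LC_* variables are set to in the calling environment.
csort() {
    LANG=C LC_ALL=C sort "$@"
}

# Recreate the example file with its lines deliberately out of order.
printf 'bed 3\nbe+f 2\nbe+c 1\n' > foo

# Sort in place by raw byte values, then show the result.
csort -o foo foo
cat foo
```

In the C locale '+' compares lower than 'd', so both be+ lines sort ahead of bed, which is the byte order that look's binary search relies on.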