Why I hate $LANG and locales on Unix

September 28, 2006
; cat foo
be+c 1
be+f 2
bed  3
; look bed foo
bed  3
; sort -o foo foo
; look bed foo || echo failed
failed
; cat foo
be+c 1
bed  3
be+f 2
; echo $LANG
en_US.UTF-8

Locales are intrinsically a user interface issue; you want to present information to the user in their specific local format. The Unix $LANG approach is intrinsically broken because commands have no idea whether they are presenting information to the user or to other commands; whichever they choose, they cannot win.

But the ways that things lose are different in each option. If programs ignore the locale, they present information to the user in a somewhat less desirable format. But the current locale approach actually breaks things on Unix, as this example neatly shows.

The locale approach is superficially attractive but deeply harmful for Unix systems, because it does fundamental damage to the idea that programs can be used as internal building blocks in bigger things. There once was a day when sort was a useful component; as demonstrated, that day is now effectively over.

Sidebar: how to work around this

The GNU sort documentation has scary warnings, so it appears that you need to set both LANG and LC_ALL to 'C', just to be sure. (It looks like you can't leave them unset unless you unset all of the LC_* environment variables, but setting these overrides the others.)
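
As a rough sketch of that workaround, the following re-sorts the example data with LC_ALL=C set for just the sort and look commands. The scratch file from mktemp stands in for foo, and the guard around look is an assumption that it may not be installed everywhere; this is an illustration rather than a tested recipe for every system.

```shell
# Recreate the three-line example in a scratch file (a stand-in for foo).
tmp=$(mktemp)
printf 'be+c 1\nbed  3\nbe+f 2\n' > "$tmp"

# Sort in plain byte order. Setting LC_ALL on the command line affects
# only this one command, not the rest of the session.
LC_ALL=C sort -o "$tmp" "$tmp"
sorted=$(cat "$tmp")

# look binary-searches the file, so it needs the file in the order it
# expects; with both commands in the C locale they should agree.
if command -v look >/dev/null 2>&1; then
    LC_ALL=C look bed "$tmp" || echo failed
fi
rm -f "$tmp"
```

Setting the variable per command like this keeps the rest of the session in the user's own locale, which is where locale-specific formatting is actually wanted.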


Comments on this page:

By Dan.Astoorian at 2006-09-28 18:06:57:

My interpretation is that this is simply a bug: look is not using the same collating sequence as sort because it's not honouring the locale, and it should.

For what it's worth, GNU sort's documentation notwithstanding, I believe it is sufficient (and potentially necessary) to set LC_COLLATE: the priority is:

LC_COLLATE > LC_ALL > LANG

(i.e., if LC_COLLATE is set to en_US.UTF-8, setting LANG and LC_ALL to "C" still won't get you the right sort order.)

By Dan.Astoorian at 2006-09-28 18:13:46:

I should really test things more carefully before making public statements...

Contrary to my intuition, but in accordance with the standard, LC_COLLATE does not override LC_ALL; the opposite is true. I'm sure there must be a good reason for this.
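
A small sketch of that ordering (LC_ALL first, then LC_COLLATE, then LANG), using the three lines from the post. It assumes an en_US.UTF-8 locale is installed; if it is not, sort will most likely fall back to byte order anyway and the check proves less than it seems to.

```shell
# The post's three lines, deliberately out of byte order.
data=$(printf 'be+c 1\nbed  3\nbe+f 2\n')

# LC_ALL is set, so it wins over LC_COLLATE: plain byte order.
with_lc_all=$(printf '%s\n' "$data" | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort)

# LC_ALL is empty (treated as unset), so LC_COLLATE wins over LANG:
# byte order again.
with_lc_collate=$(printf '%s\n' "$data" | LC_ALL= LC_COLLATE=C LANG=en_US.UTF-8 sort)

printf '%s\n' "$with_lc_all"
printf '%s\n' "$with_lc_collate"
```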

By cks at 2006-09-28 19:09:02:

My example is a condensed version of a real situation; in the real version, the binary search was being done inside the main program, in C. This is the corrosive problem with locales: they contaminate everything that they touch. If your program uses a locale-using program, you have to be locale-using too, and it doesn't matter if the information you are dealing with will never come near being shown to a user.

Effectively, with locales, everything locale-aware assumes that its output will be shown to the user. If you want to use it in some other way, you have to 'undo' that assumption (and then repeat it in your own output, as a good locale-aware program, so that the next layer gets to do it all over again).

And if someone didn't get the memo that the world changed, kaboom.

By cks at 2006-09-28 19:25:49:

Also, I think Dan's locale environment variable mistake perfectly illustrates what is wrong with the whole locale environment variable mess: there are too many of them for people to keep track of which one does what when, making the whole lot of them effectively useless.

(For example, I have no idea if LANG overrules LC_ALL or the other way around. You might say that this is no problem, but I worry about what happens if some system 'helpfully' sets LC_ALL for me on login or the like and I've set LANG, or vice versa.)

By Clément at 2019-02-24 09:33:13:

Can you clarify what's wrong with the sort example? I've tried it locally, and it works fine (the second 'look' succeeds too).

By cks at 2019-02-25 13:38:56:

If you get the post-sort order visible here, your 'look' command is doing something unusual (perhaps it's finally locale aware in your environment). I still get the results visible here on several Linux machines, although it appears not to happen on FreeBSD 10.4.

(FreeBSD's sort says that it respects locales and I think I'm running it in a way that should use the right locale, so either FreeBSD has slightly different locale rules or their sort is interpreting things differently than GNU sort.)

Written on 28 September 2006.

Last modified: Thu Sep 28 12:54:05 2006