Why I hate $LANG and locales on Unix

September 28, 2006

; cat foo
be+c 1
be+f 2
bed  3
; look bed foo
bed  3
; sort -o foo foo
; look bed foo || echo failed
failed
; cat foo
be+c 1
bed  3
be+f 2
; echo $LANG
en_US.UTF-8

Locales are intrinsically a user interface issue; you want to present information to the user in their specific local format. The Unix $LANG approach is intrinsically flawed because commands have no idea whether they are presenting information to the user or to other commands; whichever way they choose, they cannot win.

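But the two ways of losing are not equal. If programs ignore the locale, they present information to the user in a somewhat less desirable format. But the current locale approach actually breaks things on Unix, as this example neatly shows: sort orders foo by the locale's collation rules, and look, which evidently expects a different ordering, can no longer find a line that is plainly in the file.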
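As a rough sketch of where the difference comes from, the same three lines can be sorted under two locales. This assumes that the en_US.UTF-8 locale is installed; if it is not, sort will typically fall back to C ordering and the two results will come out the same. The variable names are just for illustration.

```shell
# The three lines from the example file.
input=$(printf 'be+c 1\nbe+f 2\nbed  3\n')

# C locale: plain byte order, so '+' (0x2B) sorts before 'd' (0x64).
c_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

# Locale collation; assumes en_US.UTF-8 exists on this system.
locale_order=$(printf '%s\n' "$input" | LC_ALL=en_US.UTF-8 sort)

printf 'C locale:\n%s\n\n' "$c_order"
printf 'en_US.UTF-8:\n%s\n' "$locale_order"
```

Any program that expects byte order will get a correctly sorted file only from the first of these.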

The locale approach is superficially attractive but deeply harmful for Unix systems, because it does fundamental damage to the idea that programs can be used as internal building blocks in bigger things. There once was a day when sort was a useful component; as demonstrated, that day is now effectively over.

Sidebar: how to work around this

The GNU sort documentation has scary warnings, so it appears that you need to set both LANG and LC_ALL to 'C', just to be sure. (It looks like you can't simply leave them unset unless you also unset all of the LC_* environment variables, whereas setting LC_ALL overrides all of the others.)
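A minimal sketch of the workaround, applied per command rather than to the whole session. The temporary file stands in for foo, and the look call is guarded because look is not installed everywhere; both are choices made for this illustration.

```shell
# Recreate the example file in its locale-sorted (broken) order.
f=$(mktemp)
printf 'be+c 1\nbed  3\nbe+f 2\n' > "$f"

# Force the C locale for this one command only.
# LC_ALL takes precedence over LANG and the other LC_* variables.
LC_ALL=C sort -o "$f" "$f"

# With the file back in byte order, look should find the line again.
if command -v look >/dev/null 2>&1; then
    LC_ALL=C look bed "$f"
fi

result=$(cat "$f")
rm -f "$f"
```

For a whole script, putting `export LC_ALL=C` near the top has the same effect on every command it runs, at the cost of also losing localized messages.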

Written on 28 September 2006.

Last modified: Thu Sep 28 12:54:05 2006