Wandering Thoughts archives

2010-08-22

Another reason to hate $LANG and locales on Unix

Sometimes I'm slow; only recently did it occur to me how the $LANG sort misfeature and GNU comm's misfeature combine in an orgy of annoyance in a heterogeneous environment.

Suppose that you have systems that changed their default locale between operating system versions. As part of routine processing, you use comm to get the difference between something on the local system and a global list. Well, oops. Even if you carefully run sort on both files, you are going to have problems.

As we saw earlier, the choice of locale may change the sort order. While GNU comm is locale aware in just the same way as sort, it is not aware of multiple locales; it assumes that all files are sorted in the current locale (and these days it actively requires it). So your global file, although sorted, may not be sorted in the current system's locale, which will cause comm both to complain and to fail.
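To make the sort order difference concrete, here is a minimal sketch. The helper names sort_in and sample are made up for illustration, and en_US.UTF-8 is only an assumed example of a non-C locale, so the script checks that it is installed before using it.

```shell
#!/bin/sh
# Sort stdin under an explicitly chosen locale (first argument).
# sort_in is a hypothetical helper name, not a standard command.
sort_in() {
    LC_ALL="$1" sort
}

# Two lines whose relative order depends on the collation rules.
sample() {
    printf 'a\nB\n'
}

# In the C locale, sort compares bytes: 'B' (66) comes before 'a' (97).
sample | sort_in C

# A dictionary-style locale usually puts 'a' before 'B' instead.
# en_US.UTF-8 is an assumption; see `locale -a` for what your system has.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
    sample | sort_in en_US.UTF-8
fi
```

A file sorted by the first command is not sorted as far as a comm running in the second locale is concerned, and the other way around.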

(You get the same effect if you generate different global files on different machines and then try to process them together.)

Effectively this means that there is no such thing as a globally visible file that is properly sorted, because what counts as 'properly sorted' differs from machine to machine. Instead you probably want to sort all files on the local machine, which means making copies of the global ones. Ideally you want to do this right before using them, because the locale may differ between various environments even on a single machine; it is simply safer to sort files in the script immediately before feeding them to comm, so you know that sort and comm were both running in the same locale.
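A minimal sketch of sorting in the script right before comm might look like the following. The function name local_only and its two file arguments are hypothetical; the point is only that sort and comm run back to back in one process environment, whatever locale that happens to be.

```shell
#!/bin/sh
# local_only LOCAL GLOBAL: print lines that are in LOCAL but not in GLOBAL.
# (Hypothetical helper; both arguments are paths to line-oriented files.)
local_only() {
    tmpdir=$(mktemp -d) || return 1
    # Re-sort private copies of both inputs here, so that sort and comm
    # are guaranteed to run under the same locale settings.
    sort "$1" > "$tmpdir/local" &&
    sort "$2" > "$tmpdir/global" &&
    # -23 suppresses columns 2 and 3, leaving lines unique to the first file.
    comm -23 "$tmpdir/local" "$tmpdir/global"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

It would be called as something like `local_only /etc/mylist /net/global/list`, where both paths are placeholders. A variation is to prefix both sort and comm with an explicit setting such as LC_ALL=C, which also pins the order of the output, but that is an extra choice beyond what is described above.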

(Offhand, there are at least four plausible environments where system scripts might run with a different locale: from init.d scripts at boot time, from crontab entries, from an interactive login, and from an automated ssh command invocation that passes along the other machine's locale.)

sysadmin/LANGHateII written at 00:17:56



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.