GNU sort and -k: a gotcha

November 5, 2010

Sometimes I don't like GNU utilities.

Suppose that you have a file that looks like this:

oxygen     fred
nitrogen   bob
xenon      fred
carbon     cks
iron       jim

Further suppose that you want to sort it by the second field, to group machines by the users that own them. So of course you do 'sort -k2 file', because that's the obvious answer. Except that it doesn't work; it sorts in some peculiar, non-obvious way, and it's not that you need to specify field 1 or field 3 or anything like that. Perhaps you scratch your head, grind your teeth, and move on. (That's what I did until recently.)
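To make this concrete, here is a small sketch that recreates the example and runs the obvious command. The temporary file and the variable name f are my own scaffolding, and LC_ALL=C is an assumption added so that the ordering is reproducible; other locales may order the lines differently.

```sh
# Recreate the example file in a temporary location (f is just a scratch name).
f=$(mktemp)
cat > "$f" <<'EOF'
oxygen     fred
nitrogen   bob
xenon      fred
carbon     cks
iron       jim
EOF

# Sort on "field 2" the obvious way, pinning the locale for reproducibility.
LC_ALL=C sort -k2 "$f"
# Under the C locale with GNU sort this prints:
#   iron       jim
#   xenon      fred
#   carbon     cks
#   oxygen     fred
#   nitrogen   bob
```

Note that the two machines owned by fred do not even end up next to each other.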

Congratulations, you've been hit by a GNU sort gotcha; sort doesn't define fields the way you think it does. Pretty much every other sensible Unix program that deals with multi-field lines says that fields are separated by one or more whitespace characters. GNU sort, just to be different, says that fields are not so much separated by whitespace as created by it: a new field starts at the first whitespace character after a non-whitespace one, and that whole run of whitespace becomes the leading part of the new field.

(This is spelled out in the info document for GNU sort in the section on the -t argument. Read it carefully.)
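One way to see what this means for the example is to print roughly what sort ends up comparing. The sed command below is only my emulation of the field rule (strip field 1, keep the rest of the line, add brackets so the blanks are visible); it is not something sort itself runs, and f is again a scratch name.

```sh
# Recreate the example file (f is just a scratch name).
f=$(mktemp)
cat > "$f" <<'EOF'
oxygen     fred
nitrogen   bob
xenon      fred
carbon     cks
iron       jim
EOF

# Approximate the -k2 key: drop the leading run of non-blanks (field 1),
# keep everything after it, and bracket it so the leading blanks show up.
sed -e 's/^[^[:blank:]]*//' -e 's/.*/[&]/' "$f"
# [     fred]
# [   bob]
# [      fred]
# [     cks]
# [       jim]
```

Since a space sorts before any letter in the C locale, keys with more leading spaces come first, which in this file means the lines with the shortest machine names. Reasonably recent versions of GNU sort also have a --debug option that marks the part of each line used as the key, if you would rather ask sort directly.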

This works out the way you innocently expect if each line separates fields with the same number of whitespace characters, or if you are using -n even with a variable number of whitespace characters (at least in my testing). It goes off the rails badly in cases like this example, where fields are separated by a variable number of whitespace characters.
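Both of the benign cases can be sketched quickly. The squeezing with tr and the numeric data in the second command are my own made-up illustrations, not part of the original example, and LC_ALL=C is again only there to pin the ordering.

```sh
# Recreate the example file (f is just a scratch name).
f=$(mktemp)
cat > "$f" <<'EOF'
oxygen     fred
nitrogen   bob
xenon      fred
carbon     cks
iron       jim
EOF

# Case 1: squeeze every run of spaces down to one, so each key starts
# with exactly the same amount of whitespace.
tr -s ' ' < "$f" | LC_ALL=C sort -k2
# nitrogen bob
# carbon cks
# oxygen fred
# xenon fred
# iron jim

# Case 2: hypothetical numeric data with ragged spacing; numeric
# comparison (-n) skips the leading blanks of the key on its own.
printf '%s\n' 'oxygen     10' 'nitrogen   9' 'iron       200' |
    LC_ALL=C sort -n -k2
# nitrogen   9
# oxygen     10
# iron       200
```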

The solution is to add the -b argument, which makes GNU sort work the way you expect it to. I am tempted to make an alias (well, a cover script) that always supplies -b, because I can't think of any situation where I don't want this sane behavior.
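Here is a sketch of the fix on the example file, plus one possible shape for the cover. The function name bsort is hypothetical, and I have used a shell function rather than a script only to keep the example in one piece; a script containing `exec sort -b "$@"` would do the same job.

```sh
# Pin the locale for this demonstration only.
LC_ALL=C
export LC_ALL

# Recreate the example file (f is just a scratch name).
f=$(mktemp)
cat > "$f" <<'EOF'
oxygen     fred
nitrogen   bob
xenon      fred
carbon     cks
iron       jim
EOF

# With -b, leading blanks are ignored when locating the start of the key.
sort -b -k2 "$f"
# nitrogen   bob
# carbon     cks
# oxygen     fred
# xenon      fred
# iron       jim

# A possible cover function (bsort is a made-up name) that always adds -b.
bsort() { command sort -b "$@"; }
bsort -k2 "$f"    # same output as above
```

One caveat, as far as I understand the documentation: a key that carries its own ordering letters, such as -k2r, does not inherit the global -b, so with such a cover it would still need to be written as -k2br.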

(GNU sort's behavior is in fact in compliance with the Single Unix Specification for sort; see the description of the -t option.)

Written on 05 November 2010.