2010-11-05
GNU sort and -k: a gotcha
Sometimes I don't like GNU utilities.
Suppose that you have a file that looks like this:
    oxygen     fred
    nitrogen   bob
    xenon      fred
    carbon     cks
    iron       jim
Further suppose that you want to sort it by the second field, to group
machines by the users that own them. So of course you do 'sort -k2 file',
because that's the obvious answer. Except that it doesn't work;
it sorts in some peculiar, non-obvious way, and it's not that you need
to specify field 1 or field 3 or anything like that. Perhaps you scratch
your head, grind your teeth, and move on. (That's what I did until
recently.)
Congratulations, you've been hit by a GNU sort gotcha: sort doesn't define fields the way you think it does. Pretty much every other sensible Unix program that deals with multi-field lines says that fields are separated by one or more whitespace characters. GNU sort, just to be different, says that fields are not so much separated by whitespace as created by it: a new field starts at the first whitespace character that follows non-whitespace, and that whole run of whitespace becomes part of the field it starts.
(This is spelled out in the info document for GNU sort in the section
on the -t argument. Read it carefully.)
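
To make this concrete, here is a small sketch that recreates the example
file and runs the naive command. The exact column spacing is my guess at
the layout, and LC_ALL=C is an addition so that the comparison is plain
byte order rather than whatever your locale does.

```shell
# Recreate the example file. The number of spaces per line is an
# assumption; all that matters is that it varies from line to line.
tmpfile=$(mktemp)
printf '%s\n' \
    'oxygen     fred' \
    'nitrogen   bob' \
    'xenon      fred' \
    'carbon     cks' \
    'iron       jim' > "$tmpfile"

# The "obvious" command. Each sort key starts with the blanks that
# precede the user name, so the amount of padding is compared first.
naive=$(LC_ALL=C sort -k2 "$tmpfile")
printf '%s\n' "$naive"

rm -f "$tmpfile"
```

In the C locale a space sorts before any letter, so the line with the
most padding (iron) comes out first and the one with the least
(nitrogen) comes out last, whatever the user names are. Other locales
may weigh whitespace differently, which makes the result look even more
arbitrary.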
This works out the way you innocently expect if each line separates
fields with the same number of whitespace characters, or if you are
using -n even with a variable number of whitespace characters (at
least in my testing). It goes off the rails badly in cases like this
example, where fields are separated by a variable number of whitespace
characters.
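
Both of the well-behaved cases can be checked with a short sketch. The
numeric data is made up for illustration, and LC_ALL=C is again only
there to keep the ordering predictable.

```shell
# Case 1: exactly one space between fields. Every key begins with the
# same single blank, so the user names decide the order.
even=$(printf '%s\n' \
    'oxygen fred' 'nitrogen bob' 'xenon fred' 'carbon cks' 'iron jim' |
    LC_ALL=C sort -k2)
printf '%s\n' "$even"

# Case 2: uneven padding, but a numeric sort. -n skips leading blanks
# when it parses the number, so the padding does not matter.
# (Hypothetical data, not from the original file.)
numeric=$(printf '%s\n' 'a   10' 'bb 9' 'c      100' |
    LC_ALL=C sort -n -k2)
printf '%s\n' "$numeric"
```

The first command groups the machines by user as intended, and the
second orders the lines 9, 10, 100 despite the ragged spacing.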
The solution is to add the -b argument, which makes GNU sort work
the way you expect it to. I am tempted to make an alias (well, a
cover script) that always supplies -b, because I can't think of
any situation where I don't want this sane behavior.
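
A minimal sketch of such a cover, written as a shell function rather
than a script. The name bsort is hypothetical, the file layout is the
same guess as before, and LC_ALL=C is only there to make the result
reproducible.

```shell
# bsort: hypothetical cover function that always passes -b to sort.
bsort() {
    sort -b "$@"
}

# Same example file as in the post; the exact spacing is an assumption.
tmpfile=$(mktemp)
printf '%s\n' \
    'oxygen     fred' \
    'nitrogen   bob' \
    'xenon      fred' \
    'carbon     cks' \
    'iron       jim' > "$tmpfile"

# With -b the leading blanks are skipped when locating the key, so the
# comparison starts at the user name. LC_ALL is set inside the command
# substitution so it does not leak into the calling shell.
fixed=$(LC_ALL=C; export LC_ALL; bsort -k2 "$tmpfile")
printf '%s\n' "$fixed"

rm -f "$tmpfile"
```

This groups the lines by user (bob, cks, fred, fred, jim), with the two
fred machines ordered by sort's usual whole-line tie-break.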
(GNU sort's behavior is in fact in violation of the Single Unix
Specification for sort; see the description of the -t option.)