Forcing sort ordering in Unix shell scripts

October 11, 2008

There are a number of situations in Unix shell scripts where you know the order that the lines in your output should be in, but you can't produce it in that order and nothing in your normal output is useful to sort on. One common case is awk scripts that accumulate information into arrays and then dump it out in their END blocks with 'for (key in array) ...'; array keys are, of course, produced in no particular order.

In this situation, some people go to bat against sort with complex key specifications and the like. I'm lazy, so my usual solution is simple: I add an extra field at the start of the line that has a useful key for sort, sort the output, and then run the output through a second awk script to remove the first field (and often to do the final neat formatting of the output).
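As a rough sketch of this approach (the "user size" input format, the report_totals name, and the choice to sort by total are all made up for illustration, not taken from any real script), the first awk emits the total as a throwaway leading field, sort orders on it, and a second awk drops it while doing the final formatting:

```shell
#!/bin/sh
# Hypothetical example: input lines are "user size", and we want per-user
# totals with the largest total first. All names here are illustrative.
report_totals() {
    # Field 1 of the intermediate output is the hidden sort key.
    awk '{ total[$1] += $2 }
         END { for (u in total) print total[u], u, total[u] }' |
    # Numeric, descending on the key; ties broken by user name.
    sort -k1,1nr -k2,2 |
    # Strip the key and neaten the output.
    awk '{ printf "%-8s %5d\n", $2, $3 }'
}

printf '%s\n' 'alice 10' 'bob 30' 'alice 5' 'carol 30' | report_totals
```

Here the key happens to duplicate a value that is also printed, but it could just as well be something that never appears in the final output.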

A variant of this trick can be used to re-order output lines that are easiest to produce together. A snippet from one of my shell scripts is perhaps the best illustration:

.... |
(while read fs typ opts; do
     echo 0 unshare $fs
     echo 2 share -F $typ -o $opts $fs
 done
 echo 1 sleep 60) |
sort -n | sed 's/^[012] //' | ...

What this script wants to do is unshare a bunch of filesystems, wait some time, and then reshare them all. However, it gets all of the necessary information about each filesystem once, in a big chunk, so it is most natural to generate both the unshare and the reshare command at the same time. To insert a sleep in the middle we add a hidden field and just pick entirely artificial keys for all the lines such that sort will put all of the unshares first, the sleep second, and then all of the shares third.
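To see the reordering in isolation, here is a self-contained sketch that only prints the commands rather than running them. The plan_reshare name and the sample filesystem lines are assumptions for the demo; the real script's input and output stages are elided above and are not reproduced here.

```shell
#!/bin/sh
# Sketch only: prints the generated commands instead of executing them.
# Input lines are "filesystem fstype options"; the sample values are made up.
plan_reshare() {
    ( while read -r fs typ opts; do
          echo "0 unshare $fs"                  # key 0: all unshares first
          echo "2 share -F $typ -o $opts $fs"   # key 2: all shares last
      done
      echo "1 sleep 60" ) |                     # key 1: the pause in between
    sort -n | sed 's/^[012] //'
}

printf '%s\n' '/export/a nfs rw' '/export/b nfs rw' | plan_reshare
```

Within each group the lines come out in whatever order sort gives to lines with equal keys, which is fine here because only the ordering between the three groups matters.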

(Disclaimer: this is clearly at least related to Perl's Schwartzian transforms, although I think that they're not quite the same thing, and I probably picked up this shell idiom from somewhere.)

Comments on this page:

From at 2008-10-11 11:47:32:

It’s more a Guttman-Rosler transform than a Schwartzian transform. :-)

Aristotle Pagaltzis

Written on 11 October 2008.

Last modified: Sat Oct 11 02:03:39 2008