(Ab)using awk on the fly

February 22, 2013

Suppose that you have a file with lines of the form:

host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3

You want to transform this into something that looks like:

package1 package2 package3
   host1 ahost3
package3 package5
   host2

In other words, group together all of the hosts that have the same set of packages (in this case, package updates that require manual work).

One of the problems of modern Unix is that there are simply too many programs that do random chunks of text processing for anyone except a specialist to remember them all, or even to know that they all exist and what they do. Thus it's quite possible that there are any number of clever ways to do this with relatively standard and widely available GNU or other tools. I just don't know what they are off the top of my head, and it is much faster to use tools that I know, even in brute-force ways, than to go searching and searching and maybe not find anything.

So here is how I did this, on the fly, using tools that I'm already familiar with (which primarily means awk). Let's assume the file is pkglist:

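sort -b -k2 pkglist | sed 's/: */:/' |
  awk -F: '$2 == last {sum = sum " " $1}                        # same package set: add this host
           $2 != last && last {printf "%s\n\t%s\n", last, sum}  # new package set: print the previous group
           $2 != last {sum = $1; last = $2}                     # and start a new group with this host
           # at end of input, print the final group
           END {printf "%s\n\t%s\n", last, sum}'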

(The actual version I used put all of this on one line, because a nice clean multiline thing isn't the kind of thing you do on the fly; it's what you do when you're cleaning it up to write about.)
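To check the pipeline against the sample input from the top of this entry, here is a minimal sketch that writes those three lines to a scratch file (created with mktemp purely for illustration; normally this is just pkglist) and runs the same sort, sed, and awk stages over it. The LC_ALL=C setting is an added assumption, used so that the sort order does not depend on the locale.

```shell
#!/bin/sh
# Recreate the sample input in a scratch file (hypothetical; in real use
# this is just the existing pkglist file).
pkglist=$(mktemp)
cat > "$pkglist" <<'EOF'
host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3
EOF

# Same pipeline as in the entry, with LC_ALL=C added (an assumption) so
# that sort compares plain bytes regardless of the current locale.
out=$(LC_ALL=C sort -b -k2 "$pkglist" | sed 's/: */:/' |
  awk -F: '$2 == last {sum = sum " " $1}
           $2 != last && last {printf "%s\n\t%s\n", last, sum}
           $2 != last {sum = $1; last = $2}
           END {printf "%s\n\t%s\n", last, sum}')

rm -f "$pkglist"
printf '%s\n' "$out"
```

When sort keys tie, sort falls back to comparing whole lines, so the hosts within each group come out in sorted order; this run lists ahost3 before host1, and each host list is indented with a tab.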

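The 'sort -b' bit is due to a GNU sort gotcha: without -b, the key for field 2 includes the blanks in front of it, so lines with identical package lists but different amounts of padding after the host name (like host1 and ahost3 here) get different sort keys and may not end up next to each other.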
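The following sketch sorts the three sample lines with and without -b, to show how the leading blanks in field 2 change the order. LC_ALL=C is an assumption added here so that sort compares plain bytes; other locales may collate blanks differently.

```shell
#!/bin/sh
# The sample lines; host1 and host2 are followed by three blanks,
# ahost3 by only two.
lines='host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3'

# With LC_ALL=C (an assumption), a blank sorts before any letter.
without_b=$(printf '%s\n' "$lines" | LC_ALL=C sort -k2)
with_b=$(printf '%s\n' "$lines" | LC_ALL=C sort -b -k2)

printf 'sort -k2:\n%s\n\nsort -b -k2:\n%s\n' "$without_b" "$with_b"
```

Without -b, host2's line lands between the two lines that share "package1 package2 package3", so the awk stage would report that package set twice; with -b the matching lines are adjacent.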

(Yes, I really write this sort of complex thing on the fly.)

Last modified: Fri Feb 22 15:40:09 2013