Wandering Thoughts archives


(Ab)using awk on the fly

Suppose that you have a file with lines of the form:

host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3

You want to transform this into something that looks like:

package1 package2 package3
   host1 ahost3
package3 package5
   host2

In other words, aggregate together all of the hosts with a common set of packages (in this case, packages to update that require manual work).

One of the problems of modern Unix is that there are simply too many programs that do random chunks of text processing for anyone except a specialist to remember or even know them all and know what they do. Thus it's quite possible that there are any number of clever ways to do this with relatively standard and widely available GNU or other tools. I just don't know what they are off the top of my head and it is much faster to use tools that I know, even in brute force ways, than to go searching and searching and maybe not find anything.

So here is how I did this, on the fly, using tools that I'm already familiar with (which primarily means awk). Let's assume the file is pkglist:

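sort -b -k2 pkglist | sed 's/: */:/' |
  awk -F: '# same package list as the previous line: add this host to the group
           $2 == last {sum = sum " " $1}
           # new package list: print the finished group, if there is one
           $2 != last && last {printf "%s\n\t%s\n", last, sum}
           # start a new group with this host
           $2 != last {sum = $1; last = $2}
           END {if (last) printf "%s\n\t%s\n", last, sum}'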

(The actual version I used put all of this on one line, because a nice clean multiline thing isn't the kind of thing you do on the fly; it's what you do when you're cleaning it up to write about.)
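For comparison, here is a rough sketch of the same grouping done in a single awk process, keyed on the package list, so that the sort and sed stages are not needed. This is not what I actually ran; the function name aggregate_hosts and the array names are made up for illustration, and it assumes that host names contain no whitespace and that two package lists count as the same only if they name the same packages in the same order.

```shell
# Hypothetical helper: group hosts by their exact package list,
# keeping package lists in the order they are first seen.
aggregate_hosts() {
    awk 'NF >= 2 {
        host = $1
        sub(/:$/, "", host)       # strip the trailing colon from the host
        $1 = ""                   # drop the host field; $0 is rebuilt with single spaces
        pkgs = substr($0, 2)      # skip the separator left behind by the empty $1
        if (pkgs in hosts) {
            hosts[pkgs] = hosts[pkgs] " " host
        } else {
            order[++n] = pkgs     # remember first-seen order of package lists
            hosts[pkgs] = host
        }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s\n\t%s\n", order[i], hosts[order[i]]
    }' "$@"
}

# Try it on the sample lines from above.
printf '%s\n' \
    'host1:   package1 package2 package3' \
    'host2:   package3 package5' \
    'ahost3:  package1 package2 package3' | aggregate_hosts
```

Because the fields are rejoined with single spaces, differing amounts of whitespace after the colon do not matter here. The price is that awk holds every distinct package list in memory, which is fine for a file of this sort.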

The 'sort -b' bit is due to a GNU sort gotcha: without it, the blanks in front of a field are treated as part of the sort key.
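Here is a sketch of that gotcha using the sample lines above, where ahost3 has two spaces after its colon and the other hosts have three. It assumes GNU sort; the LC_ALL=C setting is my addition to keep the ordering predictable across locales, and pkglist_sample is a made-up helper that just prints the sample data.

```shell
# Hypothetical helper that prints the sample data; note the differing
# amounts of whitespace after the colon.
pkglist_sample() {
    printf '%s\n' \
        'host1:   package1 package2 package3' \
        'host2:   package3 package5' \
        'ahost3:  package1 package2 package3'
}

# Without -b the leading blanks are part of the key, so the two lines
# with identical package lists do not end up next to each other
# (the hosts come out as host1, host2, ahost3).
pkglist_sample | LC_ALL=C sort -k2

# With -b the leading blanks are skipped, and identical package lists
# sort together (ahost3, host1, host2).
pkglist_sample | LC_ALL=C sort -b -k2
```

Since the awk part only compares each line against the previous one, the first ordering would report the package1 set twice, once for host1 and once for ahost3.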

(Yes, I really write this sort of complex thing on the fly.)

unix/AbusingAwkOnTheFly written at 15:40:09
