(Ab)using awk on the fly

February 22, 2013

Suppose that you have a file with lines of the form:

host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3

You want to transform this into something that looks like:

package1 package2 package3
   host1 ahost3
package3 package5
   host2

In other words, aggregate together all of the hosts with a common set of packages (in this case, packages to update that require manual work).

One of the problems of modern Unix is that there are simply too many programs that do random chunks of text processing for anyone except a specialist to remember them all, or even to know that they exist and what they do. Thus it's quite possible that there are any number of clever ways to do this with relatively standard and widely available GNU or other tools. I just don't know what they are off the top of my head, and it is much faster to use tools that I know, even in brute force ways, than to go searching and searching and maybe not find anything.

So here is how I did this, on the fly, using tools that I'm already familiar with (which primarily means awk). Let's assume the file is pkglist:

sort -b -k2 pkglist | sed 's/: */:/' |
  awk -F: '$2 == last {sum = sum " " $1}
           $2 != last && last {printf "%s\n\t%s\n", last, sum}
           $2 != last {sum = $1; last = $2}
           END {printf "%s\n\t%s\n", last, sum}'

(The actual version I used put all of this on one line, because a nice clean multiline thing isn't the kind of thing you do on the fly; it's what you do when you're cleaning it up to write about.)
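To try this against the sample data, here is a minimal sketch. The pipeline is unchanged; the wrapper function name `group_hosts` and the scratch file are made up purely for illustration.

```shell
# group_hosts is a hypothetical wrapper name; the pipeline inside is the one above.
group_hosts() {
  sort -b -k2 "$1" | sed 's/: */:/' |
    awk -F: '$2 == last {sum = sum " " $1}
             $2 != last && last {printf "%s\n\t%s\n", last, sum}
             $2 != last {sum = $1; last = $2}
             END {printf "%s\n\t%s\n", last, sum}'
}

# Recreate the sample input in a scratch file.
pkglist=$(mktemp)
cat >"$pkglist" <<'EOF'
host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3
EOF

group_hosts "$pkglist"
rm -f "$pkglist"
```

Hosts within a group come out in whatever order sort leaves them. Since sort breaks ties by comparing whole lines, ahost3 should land ahead of host1 here, unlike the hand-written example output above.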

The 'sort -b' bit is due to a GNU sort gotcha.
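The effect shows up with the sample data: without -b, the blanks in front of a field count as part of the sort key, so lines with different amounts of padding after the colon do not end up next to each other. The sketch below forces `LC_ALL=C` so that the comparison is bytewise; that locale setting is an assumption made for the demonstration and was not part of the original command.

```shell
# Sample input with uneven padding after the colon, as in the file above.
sample='host1:   package1 package2 package3
host2:   package3 package5
ahost3:  package1 package2 package3'

# Without -b the leading blanks are part of the key, so the two
# "package1 package2 package3" lines are not adjacent.
without_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -k2 | cut -d: -f1 | paste -s -d ' ' -)

# With -b the blanks are skipped and matching package lists sort together.
with_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -b -k2 | cut -d: -f1 | paste -s -d ' ' -)

echo "without -b: $without_b"
echo "with -b:    $with_b"
```

Under those assumptions the first line lists the hosts as host1, host2, ahost3 and the second as ahost3, host1, host2. Only the second ordering lets the awk script see identical package lists on consecutive lines.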

(Yes, I really write this sort of complex thing on the fly.)


Comments on this page:

From 141.84.9.5 at 2013-02-23 09:45:49:

awk -F': *' '{o[$2]=o[$2] " " $1}END{for(k in o)print k,"\n",o[k]}' <pkglist

By cks at 2013-02-23 23:06:20:

That's a better awk version and nicely shows that I don't use awk enough to be really good with it; I would have had to look up how to use awk arrays and it just wasn't worth it for this. That use of -F is a good trick, too (and I'd never noticed that the field separator can be a regular expression).

From 141.84.9.5 at 2013-02-24 13:07:38:

-F taking a regexp is one of the few unique awk features, even Perl doesn't have it. (I also often use -F'[,;|]' or similar.)

From 87.79.78.105 at 2013-02-25 11:58:58:

Actually Perl does accept a pattern as the argument to its -F switch. The catch is that you cannot include literal spaces in patterns (because of an interaction with machinery in perl that tries to respect the shebang line of a script no matter how it was invoked – which forces perl to split on whitespace outside a few exceptional switches like -e). So the AWK equivalent in Perl is this:

   perl -lnaF':\x{20}*' -e '$o{$F[1]}.=" $F[0]";END{print"$a\n$b"while($a, $b)=each%o}' < pkglist

However, this is one of those tasks needing just simple enough data structures that Perl’s much greater manipulexity is a disadvantage; AWK wins.

Aristotle Pagaltzis

By John Wiersba at 2016-04-06 11:05:14:

I think 141.84.9.5 was thinking of awk's record separator RS (perl's -0 or $/) which can be a pattern in awk (not in compatibility mode) but not in perl.

Written on 22 February 2013.

Last modified: Fri Feb 22 15:40:09 2013