Part of good awk programming is getting the clause order right
Today I wanted to extract IP addresses from a recording file that had them in a particular format, in repeated stanzas that looked like:
<date>
---HTTP---
<IP>
[...]
---SOMETHING---
[possibly other IPs]
I wanted all the HTTP IPs. I'm sure that somewhere there is a convenient multiline grep, but since I didn't have one handy, I reached for awk. As it turns out, solving this problem concisely in awk makes a good example of a possibly underappreciated art in awk programming, namely choosing the order of your clauses.
Here's the awk program I came up with:
/^---/ { p = 0; }
p { print $0; }
/^---HTTP/ { p = 1; }
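Because awk runs each input line through every clause in the order the clauses appear, and a variable set by an earlier clause is already visible to later clauses on that same line, you can exploit the ordering of clauses as a deliberate part of program logic (and you can also blow your foot off with the wrong choice of order).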
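To make the ordering point concrete, here is a minimal sketch, assuming a POSIX shell with awk on the PATH. It runs the same three clauses in the order above and in one plausible-looking wrong order over a single invented stanza; the IP address, the '---SMTP---' end marker and the function names are all made up for illustration.

```shell
#!/bin/sh
# Sketch: the same three clauses in two orders.
# The one-stanza input (IP address and ---SMTP--- end marker) is invented.

right_order() {
    awk '
        /^---/     { p = 0 }   # marker lines switch printing off first
        p          { print }   # then print if the flag is still set
        /^---HTTP/ { p = 1 }   # switch on last, so the marker is not printed
    '
}

# Hypothetical wrong order: on the ---HTTP--- line p is set, then cleared
# straight away by the /^---/ clause, so it is never set for the IP line.
wrong_order() {
    awk '
        /^---HTTP/ { p = 1 }
        /^---/     { p = 0 }
        p          { print }
    '
}

stanza() {
    printf '%s\n' '---HTTP---' '192.0.2.1' '---SMTP---'
}

stanza | right_order   # prints 192.0.2.1
stanza | wrong_order   # prints nothing
```

In the wrong order both marker patterns match the '---HTTP---' line, so the second one undoes the first before the print clause ever sees p set.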
Printing of the current line is controlled by p, a flag variable. If p is set, the line is printed; if p is unset (as it is at the start, since awk treats an uninitialized variable as zero), we output nothing. We set p when we see the '---HTTP' lead-in, but since we don't want to print the lead-in itself (just the following IP addresses) we make this the last clause. The IP addresses end with a '---SOMETHING---' line (we only match its leading '---'), which causes us to turn off p; since we don't want that line printed, this clause is the first one. Turning off p in the first clause when the line is '---HTTP---' is harmless, because it will get turned on again by the final clause. Since it's harmless, we don't need a longer match or more complex conditional logic (and this means that we don't actually care which other thing comes after the stanza we care about, so long as there is one).
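Here's a minimal sketch of the program at work over several stanzas, assuming a POSIX shell with awk available. The dates, IP addresses, the '---SMTP---' and '---DNS---' end markers, and the extract_http_ips function name are all invented; the two different end markers are there to show that the program doesn't care what follows the HTTP stanza.

```shell
#!/bin/sh
# Sketch: the three-clause program run over invented stanzas.
# Dates, IPs and the ---SMTP--- / ---DNS--- end markers are made up.
extract_http_ips() {   # hypothetical helper name
    awk '
        /^---/     { p = 0 }   # any marker line turns printing off
        p          { print }   # print while the flag is set
        /^---HTTP/ { p = 1 }   # turn printing on for the lines after this one
    '
}

extract_http_ips <<'EOF'
2024-01-01
---HTTP---
192.0.2.1
192.0.2.2
---SMTP---
198.51.100.7
2024-01-02
---HTTP---
192.0.2.3
---DNS---
198.51.100.8
EOF
# prints 192.0.2.1, 192.0.2.2 and 192.0.2.3, one per line
```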
This also shows the flaw of awk. This program is too clever by half, since it's just indirectly expressing the simple logic:
if (line is '---HTTP---') p = 1
elif (line starts with '---') p = 0
elif (p) print $0
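Written out as real awk rather than pseudocode, that logic might look like the following sketch: a single action with no pattern, so it runs on every line, with an explicit if/else-if chain in place of clause order. The explicit_version name and the sample input are hypothetical, and it tests '---HTTP' as a line prefix (like the three-clause version) rather than comparing the whole line.

```shell
#!/bin/sh
# Sketch: the same logic as one explicit if/else-if chain, so the
# precedence of the tests is visible instead of implied by clause order.
explicit_version() {
    awk '
    {
        if ($0 ~ /^---HTTP/) {
            p = 1          # start printing with the next line
        } else if ($0 ~ /^---/) {
            p = 0          # any other marker ends the HTTP stanza
        } else if (p) {
            print          # an IP line inside the HTTP stanza
        }
    }
    '
}

# Invented sample input; the date, IPs and ---SMTP--- marker are made up.
explicit_version <<'EOF'
2024-01-01
---HTTP---
192.0.2.1
---SMTP---
198.51.100.7
EOF
# prints 192.0.2.1
```

Because the '---HTTP' test comes first in the chain, this produces the same output as the three-clause program on any input.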
In the name of golfing the awk program a bit, I've embedded the elif logic in the ordering of my clauses, where it's much harder to see than if it were written out simply and plainly.
(For even more fun, you can vary the order of the clauses to control whether the start or end markers will be printed. This would be a much more visible change in the clear version of the program.)
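As a sketch of that, assuming a POSIX shell with awk and an invented two-marker stanza (the function names are hypothetical too), moving the print clause to the end also prints the '---HTTP---' start marker, while moving it to the front prints the end marker instead:

```shell
#!/bin/sh
# Sketch: the same three clauses reordered to change which marker lines
# are printed along with the IPs. The sample input is invented.
sample() {
    printf '%s\n' '---HTTP---' '192.0.2.1' '---SMTP---' '198.51.100.7'
}

# Print clause last: on the ---HTTP--- line p has already been set again
# by the time the print clause runs, so the start marker is printed.
with_start_marker() {
    awk '
        /^---/     { p = 0 }
        /^---HTTP/ { p = 1 }
        p          { print }
    '
}

# Print clause first: on the end marker line p is still set when the
# print clause runs, so the end marker (here ---SMTP---) is printed.
with_end_marker() {
    awk '
        p          { print }
        /^---/     { p = 0 }
        /^---HTTP/ { p = 1 }
    '
}

sample | with_start_marker   # prints ---HTTP--- then 192.0.2.1
sample | with_end_marker     # prints 192.0.2.1 then ---SMTP---
```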