A trick for dealing with irregular multi-word lines in shell scripts
Suppose that you have a bunch of lines in what I've sort of described as a 'key=value' format, that look like this:
<timestamp> key1=value1 key2=value2 key3=value3 ...
Also, let's suppose that the fields and their ordering aren't constant; for example, some lines omit key2 and its value. If it weren't for this inconsistency, there are lots of Unix tools that you could use; with this inconsistency, I can't think of a Unix program that naturally deals with this format (one where you can say 'give me key1 and key7' in the same easy way you can get field 1 and field 7 in awk).
Fortunately, Unix gives us some brute force tricks.
Selecting lines based on field contents is pretty easy:
grep ' key1=[^ ]*example'
(The space before the key name may not be necessary depending on what key names your file uses.)
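To make the selection step concrete, here is a small sketch run against made-up data. The log lines, the key names, and the helper function names (`sample_log`, `select_key1_example`) are all hypothetical; only the grep pattern comes from the text above.

```shell
# Hypothetical sample data in the '<timestamp> key=value ...' format.
sample_log() {
cat <<'EOF'
2024-01-01T10:00:00 key1=foo.example key2=a key3=b
2024-01-01T10:00:01 key1=other key3=c
2024-01-01T10:00:02 key3=d key1=www.example key2=e
2024-01-01T10:00:03 notkey1=bar.example key3=f
EOF
}

# Keep only lines where the value of key1 contains 'example'.
# '[^ ]*' cannot cross a space, so the match stays inside the one field.
# The leading space stops 'notkey1=' from being mistaken for 'key1='.
select_key1_example() {
    grep ' key1=[^ ]*example'
}

sample_log | select_key1_example
```

This should print the first and third sample lines. The fourth line shows why the leading space can matter: without it, 'notkey1=bar.example' would match too.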
I don't have any clever tricks if you want to aggregate or otherwise process several fields, but if you just want to pull out and analyze one field there is a brute force trick that you can often use. Let me show you a full command example:
egrep ' p=(1|0\.9)' | tr ' ' '\012' | grep '^f=' | sed 's/.*@//' | howmany | sed 20q
The important trick is the tr combined with the grep. The tr breaks each log file line apart so that each 'key=value' pair is on its own line (by turning the spaces that separate fields into newlines). Once each key=value pair is on a separate line, we can select just the field we want and process it. Meanwhile the initial egrep is selecting which whole lines we want to work on before the tr slices everything apart.
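Here is a sketch of the same pipeline run end to end on made-up input, so you can see what each stage leaves behind. The sample lines and the meaning of the 'p', 'f', and 't' keys are invented for illustration. 'howmany' is not a standard command; the `count_lines` function below is a stand-in that assumes it counts duplicate lines and lists the most common first. The sketch also uses 'grep -E', which is the current spelling of egrep.

```shell
# Hypothetical log lines; the keys and values are made up.
sample_log() {
cat <<'EOF'
1700000001 p=1 f=alice@example.com t=x@example.org
1700000002 f=bob@example.net p=0.2
1700000003 p=0.9 t=y@example.org f=carol@example.com
1700000004 p=1 t=z@example.org
1700000005 f=dave@example.net p=1
EOF
}

# Stand-in for 'howmany' (an assumption): count identical lines,
# most frequent first.
count_lines() {
    sort | uniq -c | sort -rn
}

top_f_domains() {
    grep -E ' p=(1|0\.9)' |   # select whole lines while they are still intact
    tr ' ' '\012' |           # turn spaces into newlines: one word per line
    grep '^f=' |              # keep only the f= field, discard everything else
    sed 's/.*@//' |           # strip everything up to and including the '@'
    count_lines |
    sed 20q                   # show at most the top 20
}

sample_log | top_f_domains
```

On this input the result should be a count of 2 for example.com and 1 for example.net. The second line is dropped by the first grep, the fourth line has no f= field so it contributes nothing, and the t= fields never make it past the second grep even though they also contain an '@'.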
Of course, you don't necessarily need the lines to be in 'key=value' format. A variant of this 'split words into separate lines' trick can be done to any file format where you can somehow match the individual 'words' that you want to further process. And you don't have to split on spaces; any distinguishing character will do.
(If the field separator is several characters you can split things with sed. I used tr here because it's simpler for single-character splitting.)
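As a sketch of the sed variant, suppose a hypothetical format where fields are separated by '; ' and values may themselves contain spaces, so splitting on bare spaces would cut values in half. The data and the function names here are made up. The backslash followed by a literal newline in the replacement is the portable way to make sed emit a newline; '\n' in the replacement works in GNU sed but not everywhere.

```shell
# Hypothetical lines using '; ' as a two-character field separator.
sample_log() {
cat <<'EOF'
user=alice; agent=Mozilla Firefox; status=ok
agent=curl; user=bob; status=failed login
user=carol; status=ok
EOF
}

# Replace every '; ' with a newline so each field sits on its own line.
split_fields() {
    sed 's/; /\
/g'
}

# Pull out just the agent field and drop the 'agent=' prefix.
sample_log | split_fields | grep '^agent=' | sed 's/^agent=//'
```

This should print 'Mozilla Firefox' and 'curl', with the space inside the first value left intact because only the full '; ' separator is turned into a newline.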
I call this brute force because we're not doing anything particularly clever to extract just the words we care about from inside each line. Instead we're slicing up everything and then throwing most of the pieces away.