== A trick for dealing with irregular multi-word lines in shell scripts

Suppose that you have a bunch of lines in what [[I've sort of described ../spam/ForgedFromSelf-2012-02-26]] as a 'key=value' format, that look like this:

> _ key1=value1 key2=value2 key3=value3 ..._

Also, let's suppose that the fields and their ordering aren't constant; for example, some lines omit key2 and its value. If it wasn't for this inconsistency, there are lots of Unix tools that you could use. With this inconsistency, I can't think of a Unix program that naturally deals with this format (one where you can say 'give me key1 and key7' in the same easy way you can get field 1 and field 7 in awk).

Fortunately, Unix gives us some brute force tricks. Selecting lines based on field contents is pretty easy:

> _grep ' key1=[^ ]*example'_

(The space before the key name may not be necessary, depending on what key names your file uses.)

I don't have any clever tricks if you want to aggregate or otherwise process several fields, but if you just want to pull out and analyze one field there is a brute force trick that you can often use. Let me show you a full command example:

> _egrep ' p=(1|0\.9)' | tr ' ' '\012' | grep '^f=' | sed 's/.*@//' |
> [[howmany ../sysadmin/LittleScriptsI]] | sed 20q_

The important trick is the _tr_ combined with the _grep_. The _tr_ breaks each log file line apart so that each 'key=value' pair is on its own line (by turning the spaces that separate fields into newlines). Once each key=value pair is on a separate line, we can select just the field we want and process it. Meanwhile, the initial _egrep_ selects which whole lines we want to work on before the _tr_ slices everything apart.

Of course, you don't necessarily need the lines to be in 'key=value' format. A variant of this 'split words into separate lines' trick can be applied to any file format where you can somehow match the individual 'words' that you want to further process.
And you don't have to split on spaces; any distinguishing character will do.

(If the field separator is several characters you can split things with _sed_. I used _tr_ here because it's simpler for single-character splitting.)

I call this brute force because we're not doing anything particularly clever to extract just the words we care about from inside each line. Instead we're slicing up everything and then throwing most of the pieces away.
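
To make the splitting trick concrete, here is a minimal sketch that runs it against a few invented lines. Everything specific in it is an assumption for illustration: the sample data, the function names `sample_log`, `field_values` and `field_values_semi`, and the `; ` separator in the _sed_ variant. The `sort | uniq -c | sort -rn` at the end is only a generic count-and-rank stand-in for howmany, whose exact behaviour isn't shown here.

```shell
#!/bin/sh
# Hypothetical sample data. The keys (p=, f=) mirror the example in the
# text, but the values are invented for illustration.
sample_log() {
    cat <<'EOF'
t=1 p=1 f=alice@example.com s=100
t=2 f=bob@example.org p=0.9
t=3 p=0.2 f=carol@example.com
t=4 p=1 s=50
EOF
}

# Print the values of one key, one per line, reading space-separated
# key=value lines on standard input. Assumes the key name is a plain
# word with no regular expression or '/' characters in it.
field_values() {
    tr ' ' '\012' | grep "^$1=" | sed "s/^$1=//"
}

# The same idea for a multi-character separator (here '; ', a made-up
# choice), using sed to turn each separator into a newline. The
# backslash-newline form is used instead of '\n' for portability.
field_values_semi() {
    sed 's/; /\
/g' | grep "^$1=" | sed "s/^$1=//"
}

# Select whole lines first, then split them apart and keep only the
# f= field. The '( |$)' is an extra anchor added in this sketch so that
# only exactly p=1 or p=0.9 match.
sample_log |
    grep -E ' p=(1|0\.9)( |$)' |
    field_values f |
    sed 's/.*@//' |
    sort | uniq -c | sort -rn
```

Note that the fourth sample line is selected by the first _grep_ but has no f= field, so it simply contributes nothing after the split. That is the 'throwing most of the pieces away' part doing its job on irregular lines.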