The fun of awk

May 30, 2008

I like awk, I really do, but sometimes it really irritates me. Take, for example, this fun little awk program:

awk 'BEGIN {print "5" == "05"}' /dev/null

You might rationally expect this to print '1' (awk's boolean truth value). As I found out once, you would be sadly mistaken; it prints '0', presumably because awk winds up doing a string comparison instead of a numeric one. Too bad if you're reading one set of fields that are zero-padded and one set that aren't.

(The workaround is to add 0 to the "05" to force the numeric interpretation; "5" == ("05"+0) comes out true.)
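
Here is a small sketch of the comparison and the workaround side by side. The shell variable names are just mine for illustration, and the third case (adding 0 to both sides) is an extra variation of the workaround, not something I needed at the time.

```shell
# String constants on both sides: awk compares them as strings.
plain=$(awk 'BEGIN { print ("5" == "05") }' /dev/null)

# Adding 0 to one side only, as in the workaround above.
one_side=$(awk 'BEGIN { print ("5" == ("05" + 0)) }' /dev/null)

# Adding 0 to both sides, so both operands are plainly numbers.
both_sides=$(awk 'BEGIN { print (("5" + 0) == ("05" + 0)) }' /dev/null)

echo "$plain $one_side $both_sides"   # prints: 0 1 1
```

If you don't want to think about which side is what, adding 0 to both sides is probably the safest habit.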

This shows two drawbacks of the sort of magical conversion between numbers and strings that awk does. First, this sort of stuff involves heuristics, and heuristics are inevitably wrong sooner or later. And second, if you do not have the fine details carefully memorized you can wind up surprised.
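
To make the heuristic concrete, here is a rough sketch of how input fields behave in the awks I know of: a field that looks like a number is compared as a number, and anything else falls back to a string comparison. The sample inputs and variable names are made up for illustration.

```shell
# Both fields look like numbers, so they are compared numerically.
num_eq=$(echo "3 03" | awk '{ print ($1 == $2) }')

# One field has a trailing letter, so awk falls back to comparing strings.
str_eq=$(echo "3 03x" | awk '{ print ($1 == $2) }')

# Numeric comparison: 10 is not less than 9.
num_lt=$(echo "10 9" | awk '{ print ($1 < $2) }')

# String comparison: "10" sorts before "9x" because "1" sorts before "9".
str_lt=$(echo "10 9x" | awk '{ print ($1 < $2) }')

echo "$num_eq $str_eq $num_lt $str_lt"   # prints: 1 0 0 1
```

One stray character in a field is enough to quietly flip which kind of comparison you get.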

At the same time such magical conversions live on because they are oh so very handy when you are banging things out in a hurry. Considering the sorts of things that awk was designed for, this is completely the right decision for it; having to write explicit Python-style conversions all the time would probably drive me up the wall, however much I like them in Python.


Comments on this page:

From 71.65.56.124 at 2008-05-31 11:01:34:

You can also take away the quotes if you know your input will be mathematical.

--Matt
http://standalone-sysadmin.blogspot.com

By cks at 2008-05-31 12:06:38:

In this case the real version was more like 'if ($1 == $3) ...'; the actual values I was comparing were input fields instead of one being a constant.

From 71.65.56.124 at 2008-06-01 16:50:31:

Interesting

At my shell, I get this:

Matt-Simmons-Computer:~ mattsimmons$ echo "1 2 3" | awk '{print ($1 == $3)}'
0
Matt-Simmons-Computer:~ mattsimmons$ echo "3 2 3" | awk '{print ($1 == $3)}'
1
Matt-Simmons-Computer:~ mattsimmons$ echo "3 2 03" | awk '{print ($1 == $3)}'
1
Matt-Simmons-Computer:~ mattsimmons$ echo "3 2 03" | awk '{print ($1 == $3)}'

Is this consistent with what you see on yours?

--Matt
http://standalone-sysadmin.blogspot.com

By cks at 2008-06-02 00:12:05:

I have managed to find the script with the specific issue. I was comparing an explicitly set awk variable with an input field, roughly:

awk 'BEGIN { day = "'$d'" }
     /^From / {if (day == $5) [...]

If the field value is 0-padded, this comparison is false. In retrospect, I could reasonably count on the 'day' value being a properly formed number and avoid making it into a string (if it's not a properly formed number, the rest of the awk will blow up anyways), which would avoid the whole issue.
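
A rough sketch of the difference, assuming a POSIX-style awk where values passed with -v that look like numbers are treated the way input fields are. The sample 'From ' lines and the value of d are invented for illustration; the real script's input and the rest of its body are not shown here.

```shell
d=5   # hypothetical stand-in for the script's $d

# Made-up input; the fifth field is the day, sometimes 0-padded.
input='From user Mon Jun 05
From user Mon Jun 5
From user Mon Jun 12'

# day is assigned from a string constant, so it compares as a string.
as_string=$(printf '%s\n' "$input" |
    awk 'BEGIN { day = "'$d'" } /^From / { printf "%s", (day == $5) }')

# day is passed with -v, so a numeric-looking value compares numerically.
as_number=$(printf '%s\n' "$input" |
    awk -v day="$d" '/^From / { printf "%s", (day == $5) }')

echo "$as_string $as_number"   # prints: 010 110
```

Writing the comparison as (day + 0) == ($5 + 0) should get the same result without depending on how the variable was set.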

By cks at 2008-06-02 00:13:50:

PS: I should mention that my versions of awk behave consistently with Matt's for comparing input fields against each other.

Written on 30 May 2008.

Last modified: Fri May 30 23:25:47 2008