== Be cautious with numbers in _awk_

I like _awk_, and often use it for quick little log aggregation things (often on the command line, if what I am interested in is a one-off). But awk has a small problem: it likes printing large numbers in exponential notation.

The minor problem with this is that I find exponential notation for numbers harder to read than straight decimal notation. '3.18254e+10' is just harder to understand casually than 31825440599.

The major problem with this is that when I do log aggregation, I often feed the result to '_sort -nr_' or the like, so I can see the result in a clearly sorted order (and perhaps pick out the top N). Numbers in exponential notation are *not* sorted 'correctly' by _sort_, as _sort_ requires things to be in decimal notation. Worse, when you are looking for the top N of something, this issue can cause the precise entries you're most interested in to drop out. The highest entries are the ones most likely to have numbers large enough that awk starts putting them in exponential notation, which will make them sort very low indeed.

This isn't just a theoretical concern. When writing [[yesterday's entry|../web/AccurateContentTypeImportance]], this exact issue almost caused me to miss four of the actual top six URLs by data transferred. (Fortunately I wound up noticing the missing entries when I was looking at detailed log output, and then worked out why it was happening.)

The workaround is relatively simple: awk's '_%d_' printf format will print even large numbers in decimal notation. So instead of '_END {print sum}_' or the like, use '_END {printf "%d\n", sum}_'. (Unfortunately I find awk's _printf_ annoying for some reason, so I don't normally use it unless I have to. I guess I have to a lot more often now.)

This isn't the end of the story, because this points to another caution for dealing with numbers in awk, namely: *awk uses floating point math, not integer math*, even for numbers that are plain integers.
This is most likely to bite you if you are subtracting large numbers from each other; for example, computing differences between Unix timestamps. (This actually bit me once, in an assignment, and I wound up being sufficiently annoyed to use a baroque workaround involving breaking out of awk to get _bc_ to do that particular subtraction just so I could submit something that had the numbers absolutely correct.)