2024-08-20
Some brief notes on 'numfmt' from GNU Coreutils
Many years ago I learned about numfmt from GNU Coreutils (see the comments on this entry and then this entry). An additional source of information
is Pádraig Brady's 'numfmt - A number reformatting utility'. Today I was faced
with a situation where I wanted to compute and print multi-day,
cumulative Amanda dump total sizes for filesystems in a readable
way, and the range went from under a GByte to several TBytes, so I
didn't want to just convert everything to TBytes (or GBytes) and
be done with it. I was doing the summing up in awk and briefly
considered doing this 'humanization' in awk as well (I've done it
before), but then I remembered numfmt and decided to give it a try.
The basic pattern for using numfmt here was:
cat <amanda logs> | awk '...' | sort -nr | numfmt --to iec
The awk printed out '<size> <what ...>' lines, and then numfmt turned the first field into humanized IEC values. As I did here, it's better to sort before numfmt, using the full precision raw numbers, rather than after numfmt (with 'sort -h'), which only sees the rounded (printed) values.
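To illustrate why the sort goes first, here is a small sketch with two made-up sizes (the byte counts and the '/a' and '/b' names are invented for the example, they are not Amanda output). Both humanize to the same printed value, so only the raw numbers can order them:

```shell
# Keep the decimal point and sort order predictable across locales.
export LC_ALL=C

# Two invented '<size> <what>' lines that both humanize to 1.1G.
raw='1100000000 /a
1150000000 /b'

# Sort on the raw byte counts first, then humanize the first field.
sorted=$(printf '%s\n' "$raw" | sort -nr | numfmt --to iec)
printf '%s\n' "$sorted"
```

If the sort came after numfmt, 'sort -h' would see two identical '1.1G' keys and could not tell which one was really larger.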
Although Amanda records dump sizes in KBytes, I had my awk print
them out in bytes. It turns out that I could have kept them in
KBytes and had numfmt do the conversion, with 'numfmt --from-unit
1024 --to iec'.
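As a quick sketch of that KByte variant (the 2048 figure is a made-up value standing in for an Amanda size), numfmt multiplies the input by 1024 before humanizing it:

```shell
# Keep the decimal point predictable across locales.
export LC_ALL=C

# An invented dump size in KBytes.
kbytes=2048

# --from-unit 1024 scales KBytes up to bytes, then --to iec humanizes.
human=$(numfmt --from-unit 1024 --to iec "$kbytes")
printf '%s\n' "$human"
```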
(As far as I can tell, the difference between --from-unit and --to-unit is that the former multiplies the number and the latter divides it, and dividing is probably not going to be useful with IEC units. However, I can see it being useful if you wanted to mass-convert times in sub-second units to seconds, or convert seconds to a larger unit such as hours. Unfortunately numfmt currently has no unit options for time, so you can only do pure numeric scaling.)
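A minimal sketch of that pure numeric scaling, using time values I picked so that they divide evenly (how numfmt rounds inexact results is not covered here):

```shell
# 7200 seconds expressed in hours: --to-unit divides by 3600.
hours=$(numfmt --to-unit 3600 7200)

# 5 minutes expressed in seconds: --from-unit multiplies by 60.
seconds=$(numfmt --from-unit 60 5)

printf '%s hours, %s seconds\n' "$hours" "$seconds"
```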
If left to do its own formatting, numfmt has two issues (at least when doing conversions to IEC units). First, it will print some values with one decimal place and others with no decimal place. This will generally give you a result that can be hard to skim because not everything lines up, like this:
3.3T [...]
581G [...]
532G [...]
[...]
11G [...]
9.8G [...]
[...]
1.1G [...]
540M [...]
I prefer all of the numbers to line up, which means explicitly specifying the number of decimal places that everything gets. I tend to use one decimal place for everything, but none ('.0') is a perfectly okay choice. This is done with the --format argument:
... | numfmt --format '%.1f' --to iec
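Here is a small side-by-side sketch of the default output and the '%.1f' version, using three invented byte counts (11 GiB, 1.5 GiB, and 540 MiB):

```shell
# Keep the decimal point predictable across locales.
export LC_ALL=C

# Invented sizes in bytes, one per line.
sizes='11811160064
1610612736
566231040'

# Default formatting: a decimal place only on some values.
plain=$(printf '%s\n' "$sizes" | numfmt --to iec)

# Explicit precision: one decimal place on every value.
fixed=$(printf '%s\n' "$sizes" | numfmt --format '%.1f' --to iec)

printf 'default:\n%s\n\nwith %%.1f:\n%s\n' "$plain" "$fixed"
```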
The second issue is that in the process of reformatting your numbers, numfmt will by and large remove any nice initial formatting you may have tried to do in your awk. Depending on how much (re)formatting you want to do, you may want another 'awk' step after the numfmt to pretty-print everything, or you can perhaps get away with --format:
... | numfmt --format '%10.1f ' --to iec
Here I'm specifying a field width for enough white space and also putting some spaces after the number.
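Both approaches can be sketched on a single invented line (the size and the '/var' label are made up). The awk layout in the second pipeline is just one possible choice, not what my script actually does:

```shell
# Keep the decimal point predictable across locales.
export LC_ALL=C

# An invented '<size> <what>' line: 540 MiB in bytes.
line='566231040 /var'

# Option 1: let numfmt pad the number with a field width in --format.
padded=$(printf '%s\n' "$line" | numfmt --format '%10.1f ' --to iec)

# Option 2: humanize first, then let a later awk step own the layout.
repretty=$(printf '%s\n' "$line" |
  numfmt --format '%.1f' --to iec |
  awk '{ printf "%10s  %s\n", $1, $2 }')

printf '%s\n%s\n' "$padded" "$repretty"
```

The awk variant is more work but gives you full control over every column, not just the one that numfmt converts.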
Even with the need to fiddle around with formatting afterward, using numfmt was very much the easiest and fastest way to humanize numbers in this script. Now that I've gone through this initial experience with numfmt, I'll probably use it more in the future.