Wandering Thoughts archives

2024-08-20

Some brief notes on 'numfmt' from GNU Coreutils

Many years ago I learned about numfmt from GNU Coreutils (see the comments on this entry and then this entry). An additional source of information is Pádraig Brady's 'numfmt - A number reformatting utility'. Today I was faced with a situation where I wanted to compute and print multi-day, cumulative Amanda dump total sizes for filesystems in a readable way. The range went from under a GByte to several TBytes, so I didn't want to just convert everything to TBytes (or GBytes) and be done with it. I was doing the summing up in awk and briefly considered doing this 'humanization' in awk as well (I've done it before), but then I remembered numfmt and decided to give it a try.

The basic pattern for using numfmt here was:

cat <amanda logs> | awk '...' | sort -nr | numfmt --to iec

The awk printed out '<size> <what ...>' lines, and then numfmt turned the first field into humanized IEC values (by default numfmt only converts the first whitespace-separated field). As I did here, it's better to sort before numfmt, using the full-precision raw numbers, rather than after numfmt (with 'sort -h'), which would sort on the rounded (printed) values.
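As a small illustration of why the sort goes first, here is a sketch with made-up sizes and filesystem names (they are not from real Amanda logs, and the variable names are just for this example). The two smallest sizes should humanize to the same rounded value, so only the raw numbers can order them reliably.

```shell
# Made-up '<bytes> <filesystem>' lines standing in for the awk output.
sample='1101005 /c
3298534883328 /a
1153433 /b'

# Sort on the raw byte counts first, then humanize the first field.
# LC_ALL=C is an assumption here, to keep number parsing and the
# decimal point predictable.
humanized=$(printf '%s\n' "$sample" | LC_ALL=C sort -nr | LC_ALL=C numfmt --to iec)
printf '%s\n' "$humanized"
```

The lines come out in /a, /b, /c order because sort saw the exact byte counts, even though the last two sizes print identically.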

Although Amanda records dump sizes in KBytes, I had my awk print them out in bytes. It turns out that I could have kept them in KBytes and had numfmt do the conversion, with 'numfmt --from-unit 1024 --to iec'.
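A minimal sketch of that variant, assuming input lines whose first field is a size in KBytes. The function name and the sample lines are hypothetical.

```shell
# Input sizes are in KBytes (as Amanda records them); --from-unit 1024
# multiplies each one by 1024 before it is humanized.
kb_to_iec() {
    LC_ALL=C numfmt --from-unit 1024 --to iec
}

# 1024 KBytes is 1 MiB and 3221225472 KBytes is 3 TiB.
printf '%s\n' '1024 /small' '3221225472 /big' | kb_to_iec
```

This should print '1.0M /small' and '3.0T /big', with no multiplication needed in the awk.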

(As far as I can tell, the difference between --from-unit and --to-unit is that the former multiplies the number and the latter divides it, which is probably not going to be useful with IEC units. However, I can see it being useful if you wanted to mass-convert times in sub-second units to seconds, or convert seconds to a larger unit such as hours. Unfortunately numfmt currently has no unit options for time, so you can only do pure numeric shifts.)
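To make the multiply versus divide distinction concrete, here is a sketch of the pure numeric shifts described above. The variable names and the numbers are made up for illustration, and numfmt knows nothing about time here; it is only scaling by 60, 3600 or 1000.

```shell
# --from-unit multiplies: 90 minutes expressed in seconds.
minutes_to_seconds=$(LC_ALL=C numfmt --from-unit 60 90)
echo "$minutes_to_seconds"     # 5400

# --to-unit divides: 7200 seconds expressed in hours.
seconds_to_hours=$(LC_ALL=C numfmt --to-unit 3600 7200)
echo "$seconds_to_hours"       # 2

# Milliseconds to seconds. The explicit precision in --format is there
# because, as far as I can tell, the result is otherwise rounded to the
# number of decimal places the input had (none, for an integer).
ms_to_seconds=$(LC_ALL=C numfmt --to-unit 1000 --format '%.3f' 1500)
echo "$ms_to_seconds"
```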

If left to do its own formatting, numfmt has two issues (at least when doing conversions to IEC units). First, it will print some values with one decimal place and others with no decimal place. This will generally give you a result that can be hard to skim because not everything lines up, like this:

 3.3T [...]
 581G [...]
 532G [...]
 [...]
  11G [...]
 9.8G [...]
 [...]
 1.1G [...]
 540M [...]

I prefer all of the numbers to line up, which means explicitly specifying the number of decimal places that everything gets. I tend to use one decimal place for everything, but none ('%.0f') is a perfectly okay choice. This is done with the --format argument:

 ... | numfmt --format '%.1f' --to iec
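A quick side-by-side sketch of the two behaviors, using made-up byte counts chosen to be exact multiples of IEC units so the output is easy to predict. The variable names are only for this example.

```shell
# Made-up byte counts: 3 TiB, 581 GiB and 1 MiB.
sizes='3298534883328
623843999744
1048576'

# Default formatting: the number of decimal places varies.
default_out=$(printf '%s\n' "$sizes" | LC_ALL=C numfmt --to iec)

# One decimal place for everything.
fixed_out=$(printf '%s\n' "$sizes" | LC_ALL=C numfmt --format '%.1f' --to iec)

printf 'default:\n%s\n\nwith %%.1f:\n%s\n' "$default_out" "$fixed_out"
```

The default output is 3.0T, 581G and 1.0M, while the '%.1f' version gives 3.0T, 581.0G and 1.0M.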

The second issue is that in the process of reformatting your numbers, numfmt will by and large remove any nice initial formatting you may have tried to do in your awk. Depending on how much (re)formatting you want to do, you may want another 'awk' step after the numfmt to pretty-print everything, or you can perhaps get away with --format:

... | numfmt --format '%10.1f  ' --to iec

Here I'm specifying a field width that's wide enough to right-align all of the numbers and also putting some spaces after the number.
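For the other option, a separate awk step after numfmt, a minimal sketch might look like the following. The function name, the 8-character column width, and the sample lines are all assumptions for illustration, not what my actual script does.

```shell
# Humanize the first field, then let awk lay out the columns.
# The 8-character width is an arbitrary choice for this sketch.
humanize_and_align() {
    LC_ALL=C numfmt --format '%.1f' --to iec |
        awk '{ size = $1; $1 = ""; sub(/^ +/, ""); printf "%8s  %s\n", size, $0 }'
}

printf '%s\n' '3298534883328 /a' '1048576 /b full' | humanize_and_align
```

The awk keeps everything after the size as a single trailing column, so a multi-word '<what ...>' survives, although runs of whitespace inside it get collapsed to single spaces.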

Even with the need to fiddle around with formatting afterward, using numfmt was very much the easiest and fastest way to humanize numbers in this script. Now that I've gone through this initial experience with numfmt, I'll probably use it more in the future.

sysadmin/NumfmtBriefNotes written at 23:20:48

