Getting my unit size 'prefixes' (really suffixes) straight, sort of

November 27, 2022

As a system administrator and general computer person, I deal with at least three different sorts of sizes: bytes in powers of ten (beloved by disk makers and not me), bytes in powers of two (for RAM and many other things), and bits in powers of ten (used for Ethernet speeds). For various reasons I default to bytes in powers of two, and I've been inconsistent about the unit suffixes that I use for them when I write about things here on Wandering Thoughts. So in the spirit of things like getting my NVMe terminology straight, today I'm going to cover what I should use (with no guarantees that I actually will).

In theory, the official metric (power of ten) prefixes are written as 'T', 'G', 'M', and 'k'. This isn't in accordance with customary computer use, which upper-cases the 'k' to 'K'. According to Wikipedia, binary prefixes are written as 'Ti', 'Gi', 'Mi', and 'Ki', although Wikipedia also notes that there's plenty of usage (my phrasing) of plain 'T', 'G', and so on to mean the binary versions. However, both usages leave it ambiguous whether you're writing about bytes or bits.

As covered in Wikipedia's Megabyte page, in theory this is disambiguated with a trailing 'B' to mean bytes. Thus, 'TB' means decimal terabytes and 'TiB' means binary terabytes. Or you can just write out 'TBytes' and 'TiBytes'. Per Wikipedia's Bit rate and Data-rate units pages, units of bits (or bits per second) are written out in full, as 'Gbit/s' (decimal) or 'Gibit/s' (binary, should you find a use for binary bitrates).
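To make the strict reading concrete, here is a minimal Python sketch of a parser that follows it: plain prefixes are powers of ten, 'i' prefixes are powers of two, a trailing 'B' means bytes, and 'bit' means bits. The `parse_size` name and the exact set of spellings it accepts are my own assumptions for illustration, not something taken from Wikipedia or any real tool.

```python
import re

# Strict reading: plain prefixes are powers of ten, 'i' prefixes are powers of two.
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

# A number, an optional prefix, then 'B' (bytes) or 'bit' (bits).
_SIZE_RE = re.compile(
    r"^\s*([0-9]+(?:\.[0-9]+)?)\s*(Ki|Mi|Gi|Ti|k|K|M|G|T)?(B|bit)\s*$"
)


def parse_size(text):
    """Return (count, unit), where unit is 'bytes' or 'bits'.

    Hypothetical helper; it only understands the spellings listed above.
    """
    m = _SIZE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, prefix, unit = m.groups()
    if prefix is None:
        factor = 1
    elif prefix in BINARY:
        factor = BINARY[prefix]
    else:
        # Assumption: treat the customary upper-case 'K' as decimal kilo.
        factor = DECIMAL["k" if prefix == "K" else prefix]
    return int(float(number) * factor), ("bytes" if unit == "B" else "bits")
```

Under these rules `parse_size("1 TB")` and `parse_size("1 TiB")` give different byte counts, which is exactly the distinction the official suffixes are supposed to carry.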

Actual usage by real programs and people does not correspond to this nice picture. It's very common for programs and people, myself included, to use 'G' or 'GB' to mean a power of two gigabyte; for example, this is what several versions of Unix df will produce with 'df -h'. Modern computer users like things such as memory and disk usage in powers of two bytes because these things are normally allocated in sizes like 4 KiBytes (to write it out in full).
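As a rough illustration of the gap between the two conventions, here is a small sketch of a power of two size formatter that can print either the strict suffixes or the customary bare letters. The `human_binary` function and its `strict` flag are hypothetical names of mine, and this is not a claim about how any particular df rounds or formats its output.

```python
def human_binary(nbytes, strict=True):
    """Format a byte count in power of two units.

    strict=True uses 'KiB', 'MiB', and so on; strict=False uses the
    customary bare 'K', 'M', 'G' letters for the same values.
    """
    suffixes = ["", "Ki", "Mi", "Gi", "Ti"] if strict else ["", "K", "M", "G", "T"]
    value = float(nbytes)
    for suffix in suffixes:
        # Stop at the first unit that keeps the number under 1024,
        # or at the largest unit we know about.
        if value < 1024 or suffix == suffixes[-1]:
            break
        value /= 1024
    unit = suffix + "B" if strict else (suffix or "B")
    return f"{value:.1f} {unit}"


print(human_binary(4096))                  # 4.0 KiB
print(human_binary(10**12))                # 931.3 GiB
print(human_binary(10**12, strict=False))  # 931.3 G
```

The second and third lines are the same disk maker's '1 TB' (a power of ten terabyte) seen through power of two units; only the suffix tells you which convention the program was using, and the customary form doesn't.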

For completely correct usage I should use 'GiB', 'TiB', and 'MiB' when I mean power of two bytes instead of power of ten bytes (which is almost all of the time), and 'Gbit' or 'Gbit/s' when I mean power of ten bits (or bitrate). In practice, if I say '10G' or '1G' in the context of Ethernet, people are going to know what I mean (and they may not know what the data rate is exactly, just as I didn't until recently). Similarly, since almost no one uses power of ten sizes for things related to RAM and memory, '1 G' or '1 GB' is relatively unambiguous to people, even though it's technically incorrect.
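For the Ethernet case, the arithmetic is simple enough to sketch. This assumes that '10G' means a nominal rate of exactly 10^10 bits per second and ignores all framing and protocol overhead, so the results are upper bounds on the raw rate and not what you'd see from a real transfer. The function names are mine.

```python
def line_rate_bytes_per_sec(gbits_per_sec):
    """Nominal bytes per second for a rate given in power of ten Gbit/s."""
    return gbits_per_sec * 10**9 / 8


def to_mib(nbytes):
    """Convert a byte count to power of two MiB."""
    return nbytes / 2**20


for rate in (1, 10):
    bps = line_rate_bytes_per_sec(rate)
    # Show the same figure in power of ten MB/s and power of two MiB/s.
    print(f"{rate}G: {bps / 10**6:.0f} MB/s, {to_mib(bps):.1f} MiB/s")
```

With these assumptions, 1G comes out to 125 MB/s or about 119.2 MiB/s, and 10G to 1250 MB/s or about 1192.1 MiB/s.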

In the end there's no good answer to this mismatch between official usage and customary usage (and expectations). The metric/SI focus on powers of ten is right for general usage, but in computing a lot of things are based around powers of two (once bytes became fixed at 8 bits), making a default to that very natural. We're probably never going to reconcile the two sides, especially in informal usage (which my writing generally is).

Written on 27 November 2022.

Last modified: Sun Nov 27 22:54:36 2022