The time our Linux systems spend on integer to text and back conversions

May 15, 2023

Over on the Fediverse, I said something recently:

I sometimes think about all the CPU cycles that are used on Linux machines to have the kernel convert integers to text for /proc and /sys files and then your metrics system convert the text back to integers. (And then sometimes convert the integers back to text when it sends them to the metrics server, which is at least a different machine using CPU cycles to turn text back into integers (or floats).)

It's accidents of history all the way down.

We run the Prometheus host agent on all of our Linux machines. Every fifteen seconds our Prometheus server pulls metrics from all the host agents, which causes the host agent to read a bunch of /proc files (for things like memory and CPU state information) and /sys files (for things like hwmon information). These status files are text, but they contain a lot of numbers, which means that the kernel converted those integers into text for us. The host agent then converts that text back into numbers internally (I believe a mixture of 64-bit integers and 64-bit floats), only to turn around and send them to the Prometheus server as text again (see Exposition Formats, also). On the Prometheus server these text numbers will be turned back into floats. All of this takes CPU cycles, although perhaps not many CPU cycles on modern machines.

(The host agent gets some information from the Linux kernel through methods like netlink, which I believe transfers numbers in non-text form.)

All of the steps of this dance are rational ones. Things in /proc and /sys use text instead of some binary encoding because text is a universal solvent on Unix systems, and that way no one had to define a binary file format (or worse, try to get agreement on a general binary system stats kernel to userspace API). Text formats are usually easily augmented, upgraded, inspected, and so on, and they are easy to provide (the kernel actually has a lot of infrastructure for easily providing text in /proc files; we saw some of it in action recently).

(These factors are especially visible in the case of some of the statistics that OpenZFS on Linux exposes. ZFS comes from Solaris, which has a native binary 'kstat' system. ZoL exposes all of these kstats in /proc/spl/kstat/zfs as text, rather than try to get Linux people to somehow get them as binary kstats. Other ZFS IO statistics are exposed in an entirely different and more binary form.)

Changing the situation would require a lot of work by a lot of people spread across a lot of projects, so it's unlikely to be done. If it is ever done, it will probably be done piecemeal, maybe through more and more kernel subsystems exposing information through netlink as well as /proc (perhaps exposing new metrics only through netlink, with their /proc information frozen). But even netlink is probably more work for kernel developers than putting things in /proc, so I suspect that a lot of things will keep being in /proc.

(In addition, lots of things in /proc aren't just pairs of names and numbers, although that's the common case. Consider /proc/locks.)


Comments on this page:

It pains me to think that every "human-readable integer" requires a bunch of integer divisions (among the most expensive instructions) to print and then to parse. And all the microservices exchanging and logging yamls and jsons only make it worse.

I wonder how much more efficient it would be to printf("%x") rather than %d (let alone %f). The printf() path might already be very inefficient, but at least you shouldn't require any divisions (some bitshifts and compares to determine the number length might be enough).

This would be a bit silly; you may as well send the raw binary bits if the numbers are not human readable. Plaintext JSON, HTTP, /sys and so on survive partly (mainly?) because of their ease.

By George at 2023-05-18 08:13:01:

Precisely because division instructions are expensive, division by a constant is invariably implemented using a multiply by an inverse. Essentially, n/d = (n * ceil(2^32/d)) >> 32, subject to a bunch of fiddly rounding conditions which have been analyzed.

If you look at the Linux kernel decimal conversion code, you'll see it's quite heavily optimized, starting with the fact that it actually converts to base-100, which is converted to ASCII digit pairs using a 200-byte lookup table.

That code is gorgeous. I don't understand most of it but I say the same when I visit a gallery.

I might consider stealing it for use on some micros, but I guess systems where I really need the extra perf over a generic sprintf() implementation are probably already so simple and small bit-depth that base 100 might hurt more than it helps. Benchmarking will tell me, if I ever find I need it.

I also think about those wasted cycles, and it's one reason the UNIX model is inferior to others. Systems that predated UNIX would've been aghast at such waste, and so had mechanisms in place to prevent it, such as by agreeing on more efficient representations, something UNIX has never been able to do. This accident of history occurred because UNIX was never intended to escape its laboratory.

It's clear to me that human-readable data formats are deeply inferior to more efficient numerical data representations, as it's easier to go from known-to-known than unknown-to-known, and the latter is necessary in all parsing. It's much easier to transform the numerical representation of some data into a human-readable representation than the reverse.

All of the steps of this dance are rational ones.

For some definition of rational, sure.

Things in /proc and /sys use text instead of some binary encoding because text is a universal solvent on Unix systems, and that way no one had to define a binary file format (or worse, try to get agreement on a general binary system stats kernel to userspace API).

Text isn't a universal representation, even under UNIX, because UNIX forces all programs to deal with the bit-level representation of that text, and this is one reason why UNIX has never been able to change that representation adequately. It was deemed easier to force UTF-8 on the world, so that this illusion could continue, than to accept that UNIX is a failure.

Text formats are usually easily augmented, upgraded, inspected, and so on, and they are easy to provide (the kernel actually has a lot of infrastructure for easily providing text in /proc files; we saw some of it in action recently).

This is true of any properly-designed format. None of the text formats in UNIX of which I'm aware are properly-designed.

Now, consider an alternative representation for basic numbers, such as BCD. There's still work involved, but this is a format which the machine can natively manipulate, and its bit-level representation means it can still easily be compared and lightly-used, with no conversion. It also predates ASCII. Still, it would be better to expose some fixed-length integer representation rather than a decimal ASCII representation.

Regarding what George wrote, this is an example of work by those whom I deem to be Intelligent Idiots and I'll explain what I mean by that. Now, it's undoubtedly wasteful to repeatedly convert between representations like this, and it takes an intelligent person to optimize the relevant code as has been done there, but it takes an idiot to not see the problem for what it is, and eliminate it entirely. An intelligent idiot can always make the code better like this, but can never remove it entirely. All of that work would be unnecessary if the system weren't so wasteful to start, but removing such systemic waste is beyond the ability of such an intelligent idiot.
