2023-05-15
The time our Linux systems spend on integer to text and back conversions
Over on the Fediverse, I said something recently:
I sometimes think about all the CPU cycles that are used on Linux machines to have the kernel convert integers to text for /proc and /sys files and then your metrics system convert the text back to integers. (And then sometimes convert the integers back to text when it sends them to the metrics server, which is at least a different machine using CPU cycles to turn text back into integers (or floats).)
It's accidents of history all the way down.
We run the Prometheus host agent on all of our Linux machines. Every fifteen seconds our Prometheus server pulls metrics from all the host agents, which causes the host agent to read a bunch of /proc files (for things like memory and CPU state information) and /sys files (for things like hwmon information). These status files are text, but they contain a lot of numbers, which means that the kernel converted those integers into text for us. The host agent then converts that text back into numbers internally (I believe a mixture of 64-bit integers and 64-bit floats), only to turn around and send them to the Prometheus server as text again (see the Prometheus documentation on Exposition Formats). On the Prometheus server these text numbers will be turned back into floats. All of this takes CPU cycles, although perhaps not many CPU cycles on modern machines.
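To make the round trip concrete, here is a minimal sketch of the four conversions in Python. It is an illustration only, not the host agent's or the kernel's actual code: the function names are made up for this example, the line layout is merely /proc/meminfo-style, and the float formatting is Python's rather than whatever the real agent emits.

```python
# Illustrative only: hypothetical helpers that mimic each conversion step.

def kernel_render(name: str, kib: int) -> str:
    """Stand-in for the kernel turning an integer into /proc/meminfo-style text."""
    return f"{name}:{kib:>15} kB\n"

def agent_parse(line: str) -> tuple[str, int]:
    """Stand-in for the agent turning that text back into an integer (in bytes)."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    value = int(fields[0])
    if len(fields) > 1 and fields[1] == "kB":
        value *= 1024
    return name, value

def agent_expose(name: str, value: int) -> str:
    """Stand-in for the agent turning the number into text again for the scrape."""
    return f"node_memory_{name}_bytes {float(value)!r}\n"

def server_parse(sample: str) -> tuple[str, float]:
    """Stand-in for the server turning the scraped text into a float."""
    metric, text = sample.split()
    return metric, float(text)

if __name__ == "__main__":
    line = kernel_render("MemTotal", 16384256)   # integer -> text
    name, value = agent_parse(line)              # text -> integer
    sample = agent_expose(name, value)           # integer -> text
    print(server_parse(sample))                  # text -> float
```

One number, four conversions, every fifteen seconds, for every metric on every machine.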
(The host agent gets some information from the Linux kernel through methods like netlink, which I believe transfers numbers in non-text form.)
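For contrast, here is a rough sketch of what a non-text transfer of a number looks like. This is not the real netlink message layout, just a fixed-width 64-bit value packed and unpacked with Python's struct module; the function names are hypothetical.

```python
import struct

def encode_binary(value: int) -> bytes:
    """Pack an unsigned 64-bit integer in native byte order (no text involved)."""
    return struct.pack("=Q", value)

def decode_binary(data: bytes) -> int:
    """Unpack the same fixed-width field; no digit-by-digit parsing is needed."""
    (value,) = struct.unpack("=Q", data)
    return value

def encode_text(value: int) -> bytes:
    """The /proc-style alternative: decimal digits plus a newline."""
    return f"{value}\n".encode("ascii")

def decode_text(data: bytes) -> int:
    """Parse the decimal digits back into an integer."""
    return int(data.decode("ascii").strip())

if __name__ == "__main__":
    n = 16777478144
    print(len(encode_binary(n)), decode_binary(encode_binary(n)))
    print(len(encode_text(n)), decode_text(encode_text(n)))
```

The binary form is always eight bytes and needs no parsing, but both sides have to agree in advance on the width, signedness, and byte order, which is exactly the kind of agreement that text lets everyone avoid.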
All of the steps of this dance are rational ones. Things in /proc and /sys use text instead of some binary encoding because text is a universal solvent on Unix systems, and that way no one had to define a binary file format (or worse, try to get agreement on a general binary system stats kernel to userspace API). Text formats are usually easily augmented, upgraded, inspected, and so on, and they are easy to provide (the kernel actually has a lot of infrastructure for easily providing text in /proc files; we saw some of it in action recently).
(These factors are especially visible in the case of some of the statistics that OpenZFS on Linux exposes. ZFS comes from Solaris, which has a native binary 'kstat' system. ZoL exposes all of these kstats in /proc/spl/kstat/zfs as text, rather than trying to get Linux people to somehow consume them as binary kstats. Other ZFS IO statistics are exposed in an entirely different and more binary form.)
Changing the situation would require a lot of work by a lot of people spread across a lot of projects, so it's unlikely to be done. If it is ever done, it will probably be done piecemeal, maybe through more and more kernel subsystems exposing information through netlink as well as /proc (perhaps exposing new metrics only through netlink, with their /proc information frozen). But even netlink is probably more work for kernel developers than putting things in /proc, so I suspect that a lot of things will keep being in /proc.
(In addition, lots of things in /proc aren't just pairs of names and numbers, although that's the common case. Consider /proc/locks.)
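As an illustration of how much structure a single /proc/locks line carries, here is a minimal parsing sketch in Python. It assumes the field layout described in the proc(5) manual page (ordinal, an optional '->' marker for blocked waiters, lock type, advisory or mandatory, read or write, PID, a major:minor:inode triple with hexadecimal device numbers, then start and end offsets). The LockEntry type and parse_lock_line function are names invented for this example, and the sample line is made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LockEntry:
    ordinal: int
    blocked: bool        # True if the line had the '->' waiter marker
    kind: str            # e.g. POSIX, FLOCK
    mode: str            # ADVISORY or MANDATORY
    access: str          # READ or WRITE
    pid: int
    major: int
    minor: int
    inode: int
    start: int
    end: Optional[int]   # None when the kernel reports EOF

def parse_lock_line(line: str) -> LockEntry:
    fields = line.split()
    ordinal = int(fields[0].rstrip(":"))
    blocked = fields[1] == "->"
    if blocked:
        del fields[1]
    kind, mode, access, pid, dev_inode, start, end = fields[1:8]
    # Assumption: device major and minor are hex, the inode is decimal.
    major, minor, inode = dev_inode.split(":")
    return LockEntry(
        ordinal=ordinal, blocked=blocked, kind=kind, mode=mode, access=access,
        pid=int(pid), major=int(major, 16), minor=int(minor, 16),
        inode=int(inode), start=int(start),
        end=None if end == "EOF" else int(end),
    )

if __name__ == "__main__":
    print(parse_lock_line("2: FLOCK  ADVISORY  WRITE 2001 08:01:7864554 0 EOF"))
```

A binary interface for this would need to encode enumerations, an optional marker, and a sentinel for 'end of file', not just a list of named integers.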