== Linux disk IO stats in Prometheus

Suppose, not hypothetically, that you have a shiny new [[Prometheus https://prometheus.io/]] setup and you are running [[the Prometheus host agent https://github.com/prometheus/node_exporter]] on your Linux machines, some of which have disks whose IO statistics might actually matter (for example, [[we once had a Linux Amanda backup server with a very slow disk ../sysadmin/ReasoningBackwards]]). The Prometheus host agent provides a collection of disk IO stats, but it is not entirely clear where they come from and what they mean.

The good news is that ~~the Prometheus host agent gives you the raw Linux kernel disk statistics~~ and they're essentially unaltered. You get statistics only for whole disks, not partitions, but the host agent includes stats for software RAID devices and other disk-level things. I've written about what these stats cover in [[my entry on what stats you get DiskIOStats]] and also on [[what information you can calculate from them DiskIOStatsIII]], which includes an aside on disk stats for software RAID devices and LVM devices on modern Linux kernels.

(The current version of the host agent makes two alterations to the stats; it converts the time-based ones from milliseconds into seconds, and it converts the sector-based ones into bytes using the standard Linux kernel convention where one sector is 512 bytes. Both of these are much more convenient in a Prometheus environment.)

The mapping between Linux kernel statistics and Prometheus metric names is fortunately straightforward, and it is easy to follow because the host agent's help text for all of the stats is pretty much their description in the kernel's [[Documentation/iostats.txt https://www.kernel.org/doc/Documentation/iostats.txt]]. There are a few changes, but they are pretty obvious.
For example, the kernel description of field 10 is '# of milliseconds spent doing I/Os'; the host agent's corresponding description of ``node_disk_io_time_seconds_total'' is 'Total seconds spent doing I/Os'.

(In the current host agent the help text is somewhat inconsistent here; for instance, some of it talks about 'milliseconds'. This will probably be fixed in the future.)

Since Prometheus exposes all of the Linux kernel disk stats, you can generate all of the derived stats that I discussed in [[my entry on this DiskIOStatsIII]]. Actually calculating them will involve a lot of use of [[_rate()_ or _irate()_ ../sysadmin/PrometheusRateVsIrate]]; for pretty much every stat, you'll have to start by taking the _rate()_ of it and then perform the relevant calculations from there. This is a bit annoying for several reasons, but Prometheus is Prometheus.

There are two limitations of these stats. First, as always, they're averages, with everything that that implies (see [[here ../tech/MisleadingAverages]] and [[here ../tech/MisleadingAveragesII]]). Second, they're going to be averages over appreciable periods of time. At the limit, you're unlikely to be pulling stats from the Prometheus host agent more than once every 10 or 15 seconds, and sometimes less frequently than that. Very short high-activity bursts will thus get smeared out into lower averages over your 10 or 15 or 30 second sample resolution. To get a second-by-second view that captures very short events, you're going to need to sit there on the server with a tool like [[mxiostat MxiostatPointer]], or _iostat_ if you must.

You can get around at least the issue of averages with something like [[the Cloudflare eBPF exporter https://github.com/cloudflare/ebpf_exporter]] (see also [[Cloudflare's blog post on it https://blog.cloudflare.com/introducing-ebpf_exporter/]]).
If other burst events matter to you, you could probably build some infrastructure that would capture them in histograms in a similar way.

(Histograms that capture down to single exceptional events are really the way to go if you care a lot about this, because even a second-by-second view is still an average over that second. However, you're a lot more likely to see things in a second-by-second view than in a 15, 30, or 60 second one, assuming that you can spot the exceptions as they flow by.)
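
To make the 'take the _rate()_ and then calculate from there' pattern concrete, here is a minimal Python sketch of the arithmetic involved. It is an illustration, not how Prometheus itself works: the hypothetical `counter_rate()` is a much simplified stand-in for _rate()_ that uses just two samples and ignores counter resets, the sample values are made up, and the metric names are the ones I believe the current host agent exposes (check your own `/metrics` output).

```python
def counter_rate(prev, cur, interval_seconds):
    """Per-second increase of a counter between two scrapes.

    A much simplified stand-in for PromQL's rate(): it uses only two
    samples and does not handle counter resets.
    """
    return (cur - prev) / interval_seconds


def derived_disk_stats(prev, cur, interval_seconds):
    """Turn two scrapes of one disk's counters into derived stats."""
    r = {name: counter_rate(prev[name], cur[name], interval_seconds)
         for name in cur}
    reads = r["node_disk_reads_completed_total"]
    return {
        # Seconds of IO time per second of wall time: fraction of time busy.
        "utilization": r["node_disk_io_time_seconds_total"],
        "read_bytes_per_second": r["node_disk_read_bytes_total"],
        # Guard against dividing by zero when the disk did no reads.
        "avg_read_wait_seconds":
            r["node_disk_read_time_seconds_total"] / reads if reads else 0.0,
        "avg_read_size_bytes":
            r["node_disk_read_bytes_total"] / reads if reads else 0.0,
    }


# Made-up counter values for one disk, scraped 15 seconds apart.
before = {
    "node_disk_reads_completed_total": 1000,
    "node_disk_read_time_seconds_total": 2.0,
    "node_disk_read_bytes_total": 4_096_000,
    "node_disk_io_time_seconds_total": 10.0,
}
after = {
    "node_disk_reads_completed_total": 1300,
    "node_disk_read_time_seconds_total": 3.5,
    "node_disk_read_bytes_total": 9_011_200,
    "node_disk_io_time_seconds_total": 13.0,
}

stats = derived_disk_stats(before, after, 15)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

The same arithmetic shows the smearing problem. A disk that is 100% busy for one second and idle for the rest of a 15 second interval adds only 1.0 to its IO time counter, so `counter_rate(0.0, 1.0, 15)` reports it as under 7% utilized.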