To be fully useful, Prometheus histograms want their cumulative sums

July 26, 2022

True Prometheus histograms have a specific set of metrics and time series that they're made up of. As covered in the documentation, there are a bunch of '<basename>_bucket' time series, a '<basename>_count' time series, and a '<basename>_sum' time series that's the cumulative sum of all of the observations that are in the histogram. How all of this works is covered in, eg, How does a Prometheus Histogram work?.

However, not all external sources of histogram data provide a cumulative sum. For example, ZFS IO statistic histograms just give you histogram bucket counts. When generating Prometheus histogram metrics from such histogram sources, it seems common to generate a _sum metric (well, time series) that's just 0. This gives you something that will work in many situations, but after having wrestled with histograms built this way I've come to feel that you want to avoid it if possible. Prometheus histograms are more useful with their cumulative sums, and you can't rebuild an approximation of this information in PromQL as far as I know.

(I believe that both Grafana heatmaps of histograms and 'histogram_quantile()' will work on such sum-less histograms, though. You won't be able to get the average with '<thing>_sum / <thing>_count', but perhaps you can approximate that with the median from 'histogram_quantile(0.5, ...)'. On the other hand, the median is not the mean, and the difference may matter to you.)

If the underlying source of histogram data doesn't give you a cumulative sum, all you can do is make one up from the available histogram information. The ZFS iostats reporting code uses the midpoint of histogram buckets, and most likely you can't do better than that. Since you can readily compute this sort of cumulative sum in the code that converts from the native histogram format to Prometheus histogram metrics format, I think that you should. If you have the choice between two converters or exporters, one of which gives you a histogram sum and one of which gives you a 'sum' that's zero, I think you should take the one with the sum.
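As a minimal sketch of the midpoint approach, here is how a converter might estimate a sum from independent (non-cumulative) bucket counts. The function name and argument layout are hypothetical, and it assumes that the lowest bucket starts at 0 and that an overflow bucket (upper bound of infinity) is represented by its lower bound, since it has no midpoint. Your source's actual bucket boundaries may call for different choices.

```python
import math


def approximate_sum(upper_bounds, counts):
    """Estimate the sum of all observations from per-bucket counts.

    upper_bounds: ascending bucket upper bounds; the last may be math.inf.
    counts: independent (not cumulative) observation counts per bucket.
    """
    total = 0.0
    lower = 0.0  # assumption: the first bucket starts at zero
    for upper, count in zip(upper_bounds, counts):
        if math.isinf(upper):
            # An open-ended bucket has no midpoint; fall back to its
            # lower bound, which under-estimates rather than guessing.
            representative = lower
        else:
            representative = (lower + upper) / 2.0
        total += representative * count
        lower = upper
    return total


# Two observations in (0, 1], none in (1, 2], three in (2, 4].
estimate = approximate_sum([1, 2, 4], [2, 0, 3])
```

The result is only an estimate, but it is at least something that '<basename>_sum / <basename>_count' can work with, which a constant 0 is not.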

(And if you can reasonably modify a histogram exporter or converter to add a calculated sum, I think it's probably worth maintaining a custom version. Ideally you'll be able to get your change accepted upstream.)

PS: Perhaps there are clever PromQL ways to get around this lack of a <basename>_sum metric. There are a lot of tricks in PromQL, and documentation on them is widely scattered and hard to find.

Sidebar: Conventional histogram buckets versus Prometheus ones

Histograms designed to be presented to people usually have independent bucket counts, where each bucket only counts the things that fall into its range. Prometheus histograms use cumulative bucket counts, where every bucket's count covers all things less than or equal to the top of its range (hence the 'le="..."' label). Converting independent bucket counts to cumulative bucket counts is straightforward, but someone has to remember to do it. Not doing this when you convert a non-Prometheus histogram to a Prometheus one can produce odd and unhelpful results when you try to process the Prometheus histogram, and is a mistake I'm pretty sure I've made in my early days of working with Prometheus.
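To illustrate the conversion, here is a small sketch that turns independent bucket counts into cumulative ones and renders them as '<basename>_bucket' lines. The helper names and the 'zfs_io_size' metric name are made up for this example, and a real exporter would normally use a Prometheus client library instead of formatting text by hand.

```python
import math
from itertools import accumulate


def to_cumulative(counts):
    """Convert independent per-bucket counts into running totals."""
    return list(accumulate(counts))


def format_le(bound):
    """Render a bucket upper bound as an 'le' label value."""
    return "+Inf" if math.isinf(bound) else repr(float(bound))


def render_buckets(basename, upper_bounds, counts):
    """Return '<basename>_bucket' lines with cumulative counts."""
    return [
        f'{basename}_bucket{{le="{format_le(bound)}"}} {total}'
        for bound, total in zip(upper_bounds, to_cumulative(counts))
    ]


# Independent counts: 2 in the first bucket, 0 in the second, 3 above that.
lines = render_buckets("zfs_io_size", [1, 2, math.inf], [2, 0, 3])
```

Here the independent counts 2, 0, 3 become the cumulative counts 2, 2, 5, and the final 'le="+Inf"' bucket ends up equal to what '<basename>_count' should be.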
