One situation where you absolutely can't use irate() in Prometheus

December 12, 2018

This is a story about me shooting myself in the foot repeatedly, until I finally noticed.

We have Linux NFS client machines, and we would like to know NFS client performance information about them so we can see things which filesystems they use the most. The Linux kernel provides copious per-mount information on NFS mounts and the Prometheus node exporter can turn all of it into metrics, but for us the raw node exporter stats are just too overwhelming in our environment; a typical machine generates over 68,000 different node_mountstats metrics values. So we wrote our own program to digest the raw mountstats metrics and do things like sum all of the per NFS operation information together for a given mount. We run the program from cron once a minute and publish its information through the node exporter's extremely handy textfile collector.

(The node exporter's current mountstats collector also has the same misunderstanding about the xprt: NFS RPC information that I did, and so reports it on a per-mount basis.)

For a while I've been trying to use the cumulative total RPC time with the total RPC count to generate an average time per RPC graph. Over and over I've plugged in some variation on 'cumulative seconds / cumulative operations' for various sources of both numbers, put an irate() around both, graphed this, and gotten either nothing at all or maybe a few points. Today I really dug into this and the penny finally dropped while I was brute-forcing things. The problem was that I was reflexively using irate() instead of rate().

The one situation where you absolutely can't use irate() is that you can't apply irate() to oversampled metrics, metrics that Prometheus is scraping more often than they're being generated. We're generating NFS metrics once a minute but scraping the node exporter every fifteen seconds, so three out of four of our recorded NFS metrics are the same. irate() very specifically uses the last two data points in your sample interval, and when you're oversampling, those two data points are very often going to be the exact same actual metric and so have no change.

In other words, a great deal of my graph was missing because it was all 'divide by zero' NaNs, since the irate() of the 'cumulative operations' count often came out as zero.

(Oversampled metric is my term. There may be an official term for 'scraped more often than it's generated' that I should be using instead.)

Looking back, I specifically mentioned this issue in my entry on rate() versus irate(), but apparently it didn't sink in thoroughly enough. Especially I didn't think through the consequences of dividing by the irate() of an oversampled metric, which is where things really go off the rails.

(If you merely directly graph the irate() of an oversampled metric, you just get a (very) spiky graph.)

Part of what happened is that when I was writing my exploratory PromQL I was thinking in terms of how often the metric was generated, not how often it was scraped, so I didn't even consciously consider that I had an oversampled metric. For example, I was careful to use 'irate(...[3m])', so I would be sure to have at least two valid metric points for a once a minute metric. This is probably a natural mindset to have, but that just means it's something I'm going to have to try to keep in mind for the future for node exporter textfile metrics and Pushgateway metrics.

Sidebar: Why I reflexively reach for irate()

Normally irate() versus rate() is a choice of convenience and what time range and query step you're working on. Since the default Prometheus graph is one hour, I'm almost always exploring with a small enough query step that irate() is the easier way to go. Even when I'm dealing with an oversampled metric, I can get sort of get away with this if I'm directly graphing the irate() result; it will just be (very) spiky, it won't be empty or clearly bogus.

Written on 12 December 2018.
« Why I'm usually unnerved when modern SSDs die on us
Some new-to-me features in POSIX (or Single Unix Specification) Bourne shells »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Wed Dec 12 01:53:32 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.