Some uses for Prometheus's resets() function

April 4, 2021

One of the functions available in Prometheus's PromQL query language is resets(), which is described this way in the documentation:

For each input time series, resets(v range-vector) returns the number of counter resets within the provided time range as an instant vector. Any decrease in the value between two consecutive samples is interpreted as a counter reset.

resets should only be used with counters.

Up until recently I've ignored resets() because knowing when a counter had reset didn't seem particularly useful to me. This changed due to roidelapluie's comment about it on this entry (I'll get to it), which caused me to start thinking about resets() in general. But before I get to its possible uses, there's an important qualification on the documentation.

If a time series disappears for a while and then reappears, the last value from before the disappearance is treated as consecutive with the first value after it reappears. All that resets() cares about is the stream of values from when a time series exists, so if the post-reappearance value is lower than the pre-disappearance value, this is counted as a reset. Much like the changes() function, the code that evaluates this is completely blind to there even being periods of time where the time series isn't there. By extension, if a counter time series disappears for a while but comes back with the same value or higher, this won't be considered a reset.
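To make the gap behavior concrete, here is a minimal Go sketch of the comparison described above. The sample type and the countResets name are made up for illustration and are not Prometheus's own types; the point is only that timestamps never enter into the decision.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for a scraped point: a timestamp in
// seconds and a value. It is not a Prometheus type.
type sample struct {
	t int64
	v float64
}

// countResets compares each value only with the value stored before it.
// It never looks at timestamps, so a gap in the series is invisible to it.
func countResets(points []sample) int {
	if len(points) == 0 {
		return 0
	}
	resets := 0
	prev := points[0].v
	for _, p := range points[1:] {
		if p.v < prev {
			resets++
		}
		prev = p.v
	}
	return resets
}

func main() {
	// The series vanishes between t=30 and t=300 and comes back lower.
	lower := []sample{{0, 100}, {15, 120}, {30, 140}, {300, 5}, {315, 9}}
	// Same gap, but the series comes back with a higher value.
	higher := []sample{{0, 100}, {15, 120}, {30, 140}, {300, 150}, {315, 160}}

	fmt.Println(countResets(lower))  // 1
	fmt.Println(countResets(higher)) // 0
}
```

The second series may well have been restarted during the gap, but since it came back higher there is nothing for this kind of scan to notice.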

(This makes reasonable sense if you think about scrape failures. You don't really want a scrape failure or a series of them to make Prometheus declare that counters have reset.)

As pointed out by roidelapluie, the first thing you can do with resets() is apply it to a continuous metric that's either 0 or 1, such as a success metric from Blackbox, in order to count how many times the time series has started to fail over a time interval (or more generally has gone to 0). Resets() is blind to what type the metric is, so when probe_success goes from 1 to 0 it will happily consider this a counter reset and count it for you.

(This won't work so well if the metric can take on additional non-zero values, because then every time the value goes down resets() will think it reset, even if it didn't go all the way down to 0. There are workarounds for this.)
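As a rough illustration of both the 0-or-1 case and the multi-valued problem, here is a Go sketch that simulates the counting on plain slices of values. The toUpDown helper is a hypothetical name for one possible workaround (collapse every non-zero value to 1 before counting); it is not necessarily the workaround alluded to above. In PromQL the same idea could probably be written as a subquery over a `> bool 0` comparison, but treat that as an assumption rather than a tested recipe.

```go
package main

import "fmt"

// countResets counts how many times a value is lower than the one before it.
func countResets(values []float64) int {
	if len(values) == 0 {
		return 0
	}
	resets := 0
	prev := values[0]
	for _, cur := range values[1:] {
		if cur < prev {
			resets++
		}
		prev = cur
	}
	return resets
}

// toUpDown collapses every non-zero value to 1, so that only transitions
// to 0 show up as decreases. (Hypothetical helper for this sketch.)
func toUpDown(values []float64) []float64 {
	out := make([]float64, len(values))
	for i, v := range values {
		if v != 0 {
			out[i] = 1
		}
	}
	return out
}

func main() {
	// A probe_success style metric: it starts failing twice.
	probe := []float64{1, 1, 0, 0, 1, 0, 1, 1}
	fmt.Println(countResets(probe)) // 2

	// A metric with several non-zero values: it only reaches 0 twice,
	// but it decreases four times.
	multi := []float64{3, 2, 3, 0, 2, 1, 0}
	fmt.Println(countResets(multi))           // 4
	fmt.Println(countResets(toUpDown(multi))) // 2
}
```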

Another thing you can do is apply resets() to non-continuous metrics that drop (or reset) when something interesting happens. For example, suppose you have a metric that is how many packets a VPN user's session has transmitted or received (with the time series for a user being absent if they have no sessions). You can be pretty certain that when a session is shut down and a new one started up, the new session will have a lower packet count than the old one. Given this, you can use resets() to count more or less how many times the user shut down their old VPN connection and made a new one over a time range, even if there was some time between the old session being shut down and the new one starting.

(As mentioned in my entry on changes(), it's probably better if you have a metric for the start time of a user's VPN session. But you may not always have that data, especially in a Prometheus metric.)

Applying resets() to a continuous metric that counts up until something happens will obviously tell you how many times that thing has happened over your time range. However, most counter metrics are associated with more directly usable indicators of things like host reboots or process restarts, and most gauge metrics are too likely to go down on their own instead of resetting.

The final use for resets() I can think of is telling you how many times a gauge metric goes down (as always, over some time range). This can be combined with changes() to let you determine how many times a gauge metric has gone up over a time range:

changes(some_metric[1h]) - resets(some_metric[1h])

I don't know if Prometheus will optimize this to only load the time series points for some_metric[1h] into memory once, or if it will do two loads (one for changes(), one for resets()). This might be an especially relevant thing to consider if you're using a subquery instead of a simple metric lookup.
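The changes() minus resets() arithmetic can be sanity-checked with a small Go sketch that counts changes, decreases, and increases directly on a slice of values. All three function names are made up for illustration, and the sketch assumes the gauge never reports NaN. A step from NaN to a number, or the reverse, is a change but is neither an increase nor a decrease, so the subtraction would overcount in that case.

```go
package main

import "fmt"

// countChanges counts steps where the value differs from the previous one.
// This sketch assumes there are no NaN values in the input.
func countChanges(values []float64) int {
	n := 0
	for i := 1; i < len(values); i++ {
		if values[i] != values[i-1] {
			n++
		}
	}
	return n
}

// countResets counts steps where the value went down.
func countResets(values []float64) int {
	n := 0
	for i := 1; i < len(values); i++ {
		if values[i] < values[i-1] {
			n++
		}
	}
	return n
}

// countIncreases counts steps where the value went up, as a direct check.
func countIncreases(values []float64) int {
	n := 0
	for i := 1; i < len(values); i++ {
		if values[i] > values[i-1] {
			n++
		}
	}
	return n
}

func main() {
	gauge := []float64{5, 7, 7, 3, 4, 9, 2}

	fmt.Println(countChanges(gauge))                      // 5
	fmt.Println(countResets(gauge))                       // 2
	fmt.Println(countChanges(gauge) - countResets(gauge)) // 3
	fmt.Println(countIncreases(gauge))                    // 3
}
```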

Sidebar: resets() isn't less efficient if there are lots of resets

As far as I can tell from the code, resets() is just as efficient on metrics that have lots of resets as it is on metrics with only a few of them. So is changes(). In both cases, the code does a simple scan over all of the points in a time series and counts up the number of times it sees the relevant thing happening. The relevant Go code for resets() is small enough to just put here:

resets := 0
prev := samples.Points[0].V
for _, sample := range samples.Points[1:] {
    current := sample.V
    if current < prev {
        resets++
    }
    prev = current
}

The changes() code is the same thing with a slightly different condition (which also has to account for the fact that NaNs don't compare equal to each other). You can find all of the code in promql/functions.go.

Written on 04 April 2021.
Last modified: Sun Apr 4 23:25:03 2021