Wandering Thoughts archives

2021-04-04

Some uses for Prometheus's resets() function

One of the functions available in Prometheus's PromQL query language is resets(), which is described this way in the documentation:

For each input time series, resets(v range-vector) returns the number of counter resets within the provided time range as an instant vector. Any decrease in the value between two consecutive samples is interpreted as a counter reset.

resets should only be used with counters.

Up until recently I've ignored resets() because knowing when a counter had reset didn't seem particularly useful to me. This changed due to roidelapluie's comment about it on this entry (I'll get to it), which caused me to start thinking about resets() in general. But before I get to its possible uses, there's an important qualification on the documentation.

If a time series disappears for a while then reappears, the last value from before the disappearance is consecutive with the first value after it reappears. All that resets() cares about is the stream of values when a time series exists, so if the post-reappearance value is lower than the pre-disappearance value, this is counted as a reset. Much like the changes() function, the code that evaluates this is completely blind to there even being periods of time where the time series isn't there. By extension, if a counter time series disappears for a while but comes back with the same value or higher, this won't be considered a reset.

(This makes reasonable sense if you think about scrape failures. You don't really want a scrape failure or a series of them to make Prometheus declare that counters have reset.)
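
To make the gap behavior concrete, here is a minimal Go sketch that applies the same "any decrease is a reset" comparison to a made-up series with a hole in it. The sample type, the countResets helper, and the timestamps are all illustrative assumptions, not Prometheus code.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one scraped point:
// a timestamp in seconds and a value.
type sample struct {
	T int64
	V float64
}

// countResets counts how many times a value is lower than the one
// before it. It looks only at values, never at timestamps, so a gap
// between two samples is invisible to it.
func countResets(samples []sample) int {
	if len(samples) == 0 {
		return 0
	}
	resets := 0
	prev := samples[0].V
	for _, s := range samples[1:] {
		if s.V < prev {
			resets++
		}
		prev = s.V
	}
	return resets
}

func main() {
	// The series vanishes between t=30 and t=300 and comes back lower.
	lowerAfterGap := []sample{{0, 100}, {15, 120}, {30, 140}, {300, 5}, {315, 20}}
	// The same gap, but the series comes back higher (for example
	// after a run of failed scrapes).
	higherAfterGap := []sample{{0, 100}, {15, 120}, {30, 140}, {300, 900}, {315, 920}}

	fmt.Println(countResets(lowerAfterGap))  // 1
	fmt.Println(countResets(higherAfterGap)) // 0
}
```

The timestamps are carried along only to show that nothing ever reads them.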

As pointed out by roidelapluie, the first thing you can do with resets() is apply it to a continuous metric that's either 0 or 1, such as a success metric from Blackbox, in order to count how many times the time series has started to fail over a time interval (or more generally has gone to 0). Resets() is blind to what type the metric is, so when probe_success goes from 1 to 0 it will happily consider this a counter reset and count it for you.

(This won't work so well if the metric can take on additional non-zero values, because then every time the value goes down resets() will think it reset, even if it didn't go all the way down to 0. There are workarounds for this.)
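
As a rough illustration of both points, this Go sketch compares a resets()-style count against a count of actual transitions to 0. Both helpers and the sample values are hypothetical; the first one simply mirrors the "any decrease is a reset" rule from the documentation.

```go
package main

import "fmt"

// countDecreases mimics the resets() rule: any drop between two
// consecutive values counts.
func countDecreases(values []float64) int {
	n := 0
	for i := 1; i < len(values); i++ {
		if values[i] < values[i-1] {
			n++
		}
	}
	return n
}

// countDropsToZero counts only transitions from a non-zero value to 0,
// which is what we actually care about for a success metric.
func countDropsToZero(values []float64) int {
	n := 0
	for i := 1; i < len(values); i++ {
		if values[i] == 0 && values[i-1] != 0 {
			n++
		}
	}
	return n
}

func main() {
	// A 0/1 metric in the style of probe_success: failure starts twice.
	probe := []float64{1, 1, 0, 0, 1, 0, 1}
	fmt.Println(countDecreases(probe), countDropsToZero(probe)) // 2 2

	// A metric with more values: 3 -> 2 is a decrease but not a failure.
	multi := []float64{3, 2, 3, 0, 1}
	fmt.Println(countDecreases(multi), countDropsToZero(multi)) // 2 1
}
```

For the 0/1 series the two counts agree, which is why resets() works there; for the multi-valued series the decrease rule overcounts.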

Another thing you can do is apply resets() to non-continuous metrics that drop (or reset) when something interesting happens. For example, suppose you have a metric that is how many packets a VPN user's session has transmitted or received (with the time series for a user being absent if they have no sessions). You can be pretty certain that when a session is shut down and a new one started up, the new session will have a lower packet count than the old one. Given this, you can use resets() to count more or less how many times the user shut down their old VPN connection and made a new one over a time range, even if there was some time between the old session being shut down and the new one starting.

(As mentioned in my entry on changes(), it's probably better if you have a metric for the start time of a user's VPN session. But you may not always have that data, especially in a Prometheus metric.)

Applying resets() to a continuous metric that counts up until something happens will obviously tell you how many times that thing has happened over your time range. However, most counter metrics are associated with more directly usable indicators of things like host reboots or process restarts, and most gauge metrics are too likely to go down on their own instead of resetting.

The final use for resets() I can think of is telling you how many times a gauge metric goes down (as always, over some time range). This can be combined with changes() to let you determine how many times a gauge metric has gone up over a time range:

changes(some_metric[1h]) - resets(some_metric[1h])

I don't know if Prometheus will optimize this to only load the time series points for some_metric[1h] into memory once, or if it will do two loads (one for changes(), one for resets()). This might be an especially relevant thing to consider if you're using a subquery instead of a simple metric lookup.
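
The arithmetic behind subtracting resets() from changes() can be checked with a small Go sketch. The changesAndResets helper is hypothetical; it counts both things in one pass purely for illustration and says nothing about how Prometheus itself evaluates the expression. NaN handling is left out to keep it short.

```go
package main

import "fmt"

// changesAndResets walks the values once and returns both counts.
// Every change is either an increase or a decrease, so
// changes - resets is the number of increases.
func changesAndResets(values []float64) (changes, resets int) {
	for i := 1; i < len(values); i++ {
		if values[i] != values[i-1] {
			changes++
		}
		if values[i] < values[i-1] {
			resets++
		}
	}
	return changes, resets
}

func main() {
	// Made-up gauge values: three increases, two decreases, two repeats.
	gauge := []float64{5, 7, 7, 3, 4, 4, 9, 2}
	changes, resets := changesAndResets(gauge)
	fmt.Println(changes, resets, changes-resets) // 5 2 3
}
```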

Sidebar: resets() isn't less efficient if there are lots of resets

As far as I can tell from the code, resets() is just as efficient on metrics that have lots of resets as it is on metrics with only a few of them. So is changes(). In both cases, the code does a simple scan over all of the points in a time series and counts up the number of times it sees the relevant thing happening. The relevant Go code for resets() is small enough to just put here:

resets := 0
prev := samples.Points[0].V
for _, sample := range samples.Points[1:] {
	current := sample.V
	if current < prev {
		resets++
	}
	prev = current
}

The changes() code is the same thing with the condition changed slightly (including accounting for the fact that NaNs don't compare equal to each other). You can find all of the code in promql/functions.go.

sysadmin/PrometheusResetsFunction written at 23:25:03

You need a version of Go with module support (ideally good support)

Linux distributions and other Unixes often package versions of languages for you. This has good bits, such as easy installation and someone else worrying about the packaging, and bad bits, such as the versions potentially being out of date. For many languages (C being a great example), the exact version doesn't matter too much, and most any version installed by a Linux distribution will be fine for most people. Unfortunately Go's current strong transition to modules makes it not one of those languages; you now need a version of Go with (good) module support. Which leads to this tweet of mine:

It makes me a bit sad that Ubuntu 18.04's packaged version of Go is 1.10, which is one version too old to have any modules support. It's very Ubuntu but I still wish they'd updated since release.

On operating systems with versions of Go that are too old, such as Ubuntu 18.04, you should get or build your own local copy of a good version of Go, which probably should be the latest version. Fortunately this is an easy process; unfortunately it will complicate simply working with programs that are written in Go and that you need to build yourself (possibly including locally written programs). Even if you can sort of work with the system version of Go when it lacks module support, I don't think you should try to do so; it's not going to be worth the extra pain and work.
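
If you want a quick way to tell whether a given Go toolchain is too old, a sketch along these lines may help. The hasModuleSupport helper is hypothetical, and the 1.11 threshold is an assumption drawn from 1.10 being "one version too old"; it only checks for any module support, not for good module support.

```go
package main

import (
	"fmt"
	"runtime"
)

// hasModuleSupport reports whether a version string such as "go1.10"
// or "go1.16.3" names a release at or after Go 1.11 (assumed here to
// be the first release with any module support).
func hasModuleSupport(version string) (bool, error) {
	var major, minor int
	if _, err := fmt.Sscanf(version, "go%d.%d", &major, &minor); err != nil {
		return false, fmt.Errorf("unrecognized Go version %q: %v", version, err)
	}
	return major > 1 || (major == 1 && minor >= 11), nil
}

func main() {
	// runtime.Version() is the version of the toolchain that built
	// this program, for example "go1.10".
	v := runtime.Version()
	ok, err := hasModuleSupport(v)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s has module support: %v\n", v, ok)
}
```

Building and running this with the system Go reports on that particular toolchain. Development builds with unusual version strings fall into the error branch instead of being guessed at.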

(Related to this, you might want to start installing third party Go programs with 'go install ...@latest' or 'go get ...@latest', since both of these force modular mode if they work at all.)

There are at least three reasons to use a version of Go that supports Go modules and to use Go modules yourself. First, it's the way that the ecosystem is moving in general. I expect that an increasing number of development tools and general programs that work with Go code are going to require and assume modules, or at least work much better with modules. Like it or not, traditional $GOPATH Go development is going away as people drop code for it or the old code quietly stops working.

Second, Go modules are mostly the better way to develop your own local programs. They complicate your life if you have your own local packages that you don't want to publish somewhere, but if you don't use local, un-published packages then they're a great way to get self-contained Go program source. And of course modules make it far less likely that random changes in outside packages that you use will break your programs.

Third, it's likely that a steadily increasing number of third party programs will stop building without modules, because they're set up to use older versions of third party packages (or older versions of their own packages). You could generally build them by hand with a carefully constructed $GOPATH environment, but using a version of Go with module support is much easier.

programming/GoModuleSupportNeed written at 01:08:15

