The Prometheus timestamp() function can be used on expressions, sort of

October 21, 2022

Suppose, not entirely hypothetically, that you have a Prometheus metric that has been slowly drifting upward for some time, and you would like to find out the last time it was below some value. In an ideal world this would be a simple Prometheus operation, because the raw data is there in the TSDB. In this actual world, Prometheus makes it hard to work with when metrics actually happened. However, you can do it in a relatively straightforward way because of something I had forgotten until I started to write this entry.

That is, the timestamp() function doesn't have to be used on metrics alone, or more exactly on an instant vector selector. You can also use it on some expressions, such as 'your_metric < 22.0'. This works because timestamp() takes an instant vector as its input, and Prometheus comparison expressions are filters by default. When you write 'your_metric < 22.0', you start with the instant vector of 'your_metric' and then filter it down to the smaller instant vector of only the series where the value is less than 22.0.
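
As a small sketch of this filtering (with 'your_metric' and the 22.0 threshold being the same stand-ins as above), you can compare these two queries, run separately:

# Only the series currently below 22.0, with their normal values
your_metric < 22.0

# The same set of series, but each value is now a Unix timestamp in seconds
timestamp( your_metric < 22.0 )

Series that aren't below 22.0 at the moment simply don't appear in either result.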

Unfortunately, this use of timestamp() doesn't actually do what you would really like. When Prometheus processes an instant vector in this way, the time associated with each entry in the instant vector is reset to the nominal moment of the query. For instance, the query 'timestamp(node_load1)' will give you an assortment of slightly different timestamps, but if you do 'timestamp(node_load1 > 0)', you get the same timestamp for everything. If you need the very precise timestamp, you're out of luck; you need to get the raw time series data. If you're making a broad-ranging query (using a subquery) and you're happy to have a somewhat imprecise answer, this is okay.
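
One way to see this for yourself, assuming you have node_load1 from the node exporter (or substituting any metric you scrape), is to look at how far each timestamp is from the query time. Again these are two separate queries:

# Age of each sample in seconds; normally this varies from series to series
time() - timestamp( node_load1 )

# With the timestamps reset to the query time, this should be 0 for everything
time() - timestamp( node_load1 > 0 )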

To actually use timestamp() to determine the most recent time that 'your_metric' was below some value, you have to use a subquery:

max_over_time ( ( timestamp( your_metric < 22.0 ) )[90d:1m] )

Because of the effect of resetting the 'time' of an instant vector expression, this will get you the time with roughly one-minute precision (and the time will also be aligned on minute boundaries, since those are the points where the subquery's 1m step evaluates the expression). If you want the earliest time instead of the most recent time, you can use min_over_time instead.
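
Spelled out, the earliest-time version is the first query below. The second is a variation that I believe should turn the most recent time into 'how many seconds ago'. Both are sketches using the same stand-in metric, threshold, and 90 day range, and the answer can't reach back further than the subquery range or your TSDB retention.

# Earliest time in the past 90 days that the metric was below 22.0
min_over_time( ( timestamp( your_metric < 22.0 ) )[90d:1m] )

# Seconds since the metric was last below 22.0
time() - max_over_time( ( timestamp( your_metric < 22.0 ) )[90d:1m] )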

(Obviously I already knew most of this when I wrote that earlier entry, but then I forgot it again over the almost two years since. Having written it down as a full entry, hopefully it'll stick in my mind this time around.)

Sidebar: The other (harder) way to do this

Before I looked up my earlier entry on how hard it is to work with when metric points happened (so that I could link to it in this entry) and rediscovered that I could use timestamp() for this, I was using a much more brute-force approach:

max_over_time( ( (your_metric < 22.0) * 0 + time() )[90d:1m] )

This filters 'your_metric', multiplies it by zero to eliminate its value, and then adds the time that the step of the subquery is being evaluated at (to zero, leaving us with the time). This will give you the same results, just in a different way.

My view is that you don't want to use ' < bool' here because it doesn't filter. Not filtering means Prometheus has to deal with more data, and it also complicates a version that uses min_over_time instead.
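
As a sketch of what I mean, assuming the obvious way of writing the 'bool' version is to multiply the 0 or 1 result of the comparison by the time:

# Each step yields the step's time if the condition is true, or 0 if it's false
max_over_time( ( (your_metric < bool 22.0) * time() )[90d:1m] )

Every step now produces a value for every series. That's harmless for max_over_time, since 0 is lower than any real time, but min_over_time would report 0 as soon as there was a single step where the metric wasn't below 22.0, so you'd need extra work to get rid of the zeros.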

Written on 21 October 2022.