2022-10-21
The Prometheus timestamp() function can be used on expressions, sort of
Suppose, not entirely hypothetically, that you have a Prometheus metric that has been slowly drifting upward for some time. You would like to find out the last time that it was below some value. In an ideal world this would be a simple Prometheus operation, because the raw data is there in the TSDB. In this actual world, Prometheus makes it hard to work with when metrics actually happened. However, you can do it in a relatively straightforward way because of something I had forgotten until I started to write this entry.
That is, the timestamp() function doesn't have to be used on metrics alone, or more exactly on an instant vector selector. You can also use it on some expressions, such as 'your_metric < 22.0'. This works because timestamp() takes an instant vector as its input, and Prometheus expressions are filters. When you write 'your_metric < 22.0', you start with the instant vector of 'your_metric' and then filter it down to the smaller instant vector where the value is less than 22.0.
Unfortunately, this use of timestamp() doesn't actually do what you would really like. When Prometheus processes an instant vector in this way, the time associated with each entry in the instant vector is reset to the nominal moment of the query. For instance, the query 'timestamp(node_load1)' will give you an assortment of slightly different timestamps, but if you do 'timestamp(node_load1 > 0)', you get the same timestamp for everything. If you need the very precise timestamp, you're out of luck; you need to get the raw time series data. If you're making a broad ranging query (using a subquery) and you're happy to have a somewhat imprecise answer, this is okay.
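One way to see this reset directly is to subtract the two forms from each other. This is only a sketch, and it assumes you have node_load1 from the node exporter, as in the example above:

```promql
# Per series: the real sample timestamp minus the query evaluation time.
# Each result is zero or negative; it is how far (in seconds) the most
# recent sample lags behind the nominal moment of the query.
timestamp(node_load1) - timestamp(node_load1 > 0)
```

Any host whose load average is exactly 0 is filtered out of the right-hand side, so it won't show up in the result.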
To actually use timestamp() to determine the most recent time that 'your_metric' was below some value, you have to use a subquery:
max_over_time ( ( timestamp( your_metric < 22.0 ) )[90d:1m] )
Because of the effect of resetting the 'time' of an instant vector expression, this will get you the time with a roughly one minute precision (and also the time will be aligned on minute boundaries). If you want the earliest time instead of the most recent time, you can use min_over_time instead.
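As a sketch of how the result might be used, subtracting it from time() turns the timestamp into "how long ago". The 22.0 threshold and the 90d range here are just the ones from the example above, not anything special:

```promql
# Seconds since your_metric was most recently seen below 22.0,
# to roughly one-minute precision. Series that were never below 22.0
# in the past 90 days produce no result at all.
time() - max_over_time( ( timestamp( your_metric < 22.0 ) )[90d:1m] )

# The earliest time within the range, as a Unix timestamp.
min_over_time( ( timestamp( your_metric < 22.0 ) )[90d:1m] )
```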
(Obviously I already knew most of this when I wrote that earlier entry, but then I forgot it again over the almost two years since. Having written it down as a full entry, hopefully it'll stick in my mind this time around.)
Sidebar: The other (harder) way to do this
Before I went looking for my entry on how hard it is to work with when metric points happened (so that I could link to it in this entry) and re-discovered that I could use timestamp() for this, the approach I was using was a much more brute force one:
max_over_time( ( (your_metric < 22.0) * 0 + time() )[90d:1m] )
This filters 'your_metric', multiplies it by zero to eliminate its value, and then adds the time that the step of the subquery is being evaluated at (to zero, leaving us with the time). This will give you the same results, just in a different way.
My view is that you don't want to use '< bool' here because it doesn't filter. Not filtering means Prometheus has to deal with more data, and it also complicates a version that uses min_over_time instead.
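As an illustration of the min_over_time complication, a '< bool' version might look something like the following. This is a hypothetical sketch for comparison, not a recommended query:

```promql
# '< bool' yields 1 where the comparison holds and 0 where it doesn't,
# so no series or steps are filtered out. Multiplying by time() gives
# the step's time for matches and 0 for non-matches.
max_over_time( ( (your_metric < bool 22.0) * time() )[90d:1m] )

# With min_over_time, any step where the comparison failed contributes
# a 0, so the result is 0 instead of the earliest matching time.
min_over_time( ( (your_metric < bool 22.0) * time() )[90d:1m] )
```

The max_over_time form also returns 0 rather than nothing for a series that never matched, which you would have to screen out afterward.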