2019-03-18
Prometheus subqueries pick time points in a surprising way
Up until today, I would have confidently told you that I understood
how Prometheus subqueries picked
the time points that they evaluated your expression at; it was the
obvious combination of a range vector
with a query step. Given a subquery range
such as '[5h:1m]
' and assuming an instant query evaluated at
'now', Prometheus would first go back exactly five hours in seconds,
as it would for a range vector of '[5h]
', and then step forward
from that starting time ever one minute (in seconds), as it would
in a Prometheus graphing query (what the HTTP API calls a range
query).
In fact I did assert this,
and then Brian Brazil corrected me. Subqueries do not work this simply
and straightforwardly, and how they actually work may have implications
for how you want to use them.
It turns out that subqueries use times that are evenly divided by the step interval, which the the blog post and other people describe as 'being aligned with'. As part of this subqueries will start (and finish) earlier than specified in order to generate as many metric points as they intuitively should. This even division is in Unix time, which is in UTC, not in your local timezone.
This is all very abstract, so let's use the example of a subquery
range of '[5d:1d]
'. This should intuitively yield five metric
points as the result, and since the step is one day, the aligned
times Prometheus picks will be at midnight UTC (ie, when '<timestamp>
% 1d' is 0). As I write this, the time is March 19th 01:40 UTC or
so (and 9:40 pm March 18th in local time), and if I execute this
query now I will get the following Unix timestamps of metric points,
shown here with their translation to UTC time:
@1552608000 March 15 00:00 UTC @1552694400 March 16 00:00 UTC @1552780800 March 17 00:00 UTC @1552867200 March 18 00:00 UTC @1552953600 March 19 00:00 UTC
Notice that the oldest timestamp is earlier than now minus exactly five days of seconds, which would be March 14th at 01:40 UTC, and the most recent timestamp is not 'now' but back at midnight UTC (which was at 8pm local time).
(Note that this is not when the metric points themselves come from;
it is when the subquery expression is evaluated. If I use, say,
'timestamp(node_load1)[5d:1d]
' to extract both the metric point
timestamp and the evaluation timestamp, I get results that differ,
as you'd expect. You can see all of this in the Prometheus web
interface by making 'console' queries for subquery range expressions;
the web interface will show you all of the returned metric points
and their timestamps.)
At small subquery steps, like :1m or :10m or even perhaps :1h, this alignment probably doesn't matter a lot. At large time steps this may well be important to you, because Prometheus always aligns your subqueries to UTC, not to local time. There is no way to make a step of :6h or :12h or :1d align to local midnight, local midnight and local noon, and so on, or to be relative to 'now' instead of being aligned with absolute time.
(Apparently the Prometheus people have a reason for doing it this way; I believe it boils down to helping cache things during query evaluation.)
Sidebar: The exact Prometheus code involved
The code involved here is found in prometheus/promql/engine.go,
in the eval()
method, and reads:
// Start with the first timestamp after (ev.startTimestamp - offset - range) // that is aligned with the step (multiple of 'newEv.interval'). newEv.startTimestamp = newEv.interval * ((ev.startTimestamp - offsetMillis - rangeMillis) / newEv.interval) if newEv.startTimestamp < (ev.startTimestamp - offsetMillis - rangeMillis) { newEv.startTimestamp += newEv.interval }
I believe that ev.startTimestamp
is usually the same as the ending
timestamp, and if it's not I don't understand what this code is going
to wind up doing. The division here is integer division.