== Prometheus subqueries pick time points in a surprising way Up until today, I would have confidently told you that I understood how [[Prometheus subqueries https://prometheus.io/blog/2019/01/28/subquery-support/]] picked the time points that they evaluated your expression at; it was the obvious combination of a [[range vector https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors]] with a [[query step PrometheusQuerySteps]]. Given a subquery range such as '_[5h:1m]_' and assuming an instant query evaluated at 'now', Prometheus would first go back exactly five hours in seconds, as it would for a range vector of '_[5h]_', and then step forward from that starting time ever one minute (in seconds), as it would in a Prometheus graphing query (what the HTTP API calls a [[range query https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries]]). In fact [[I did assert this https://old.reddit.com/r/PrometheusMonitoring/comments/b1za2c/question_about_subquery_alignment_and_counter/]], and then Brian Brazil corrected me. Subqueries do not work this simply and straightforwardly, and how they actually work may have implications for how you want to use them. It turns out that ~~subqueries use times that are evenly divided by the step interval~~, which the [[the blog post https://prometheus.io/blog/2019/01/28/subquery-support/]] and other people describe as 'being aligned with'. As part of this ~~subqueries will start (and finish) earlier than specified~~ in order to generate as many metric points as they intuitively should. This even division is in Unix time, which is in UTC, not in your local timezone. This is all very abstract, so let's use the example of a subquery range of '_[5d:1d]_'. This should intuitively yield five metric points as the result, and since the step is one day, the *aligned* times Prometheus picks will be at midnight UTC (ie, when ' % 1d' is 0). As I write this, the time is March 19th 01:40 UTC or so (and 9:40 pm March 18th in local time), and if I execute this query now I will get the following Unix timestamps of metric points, shown here with their translation to UTC time: .pn prewrap on > @1552608000 March 15 00:00 UTC > @1552694400 March 16 00:00 UTC > @1552780800 March 17 00:00 UTC > @1552867200 March 18 00:00 UTC > @1552953600 March 19 00:00 UTC Notice that the oldest timestamp is earlier than now minus exactly five days of seconds, which would be March 14th at 01:40 UTC, and the most recent timestamp is not 'now' but back at midnight UTC (which was at 8pm local time). (Note that this is not when the metric points themselves come from; it is when the subquery expression is evaluated. If I use, say, '((timestamp(node_load1)[5d:1d]))' to extract both the metric point timestamp and the evaluation timestamp, I get results that differ, as you'd expect. You can see all of this in the Prometheus web interface by making 'console' queries for subquery range expressions; the web interface will show you all of the returned metric points and their timestamps.) At small subquery steps, like :1m or :10m or even perhaps :1h, this alignment probably doesn't matter a lot. At large time steps this may well be important to you, because Prometheus always aligns your subqueries to UTC, not to local time. There is no way to make a step of :6h or :12h or :1d align to local midnight, local midnight and local noon, and so on, or to be relative to 'now' instead of being aligned with absolute time. (Apparently the Prometheus people have a reason for doing it this way; I believe it boils down to helping cache things during query evaluation.) === Sidebar: The exact Prometheus code involved The code involved here is found in [[prometheus/promql/engine.go https://github.com/prometheus/prometheus/blob/master/promql/engine.go]], in the _eval()_ method, and reads: // Start with the first timestamp after (ev.startTimestamp - offset - range) // that is aligned with the step (multiple of 'newEv.interval'). newEv.startTimestamp = newEv.interval * ((ev.startTimestamp - offsetMillis - rangeMillis) / newEv.interval) if newEv.startTimestamp < (ev.startTimestamp - offsetMillis - rangeMillis) { newEv.startTimestamp += newEv.interval } I believe that _ev.startTimestamp_ is usually the same as the ending timestamp, and if it's not I don't understand what this code is going to wind up doing. The division here is integer division.