== Some implications of using _offset_ instead of _delta()_ in Prometheus

I previously wrote about [[how _delta()_ can be inferior to subtraction with _offset_ PrometheusDeltaVsOffset]], because _delta()_ has to load the entire range of metric points and _offset_ doesn't. In light of the issue I ran into recently with [[stale metrics and range queries PrometheusStaleMetricsOverTime]], there turn out to be some implications and complexities of using _offset_ in place of _delta()_, even though it lets you make queries that you couldn't otherwise do.

Let's start with the basics, which is that '_delta(mymetric[30d])_' can theoretically be replaced with '_mymetric - mymetric offset 30d_' to get the same result with far fewer metric points having to be loaded by Prometheus. This is an important issue for us, because we have some high-cardinality metrics that it turns out we want to query over long time scales like 30 or 90 days.

The first issue with the _offset_ replacement is what happens when a particular set of labels for the metric didn't exist 30 days ago. Just like PromQL boolean operators ([[cf PrometheusExpressionsFilter]]), PromQL math operators on vectors act as filters, so the subtraction will drop all current metric points for _mymetric_ that didn't exist 30 days ago. The fix for this is the inverse of [[ignoring stale metrics PrometheusStaleMetricsOverTime]]:

.pn prewrap on

> (mymetric - mymetric offset 30d) or mymetric

Here, if _mymetric_ didn't exist 30 days ago we implicitly take its starting value as 0 and just consider the delta to be the current value of _mymetric_. Under some circumstances you may want a different delta value for 'new' metrics, which will require a different computation.

The inverse situation is metric labels that existed 30 days ago but don't exist now. As we saw in [[an earlier entry PrometheusStaleMetricsOverTime]], the range query in the _delta()_ version will include those metrics, so they will flow through to the _delta()_ calculation and be included in your final result set. Although [[the _delta()_ documentation https://prometheus.io/docs/prometheus/latest/querying/functions/#delta]] sort of claims otherwise, [[the actual code implementing _delta()_ https://github.com/prometheus/prometheus/blob/master/promql/functions.go#L61]] reasonably doesn't currently extrapolate samples that start and end significantly far away from the full time range, so the _delta()_ result will probably be just the change over the time series points that are available. In some cases this will go to zero, but in others the result will be uninteresting and you would rather pretend that the time series is now 0. Unfortunately, as far as I know there's no good way to do that.

If you only care about time series (ie label sets) that existed at the start of the time period, I think you can extend the previous case to:

> ((mymetric - mymetric offset 30d) or mymetric)
>    or -(mymetric offset 30d)

(As before, this assumes that a time series that disappears is implicitly going to zero.)

If you care about time series that existed in the middle of the time range but not at either the beginning or the end, I think you're out of luck. The only way to sweep those up is a range query and _delta()_, which runs the risk of a 'too many metric points loaded' error.

Unfortunately all of this is increasingly verbose, especially if you're using label matchers to restrict _mymetric_ to only some values, because then you need to propagate these label restrictions into at least the _or_ clauses, as illustrated below.
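As a sketch of what that looks like, with a hypothetical label restriction of _{host="fs1"}_ (the label and its value are made up for this example), the longer version with both _or_ clauses winds up repeating the matcher in every term:

> ((mymetric{host="fs1"} - mymetric{host="fs1"} offset 30d) or mymetric{host="fs1"})
>    or -(mymetric{host="fs1"} offset 30d)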
It's a pity that PromQL doesn't have any function to do this for us.

I also have to modify something I said in [[my first entry on _offset_ and _delta()_ PrometheusDeltaVsOffset]]. Given all of these issues with appearing and disappearing time series, it's clear that optimizing _delta()_ to not require the entire range is not as simple as it looks. It would probably require some deep hooks into the storage engine to say 'we don't need all the points, just the start and end points and their timestamps', and that would only be useful for gauges (since counters already have to load the entire range of points and sweep over it looking for counter resets).

In our current usage we care more about how the current metrics got where they are than what the situation was in the past; we are essentially looking backward to ask what disk space usage grew or shrank. If some past usage went to zero and disappeared, it's okay to exclude it entirely. There are some potentially tricky cases that might cause me to rethink that someday, but for now I'm going to use the shorter version that only has one _or_, partly because Grafana makes it a relatively large pain to write complicated PromQL queries.
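As a concrete sketch of that shorter version (using a made-up disk space metric and label matcher, not one of our real ones), the query in a Grafana panel would look something like:

> (fs_used_bytes{host="fs1"} - fs_used_bytes{host="fs1"} offset 30d) or fs_used_bytes{host="fs1"}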