In Prometheus queries, on and ignoring don't drop labels from the result
Today I learned that one of the areas of PromQL (the query language for Prometheus) that I'm still a bit weak on is when labels will and won't get dropped from metrics as you manipulate them in a query. So I'll start with the story.
Today I wrote an alert rule to make sure that the network interfaces
on our servers hadn't unexpectedly dropped down to 100 Mbit/second
(instead of 1Gbit/s or for some servers 10Gbit/s). We have a couple
of interfaces on a couple of servers that legitimately are at 100M
(or as legitimately as a 100M connection can be in 2021), and I
needed to exclude them. The speed of network interfaces is reported
by node_exporter in node_network_speed_bytes, so I first wrote an
expression using unless and all of the labels involved:
node_network_speed_bytes == 12500000 unless ( node_network_speed_bytes{host="host1",device="eno2",...} or node_network_speed_bytes{host="host2",device="eno1",...} )
However, most of the standard labels you get on metrics from the
host agent (such as job, instance, and so on) are irrelevant
and even potentially harmful to include (the full set of labels
might have to change someday). The labels I really care about are
the host and the device. So I rewrote this as:
node_network_speed_bytes == 12500000 unless on(host,device) [....]
When I wrote this expression I wasn't sure if it was going to drop
all other labels besides host and device from the filtered end
result of the PromQL expression. It turns out that it didn't; the
full set of labels for node_network_speed_bytes is passed through,
even though we're only matching on some of them in the unless.
(The host and the device are all that I needed for the alert message so it wouldn't have been fatal if the other labels were dropped. But it's better to retain them just in case.)
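As a rough sketch of what the rewritten rule can look like in full: once the match is restricted with on(host,device), the excluded series only need to be selected by those two labels. The host1/eno2 and host2/eno1 values are the same placeholders as in the first expression, and the sketch assumes every node_network_speed_bytes series carries a host label, as in the expressions above.

```promql
# Interfaces reporting 100 Mbit/s (12500000 bytes/s), minus the two
# placeholder host/device pairs that are allowed to run at that speed.
node_network_speed_bytes == 12500000
  unless on (host, device) (
       node_network_speed_bytes{host="host1", device="eno2"}
    or node_network_speed_bytes{host="host2", device="eno1"}
  )
```

Every series this returns comes from the left-hand side with its labels intact (job, instance and so on), because unless only decides which left-hand series survive.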
Aggregation operators discard labels unless you use without or by,
as covered by their documentation (although it's not phrased that
way), since aggregating over labels is their purpose. As I've found
out, careless use of aggregation operators can lose labels that are
valuable for alerts (which may be what left me jumpy about this
case). Aggregation over time keeps all labels, though, because it's
aggregating over time instead of over some or all labels. But as I
was reminded today (since I'm sure I've seen it before), vector
matching using on and ignoring doesn't drop labels in a set operation
like this unless; it merely restricts what labels are used in the
matching (and then it's up to you to make sure you still have a one
to one vector match or at least a match that you expect; I've made
mistakes there).
(You can also explicitly pull in additional labels from other metrics.)
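To put these behaviours side by side, here is a small sketch using the same metric. The host label is the one from the expressions above; the instance label and node_uname_info's nodename label are what node_exporter setups normally have, but treat the exact label names as assumptions about your environment.

```promql
# Aggregation over labels: the result has no labels at all.
max(node_network_speed_bytes)

# by keeps only the listed labels; without drops only the listed ones.
max by (host, device) (node_network_speed_bytes)
max without (device) (node_network_speed_bytes)

# Aggregation over time: every label on each series survives.
max_over_time(node_network_speed_bytes[1h])

# Explicitly pulling in a label from another metric: group_left copies
# nodename from node_uname_info (whose value is 1) onto each interface series.
node_network_speed_bytes * on (instance) group_left (nodename) node_uname_info
```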
There may be other cases in PromQL where labels are dropped, but if so I can't think of them right now. My overall moral is that I still need to test my assumptions and guesses in order to be sure about this stuff.
Sidebar: Why I used unless (... or ...) in this query
In many cases, the obvious way to exclude some things from an alert rule expression is to use negative label matches. However, these can only match on the value of a single label, not on the combination of several labels. As far as I know, if you want to exclude only certain label combinations (here 'host1 and eno2' and 'host2 and eno1') where the individual label elements can occur separately (so host1 and host2 both have other network interfaces, and other hosts have eno1 and eno2 interfaces), you're stuck with the more awkward construction I used. This construction is unfortunately somewhat brute force.
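To make the problem concrete, here is a sketch of the tempting negative-match version for just the first excluded pair (host1 and eno2 are the same placeholder names as above):

```promql
# Too broad: the two matchers are ANDed together, so this drops every
# interface on host1 and every eno2 on any host, not only host1's eno2.
node_network_speed_bytes{host!="host1", device!="eno2"} == 12500000
```

A series has to pass both matchers independently, so there is no way to say "not this particular pair" within a single selector.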