How (and where) Prometheus alerts get their labels
In Prometheus, you can (and usually do) have alerting rules that evaluate expressions to create alerts. These alerts are usually passed to Alertmanager, and they are visible in Prometheus itself as a couple of metrics, ALERTS and ALERTS_FOR_STATE. These metrics can be used to do things like find out the start time of alerts or just display a count of currently active alerts on your dashboard. Alerts almost always have labels (and values for those labels), which tend to be used in Alertmanager templates to provide additional information alongside annotations, which are subtly but crucially different.
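As an illustration of using these metrics, here are a couple of PromQL sketches (the 'HighLoad' alert name is a made-up example):

```promql
# A count of currently firing alerts, for a dashboard panel:
count(ALERTS{alertstate="firing"})

# ALERTS_FOR_STATE's value is the Unix timestamp at which the alert's
# condition first became true, which gives you alert start times:
ALERTS_FOR_STATE{alertname="HighLoad"}
```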
All of this is standard Prometheus knowledge and is well documented, but what doesn't seem to be well documented is where alert labels come from (or at least I couldn't find it said explicitly in any of the obvious spots in the documentation). Within Prometheus, the labels on an alert come from two places. First, you can explicitly add labels to the alert in the alert rule, which can be used for things like setting up testing alerts. Second, the basic labels for an alert are whatever labels come out of the alert expression. This can have some important consequences.
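To make the first case concrete, an alert rule can attach explicit labels like this (a minimal sketch; the group name, alert name, threshold, and the 'severity: testing' label are all made-up illustrations):

```yaml
groups:
  - name: example
    rules:
      - alert: HighLoad            # hypothetical alert name
        expr: node_load1 > 10.0
        for: 5m
        labels:
          severity: testing        # explicitly added label
        annotations:
          summary: "Load is high on {{ $labels.instance }}"
```

Any labels set this way are merged with whatever labels come out of the alert expression.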
If your alert expression is a simple one that just involves basic metric operations, for example 'node_load1 > 10.0', then the basic labels on the alert are the same labels that the metric itself has; all of them will be passed through. However, if your alert expression narrows down or throws away some labels, then those labels will be missing from the end result. One of the ways to lose labels in alert expressions is to use 'by (...)', because this discards all labels other than the 'by (whatever)' label or labels. You can also deliberately pull in labels from additional metrics, perhaps as a form of database lookup (and then you can use these additional labels in your Alertmanager setup).
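Both cases can be sketched in PromQL; here the 'host_info' metric and its 'owner' label are hypothetical stand-ins for whatever lookup metric you actually have:

```promql
# 'by (job)' discards every label except 'job', so the resulting
# alert has only a 'job' label (plus the added alertname):
sum(rate(http_requests_total[5m])) by (job) > 100

# Pulling an extra 'owner' label into the alert from a hypothetical
# info metric, host_info{instance="...", owner="..."}:
node_load1 * on (instance) group_left (owner) host_info > 10.0
```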
Prometheus itself also adds an alertname label, with the name of the alert as its value. The ALERTS metric in Prometheus also has an alertstate label, but this is not passed on to the version of the alert that Alertmanager sees. Additionally, as part of sending alerts to Alertmanager, Prometheus can relabel alerts in general to do things like canonicalize some labels. This can be done either for all Alertmanager destinations or only for a particular one, if you have more than one of them set up. This only affects alerts as seen by Alertmanager; the version in the ALERTS metric is unaffected.
(This can be slightly annoying if you're building Grafana dashboards that display alert information using labels that your alert relabeling changes.)
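This relabeling lives under the 'alerting' section of prometheus.yml; a minimal sketch, where the 'dc' to 'datacenter' canonicalization and the Alertmanager target are made-up examples:

```yaml
alerting:
  alert_relabel_configs:
    # Rename a hypothetical 'dc' label to 'datacenter' on all alerts
    # sent to Alertmanager; the ALERTS metric is unaffected.
    - source_labels: [dc]
      target_label: datacenter
      action: replace
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.org:9093']
```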
PS: In practice, people who use Prometheus work out where alert labels come from almost immediately. It's both intuitive (alert rules use expressions, expression results have labels, and so on) and obvious once you have some actual alerts to look at. But if you're trying to decode Prometheus on your first attempt, neither it nor its consequences are obvious.