How (and where) Prometheus alerts get their labels

February 24, 2021

In Prometheus, you can and usually do have alerting rules that evaluate expressions to create alerts. These alerts are usually passed to Alertmanager, and they are also visible in Prometheus itself as a pair of metrics, ALERTS and ALERTS_FOR_STATE. These metrics can be used to do things like find out the start time of alerts or just display a count of currently active alerts on your dashboard. Alerts almost always have labels (and values for those labels), which tend to be used in Alertmanager templates to provide additional information alongside annotations, which are subtly but crucially different.

All of this is standard Prometheus knowledge and is well documented, but what doesn't seem to be well documented is where alert labels come from (or at least I couldn't find it said explicitly in any of the obvious spots in the documentation). Within Prometheus, the labels on an alert come from two places. First, you can explicitly add labels to the alert in the alert rule, which can be used for things like setting up testing alerts. Second, the basic labels for an alert are whatever labels come out of the alert expression. This can have some important consequences.
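As a minimal sketch of both sources, here is an alerting rule built on the 'node_load1 > 10.0' expression used below. The group name, alert name, 'for' duration, and the 'severity: testing' label are all made up for illustration, and the annotation assumes that node_load1 carries an instance label (as it does when scraped from the usual host agent).

```yaml
groups:
  - name: example-alerts          # hypothetical group name
    rules:
      - alert: HighLoad           # hypothetical alert name
        # Source two: every label on the matching node_load1 series
        # (for example instance and job) becomes a label on the alert.
        expr: node_load1 > 10.0
        for: 5m                   # illustrative duration
        # Source one: labels added explicitly in the rule.
        labels:
          severity: testing       # hypothetical label and value
        annotations:
          # Annotations can use the alert's labels, but are not labels.
          summary: "Load on {{ $labels.instance }} is {{ $value }}"
```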

If your alert expression is a simple one that just involves basic metric operations, for example 'node_load1 > 10.0', then the basic labels on the alert are the same labels that the metric itself has; all of them will be passed through. However, if your alert expression narrows down or throws away some labels, then those labels will be missing from the end result. One of the ways to lose labels in alert expressions is to use 'by (...)', because this discards all labels other than the 'by (whatever)' label or labels. You can also deliberately pull in labels from additional metrics, perhaps as a form of database lookup (and then you can use these additional labels in your Alertmanager setup).
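Both effects can be sketched with a pair of rules. This is only an illustration: it assumes metrics in the style of the Prometheus node_exporter (node_cpu_seconds_total, and node_uname_info as an 'info' metric with a value of 1 and a nodename label), and the alert names and thresholds are invented.

```yaml
groups:
  - name: label-examples          # hypothetical group name
    rules:
      # 'sum by (instance)' keeps only the instance label. Labels such as
      # job, cpu, and mode are gone from the result, so the alert will
      # not have them either.
      - alert: HighCPU            # hypothetical alert name
        expr: sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) > 4

      # Multiplying by an info metric (value 1) with group_left copies
      # its nodename label onto the result without changing the value,
      # so the alert gains a nodename label it would not otherwise have.
      - alert: HighLoadWithNodename   # hypothetical alert name
        expr: (node_load1 > 10.0) * on (instance) group_left (nodename) node_uname_info
```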

Prometheus itself also adds an alertname label, with the name of the alert as its value. The ALERTS metric in Prometheus also has an alertstate label, but this is not passed on to the version of the alert that Alertmanager sees. Additionally, as part of sending alerts to Alertmanager, Prometheus can relabel alerts in general to do things like canonicalize some labels. This can be done either for all Alertmanager destinations or only for a particular one, if you have more than one of them set up. This only affects alerts as seen by Alertmanager; the version in the ALERTS metric is unaffected.
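A sketch of what such relabeling can look like in the Prometheus configuration file. The target address, the port-stripping rule, and the 'debug_' label prefix are all hypothetical. The global alert_relabel_configs block is the long-standing form; the per-Alertmanager block is written on the assumption that your Prometheus version accepts alert_relabel_configs inside an individual Alertmanager entry, so check the configuration documentation for your version.

```yaml
alerting:
  # Applied to every alert before it is sent to any Alertmanager.
  # This example canonicalizes 'host:port' instance labels to 'host'.
  alert_relabel_configs:
    - source_labels: [instance]
      regex: '([^:]+):\d+'
      target_label: instance
      replacement: '$1'
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager.example.org:9093']   # hypothetical target
      # Assumption: applied only to alerts sent to this Alertmanager.
      # This example drops any label whose name starts with 'debug_'.
      alert_relabel_configs:
        - regex: 'debug_.*'
          action: labeldrop
```

Neither block changes what you see in the ALERTS metric; an alert there keeps its original instance label, port and all.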

(This can be slightly annoying if you're building Grafana dashboards that display alert information using labels that your alert relabeling changes.)

PS: In practice, people who use Prometheus work out where alert labels come from almost immediately. It's both intuitive (alert rules use expressions, expression results have labels, and so on) and obvious once you have some actual alerts to look at. But if you're trying to make sense of Prometheus for the first time, neither this nor its consequences are obvious.

Written on 24 February 2021.

Last modified: Wed Feb 24 00:19:23 2021