Wandering Thoughts archives

2024-11-11

Prometheus makes it annoyingly difficult to add more information to alerts

Suppose, not so hypothetically, that you have a special Prometheus meta-alert about large scale issues, that exists to avoid drowning you in alerts about individual hosts or whatever when you have a large scale issue. As part of that alert's notification message, you'd like to include some additional information about things like why you triggered the alert, how many down things you detected, and so on.

While Alertmanager creates the actual notification messages by expanding (Go) templates, it doesn't have direct access to Prometheus or any other source of external information, for relatively straightforward reasons. Instead, you need to pass any additional information from Prometheus to Alertmanager in the form (generally) of alert annotations. Alert annotations (and alert labels) also go through template expansion, and in the templates for alert annotations, you can directly make Prometheus queries with the query function. So on the surface this looks relatively simple, although you're going to want to look carefully at YAML string quoting.

I did some brief experimentation with this today, and it was enough to convince me that there are some issues with doing this in practice. The first issue is that of quoting. Realistic PromQL queries often use " quotes because they involve label values, and the query you're doing has to be a (Go) template string, which probably means using Go raw quotes unless you're unlucky enough to need ` characters, and then there's YAML string quoting. At a minimum this is likely to be verbose.
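As a rough sketch of the quoting layers involved, here is what such an annotation might look like in a rule file. The group name, alert name, annotation name, job label, and threshold are all made up for illustration. A YAML block scalar sidesteps one layer of quoting, and a Go raw string lets the PromQL keep its " quotes:

```yaml
groups:
  - name: meta-alerts                # hypothetical group name
    rules:
      - alert: LargeScaleOutage      # hypothetical alert name
        expr: count(up{job="node"} == 0) > 10
        annotations:
          # '|-' is a YAML block scalar, so the template needs no YAML quotes.
          # The backquotes are a Go raw string, so the " inside the PromQL
          # does not need escaping.
          down_count: |-
            {{ query `count(up{job="node"} == 0)` | first | value }}
```

This is the simple pipeline form; it assumes the query always returns at least one result.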

A somewhat bigger problem is that straightforward use of Prometheus template expansion (using a simple pipeline) is generally going to complain in the error log if your query provides no results. If you're doing the query to generate a value, there are some standard PromQL hacks to get around this, such as tacking 'or vector(0)' on the end of the query. If you want to find a label, I think you need to use a more complex template built around the template 'with' operation; on the positive side, this may let you format a message fragment with multiple labels and even the value.
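A sketch of both workarounds, with hypothetical annotation names and an illustrative node_load1 query. As far as I know, an empty query result counts as empty for 'with', so the body is skipped and the 'else' branch is used instead:

```yaml
groups:
  - name: meta-alerts                # hypothetical group name
    rules:
      - alert: LargeScaleOutage      # hypothetical alert name
        expr: count(up{job="node"} == 0) > 10
        annotations:
          # Value case: 'or vector(0)' guarantees there is one sample,
          # so 'first' always has something to take.
          down_count: |-
            {{ query `count(up{job="node"} == 0) or vector(0)` | first | value }}
          # Label case: 'with' only runs its body on a non-empty result,
          # and the body can format a label and the value together.
          busiest_host: |-
            {{ with query `topk(1, node_load1{job="node"})` }}{{ . | first | label "instance" }} (load {{ . | first | value | humanize }}){{ else }}unknown{{ end }}
```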

More broadly, if you want to pass multiple pieces of information from a single query into Alertmanager (for example, the query value and some labels), you have a collection of less than ideal approaches. If you create multiple annotations, one for each piece of information, you give your Alertmanager templates the maximum freedom but you have to repeat the query and its handling several times. If you create a text fragment with all of the information that Alertmanager will merely insert somewhere, you basically split writing your alerts between Alertmanager and Prometheus alert rules. And if you encode multiple pieces of information into a single annotation with some scheme, you can use one query in Prometheus and not lock yourself into how the Alertmanager template will use the information, but your Alertmanager template will have to parse that information out again with Go template functions.
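To make the third approach concrete, here is a sketch under several assumptions: the annotation name, the receiver, the address, and the '|' separator are all invented, and the scheme only works if the label value never contains '|'. The Prometheus side packs two fields into one annotation, and the Alertmanager side pulls them apart again with the reReplaceAll template function:

```yaml
# Prometheus rule file fragment (hypothetical names throughout).
groups:
  - name: meta-alerts
    rules:
      - alert: LargeScaleOutage
        expr: count(up{job="node"} == 0) > 10
        annotations:
          # One query, two fields packed as "instance|value".
          busiest_host: |-
            {{ with query `topk(1, node_load1{job="node"})` }}{{ . | first | label "instance" }}|{{ . | first | value }}{{ end }}
---
# alertmanager.yml receiver fragment (SMTP settings omitted).
receivers:
  - name: ops-email
    email_configs:
      - to: ops@example.org
        text: |-
          Busiest host: {{ reReplaceAll `[|].*$` "" .CommonAnnotations.busiest_host }}
          Load: {{ reReplaceAll `^[^|]*[|]` "" .CommonAnnotations.busiest_host }}
```

The first reReplaceAll strips everything from the separator onward, leaving the instance. The second strips everything up to and including the separator, leaving the value. This is exactly the sort of smuggling through text that a structured channel would make unnecessary.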

What all of this is a symptom of is that there's no particularly good way to pass structured information between Prometheus and Alertmanager. Prometheus has structured information (in the form of query results) and your Alertmanager template would like to use it, but today you have to smuggle that through unstructured text. It would be nice if there was a better way.

(Prometheus doesn't quite pass through structured information from a single query, the alert rule query, but it does make all of the labels and annotations available to Alertmanager. You could imagine a version where this could be done recursively, so that some annotations could themselves have labels and so on.)

PrometheusMoreAlertInfoIrritation written at 22:58:46;

