I wish Prometheus had a table-driven label remapping feature

August 19, 2022

We operate a variety of websites and services that are known to users under general names (such as the website 'support.cs') but which are implemented by specific, known machines, such as our primary web server. When we do Blackbox external checks on these services, we have to do it under their general name, and by default this generic name will flow through to our "host" label. In turn, this means that if something happens to a machine (such as its Apache no longer responding), we'll get a number of alerts about different nominal hosts.

The general Prometheus solution to this is relabeling. You can do this either as the metrics are ingested from Blackbox probes or as the alerts are sent to Alertmanager. However, right now doing this in bulk is awkward. If you have a bunch of services that are implemented by a bunch of different machines, what you wind up with is a bunch of relabeling rules that look like:

- source_labels: ["host"]
  regex: "(name1|othername|virtual2)"
  replacement: realhost1
  target_label: host

You have to have one of these for each real host with its list of services.

What I wish for from Prometheus is some way to do a table-based lookup for label remapping, where you could list a bunch of source matches and their target values. Then we could handle all of these service name to real host remappings in one place, with one rewrite rule and ideally one table in a file.

(In our case, canonicalization by reverse lookup of the target IP isn't sufficient in all cases, because some services are deliberately offered on IP aliases of the relevant hosts.)

Sadly, I suspect that this is a sufficiently obscure or unpopular usage that Prometheus isn't likely to support it. There's also no obvious syntax or small feature addition that could do it, especially if you want to use a file for the mapping table. YAML does have a syntax for maps (aka dictionaries), so you could at least write an inline regex_map YAML map that had a bunch of regexes as the keys and then replacements as the values, but that doesn't fit in nicely with how the replacement attribute is defined.
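
To make the inline map idea concrete, here is a sketch of roughly what it might look like. This is invented syntax: Prometheus has no regex_map attribute, the configuration below won't load, and the second host and its service names are made up for illustration.

```yaml
# HYPOTHETICAL syntax only; regex_map is not a real Prometheus relabeling attribute.
- source_labels: ["host"]
  target_label: host
  regex_map:
    # key: regex matched against the source label; value: the replacement
    "(name1|othername|virtual2)": realhost1
    "(service|aservice)": realhost2
```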

PS: If you have to do this manually, with a bunch of specific relabeling rules, I think it's more maintainable to do this in alert relabeling. Otherwise you may have to replicate all of your relabeling across different scrape jobs, for example if you use both Blackbox and a script exporter, as we do.
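
As a rough sketch of the alert relabeling version, the same sort of rules can go under alert_relabel_configs in the alerting section of prometheus.yml, where they apply to alerts regardless of which scrape job produced the underlying metrics. The second host, its service names, and the Alertmanager address are placeholders assumed for illustration.

```yaml
# prometheus.yml (fragment); host and service names are placeholders
alerting:
  alert_relabel_configs:
    # One rule per real host, listing the service names it implements.
    - source_labels: ["host"]
      regex: "(name1|othername|virtual2)"
      replacement: realhost1
      target_label: host
    - source_labels: ["host"]
      regex: "(service|aservice)"
      replacement: realhost2
      target_label: host
  alertmanagers:
    - static_configs:
        # Assumed Alertmanager address; substitute your own.
        - targets: ["localhost:9093"]
```

This only changes the labels on alerts as they are sent out; the stored metrics still carry the generic service names in their "host" label.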

Sidebar: The per-target label approach is worse

The alternate approach is to define labels with targets. Unfortunately the resulting YAML is relatively terrible, at least in my view. You can't just define some labels on a per-target basis; instead, you have to break things up into awkward blocks of

- labels:
    host: realhost1
  targets:
  - http://something1/
  - something2

- labels:
    host: realhost2
  targets:
  - https://service/
  - aservice:2048

Do you want all of your HTTP and HTTPS checks in one place so you can keep track of all of them? With applying labels to targets in the configuration, you're out of luck.

Life gets worse if you're using both Blackbox and a script exporter to probe hosts, because you get to repeat these mappings in two files. Or more, if you have other parameters you want to vary.


Last modified: Fri Aug 19 22:47:18 2022