Wandering Thoughts archives

2021-09-04

Adding a "host" label to all of your per-host Prometheus metrics

One of the things I've come to believe about labels on Prometheus metrics is that all metrics for a particular host should have a label for its hostname. I tend to call this label "host" in my entries, but when we built our setup I actually called it "cshost", with a prefix, to guard against the possibility that some metrics source would have its own "host" label.

The purpose of this label is to have something that can be used to straightforwardly group and join across different metrics sources for the host. It's often also convenient to reuse in alert messages. Prometheus will generally give each metrics source its own unique combination of "job" and "instance" labels, but the "instance" label often has inconvenient extras in it, like port numbers. Taking all of those extra things out and creating a single label that identifies each host makes it much easier to do various things across all metrics for a host, regardless of their source.
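As a rough sketch of the sort of join this enables, here is an alert rule that fires when a host's agent can't be scraped but the host still answers pings. The job names "node" and "ping" and the rule and group names are made up for illustration; this also assumes both jobs carry a "cshost" label with the same form of host name.

```yaml
groups:
  - name: host-examples        # hypothetical group name
    rules:
      - alert: HostAgentDownButPingable
        # Left side: the host agent scrape failed.
        # Right side: a Blackbox ping probe of the same host succeeded.
        # The two sources are matched up purely on the shared cshost label.
        expr: |
          up{job="node"} == 0
            and on (cshost)
          probe_success{job="ping"} == 1
        for: 5m
        annotations:
          summary: "host agent on {{ $labels.cshost }} is down but the host answers pings"
```

Without a common label you would have to pick apart "instance" values inside the expression, which gets ugly quickly.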

(As part of this, if you send host specific things to Pushgateway from a host, you should make sure it also adds the host label to what it sends in one way or another. What is host specific may depend on what use you want to make of the metrics you push.)
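One way to do this, as a sketch: the Pushgateway's push URL can carry extra grouping labels after the job name, so pushing to a path like /metrics/job/backups/cshost/apps1 attaches cshost="apps1" to everything in that group. The job name "backups", the host "apps1", and the Pushgateway address below are all invented for illustration. On the Prometheus side, the scrape job for the Pushgateway would then look something like this:

```yaml
scrape_configs:
  - job_name: pushgateway
    # Keep pushed labels such as "job" as they were pushed, rather than
    # having Prometheus rename them to exported_job and so on.
    honor_labels: true
    static_configs:
      # Hypothetical address; 9091 is the Pushgateway's default port.
      - targets: ['pushgateway.example.org:9091']
```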

If you automatically generate your list of targets, you can probably just specify the value for your "host" label alongside each generated target. Otherwise, you'll want to use relabeling to create these labels from information you already have. For example, here is our relabeling rule for our host agent job, which just takes off the port number on the address to create our "cshost" label:

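relabel_configs:
  # __address__ is "host:port"; capture everything before the
  # host agent's port and store it as the cshost label.
  - source_labels: [__address__]
    regex: (.*):9100
    replacement: $1
    target_label: cshost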

(We also use relabeling for more complicated things, although perhaps we should use another, more Prometheus-like approach.)
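For the other case, where the target list is generated, here is a minimal sketch of what the generated output might look like if you use file-based service discovery. The file name and host names are hypothetical, and the scrape job is assumed to read this file through file_sd_configs.

```yaml
# targets/hostagents.yaml (hypothetical generated file)
# Each target group carries its own cshost value, so no relabeling
# rule is needed to derive it from the address.
- targets: ['apps1.example.org:9100']
  labels:
    cshost: apps1
- targets: ['mail1.example.org:9100']
  labels:
    cshost: mail1
```

Since the generator writes the label directly, whatever form of host name you choose stays independent of how the target address itself is written.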

When you set this up, I have a small suggestion from our somewhat painful experience: don't mix fully qualified and unqualified host names in your "host" labels for the same machines. Our agent jobs (for node_exporter and some other per-host agents run on specific hosts) use unqualified host names, but all of our Blackbox checks use fully qualified host names; this difference is then passed through to our "cshost" label values. We fix this up in Alertmanager relabeling so that Alertmanager always sees an unqualified "cshost" (for our own hosts) and uses this in alert messages and grouping, but we should have gotten this right from the start in the metrics themselves.
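To illustrate the kind of after-the-fact fixup this takes, here is a sketch of alert relabeling in the Prometheus configuration that strips a domain from "cshost" before alerts go to Alertmanager. This is not our actual rule; the domain example.org is a stand-in, and the rest of the alerting section is omitted.

```yaml
alerting:
  alert_relabel_configs:
    # If cshost is a fully qualified name in our own (hypothetical)
    # domain, reduce it to the unqualified name. Values that do not
    # match the regex are left exactly as they are.
    - source_labels: [cshost]
      regex: '([^.]+)\.example\.org'
      replacement: $1
      target_label: cshost
```

This only changes what Alertmanager sees. The stored metrics still have the mixed names, so joins on "cshost" across the two kinds of jobs still don't line up.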

(The morally right choice is probably to use fully qualified host names everywhere, even if this makes life more annoying.)

sysadmin/PrometheusAddHostnameLabel written at 22:56:16

