Adding a "host" label to all of your per-host Prometheus metrics
One of the things I've come to believe about labels on Prometheus
metrics is that all metrics for a particular host should have a
label for its hostname. I tend to call this label "host" in my entries, but when we built
our setup I actually called it
"cshost", with a prefix, to guard against the possibility that
some metrics source would have its own "host" label.
The purpose of this label is to have something that can be used to
straightforwardly group and join across different metrics sources
for the host. Often this will make it convenient to reuse it in
alert messages. Prometheus will generally give each metrics source
its own unique combination of "job" and "instance" labels, but the
"instance" label often has inconvenient extras in it, like
port numbers. Taking all of those extra things out and creating a
unique label for each host makes it much easier to do various things
across all metrics for a host, regardless of their source.
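As a rough sketch of what this makes possible, here is an alert rule that joins node_exporter filesystem metrics with Blackbox probe results on the shared label. The metric names are real ones from those two exporters, but the group and alert names, the threshold, the "ping" job name, and the assumption that both sources carry a "cshost" label are all made up for illustration.

```yaml
groups:
  - name: cshost-join-example              # hypothetical group name
    rules:
      - alert: RootFilesystemLowOnLiveHost   # hypothetical alert name
        # Low space in /, but only for hosts that also answer a Blackbox
        # probe. The two sources have different "job" and "instance"
        # labels, so they are matched on "cshost" alone.
        expr: |
          (node_filesystem_avail_bytes{mountpoint="/"} < 1e9)
            and on (cshost)
          (probe_success{job="ping"} == 1)
        for: 10m
        annotations:
          summary: "{{ $labels.cshost }} is low on space in /"
```

Without a common host label, the same match would need label_replace() on "instance" to strip ports before the two sides could be compared.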
(As part of this, if you send host specific things to Pushgateway from a host, you should make sure it also adds the host label to what it sends in one way or another. What is host specific may depend on what use you want to make of the metrics you push.)
If you automatically generate your list of targets, you can probably
just specify the value for your "host" label alongside each generated
target. Otherwise, you'll want to use relabeling
to create these labels from information you already have. For
example, here is our relabeling rule for our host agent job,
which just takes off the port number on the address to create
the "cshost" label:

    relabel_configs:
      # Capture everything before the :9100 port in the scrape address.
      - source_labels: [__address__]
        regex: (.*):9100
        replacement: $1
        target_label: cshost

(We also use relabeling for more complicated things, although perhaps we should use another, more Prometheus-like approach.)
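For the automatically generated case, the label can ride along with each target in a file-based service discovery file. This is only a sketch: the file name and host names are invented, and it assumes your generator writes a file that a file_sd_configs entry in the scrape job points at.

```yaml
# targets/hostagents.yml (hypothetical file, read through file_sd_configs)
- targets: ['apps1:9100']
  labels:
    cshost: apps1      # host label set directly, no relabeling needed
- targets: ['fs2:9100']
  labels:
    cshost: fs2
```

Labels given this way are attached to the target before relabeling runs, so every metric scraped from it gets the "cshost" value.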
When you set this up, I have a small suggestion from our somewhat
painful experience: don't mix fully qualified and unqualified
host names in your "host" labels for the same machines. Our agent
jobs (for node_exporter
and some other per-host agents that run on specific hosts) use unqualified
host names, but all of our Blackbox checks use fully
qualified host names; this difference is then passed through to our
"cshost" label values. We fix this up in Alertmanager relabeling so that Alertmanager always sees an
unqualified "cshost" (for our own hosts) and uses this in alert
messages and grouping, but we should have gotten this right from the start
in the metrics themselves.
(The morally right choice is probably to use fully qualified host names everywhere, even if this makes life more annoying.)
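One way to do the kind of fix-up described above is with alert_relabel_configs in the Prometheus configuration, which rewrites alert labels before alerts are sent to Alertmanager. This is a sketch and not our actual rule; the example.org domain stands in for whatever your own domain is.

```yaml
alerting:
  alert_relabel_configs:
    # Reduce "host.example.org" to "host". The regex must match the whole
    # label value, so hosts outside this (hypothetical) domain and names
    # that are already unqualified are left unchanged.
    - source_labels: [cshost]
      regex: '([^.]+)\.example\.org'
      replacement: '$1'
      target_label: cshost
```

This only changes what Alertmanager sees. The stored metrics keep their mixed host names, which is why getting the label right at scrape time is the better fix.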