Easy configuration for lots of Prometheus Blackbox checks

November 11, 2018

Suppose, not entirely hypothetically, that you want to do a lot of Prometheus Blackbox checks, and worse, these are all sorts of different checks (not just the same check against a lot of different hosts). Since the only way to specify a lot of Blackbox check parameters is with different Blackbox modules, this means that you need a bunch of different Blackbox modules. The examples of configuring Prometheus Blackbox probes that you'll find online all set the Blackbox module as part of the scrape configuration; for example, straight from the Blackbox README, we have this in their example:

- job_name: 'blackbox'
  metrics_path: /probe
    module: [http_2xx]

You can do this for each of the separate modules you need to use, but that means many separate scrape configurations and for each separate scrape configuration you're going to need those standard seven lines of relabeling configuration. This is annoying and verbose, and it doesn't take too many of these before your Prometheus configuration file is so overgrown with many Blackbox scrapes that it's hard to see anything else.

(It would be great if Prometheus could somehow macro-ize these or include them from a separate file or otherwise avoid repeating everything for each scrape configuration, but so far, no such luck. You can't even move some of your scrape configurations into a separate included file; they all have to go in the main prometheus.yml.)

Fortunately, with some cleverness in our relabeling configuration we can actually embed the name of the module we want to use into our Blackbox target specification, letting us use one Blackbox scrape configuration for a whole bunch of different modules. The trick is that what's necessary for Blackbox checks is that by the end of setting up a particular scrape, the module parameter is in the __param_module label. Normally it winds up there because we set it in the param section of the scrape configuration, but we can also explicitly put it there through relabeling (just as we set __address__ by hand through relabeling).

So, let's start with nominal declared targets that look like this:

- ssh_banner,somehost:25
- http_2xx,http://somewhere/url

This encodes the Blackbox module before the comma and the actual Blackbox target after it (you can use any suitable separator; I picked comma for how it looks).

Our first job with relabeling is to split this apart into the module and target URL parameters, which are the magic __param_module and __param_target labels:

  - source_labels: [__address__]
    regex: ([^,]*),(.*)
    replacement: $1
    target_label: __param_module
  - source_labels: [__address__]
    regex: ([^,]*),(.*)
    replacement: $2
    target_label: __param_target

(It's a pity that there's no way to do multiple targets and replacements in one rule, or we could make this much more compact. But I'm probably far from the first person to observe that Prometheus relabeling configurations are very verbose. Presumably Prometheus people don't expect you to be doing very much of it.)

Since we're doing all of our Blackbox checks through a single scrape configuration, we won't normally be able to easily tell which module (and thus which check) failed. To make life easier, we explicitly save the Blackbox module as a new label, which I've called probe:

  - source_labels: [__param_module]
    target_label: probe

Now the rest of our relabeling is essentially standard; we save the Blackbox target as the instance label and set the actual address of our Blackbox exporter:

  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__

All of this works fine, but there turns out to be one drawback of putting all or a lot of your blackbox checks in a single scrape configuration, which is that you can't set the Blackbox check interval on a per-target or per-module basis. If you need or want to vary the check interval for different checks (ie, different Blackbox modules) or even different targets, you'll need to use separate scrape configurations, even with all of the extra verbosity that that requires.

(As you might suspect, I've decided that I'm mostly fine with a lot of our Blackbox checks having the same frequency. I did pull ICMP ping checks out into a separate scrape configuration so that we can do them a lot more frequently.)

PS: If you wanted to, you could go further than this in relabeling; for instance, you could automatically add the :25 port specification on the end of hostnames for SSH banner checks. But it's my view that there's a relatively low limit on how much of this sort of rewriting one should do. Rewriting to avoid having a massive prometheus.yml is within my comfort limit here; rewriting just avoid putting a ':25' on hostnames is not. There is real merit to being straightforward and sticking as close to normal Prometheus practice as possible, without extra magic.

(I think that the 'module,real-target' format of target names I've adopted here is relatively easy to see and understand even if you don't know how it works, but I'm biased and may be wrong.)

Written on 11 November 2018.
« The needs of Version Control Systems conflict with capturing all metadata
What Python 3 versions I can use (November 2018 edition) »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun Nov 11 22:35:04 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.