Why we scrape Prometheus Blackbox's metrics endpoint

The Prometheus Blackbox exporter is how you do many external checks on machines and services ('endpoints' in Blackbox's jargon), ranging from ping checks up through making HTTPS requests and checking the results. How you use the Blackbox exporter is somewhat confusing; unlike most exporters, you don't so much scrape it as scrape things through it, using probes against targets. As part of this, each combination of probe and target is a separate Prometheus scrape, and each scrape generates its own 'up' metric. Unlike the 'up' metrics from regular Prometheus exporters, these per-scrape 'up' metrics aren't all that useful as a check on the target, because all they tell you is that your Prometheus server could talk to that Blackbox exporter. The actual success or failure of your check is communicated through the 'probe_success' metric, which will be 1 if the probe succeeded and 0 if it failed for some reason.
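To make the 'scrape things through it' idea concrete, here is a minimal Python sketch of what one probe scrape amounts to: a request to Blackbox's /probe URL with a module and a target as query parameters, followed by reading 'probe_success' out of the reply. The address 'localhost:9115', the module name 'http_2xx', and the helper names are illustrative assumptions, not anything from our configuration.

```python
from urllib.parse import urlencode


def probe_url(blackbox_addr: str, module: str, target: str) -> str:
    """Build the URL that a probe scrape through Blackbox would request.

    blackbox_addr is 'host:port' of the exporter (an assumed value here).
    """
    query = urlencode({"module": module, "target": target})
    return f"http://{blackbox_addr}/probe?{query}"


def probe_succeeded(exposition_text: str) -> bool:
    """Return True only if the reply contains 'probe_success 1'.

    Getting any reply at all is what the scrape's 'up' metric reflects;
    the check's real result is the probe_success sample inside the reply.
    """
    for line in exposition_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    # No probe_success sample found; treat that as a failed check.
    return False
```

A scrape that returns 'probe_success 0' still counts as 'up' from Prometheus's point of view, which is why the two metrics have to be read separately.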

The Blackbox exporter also has its own /metrics endpoint that gives you metrics for Blackbox itself, which are a combination of general Go and Prometheus exporter metrics with some Blackbox-specific ones. One of the reasons to monitor this metrics endpoint is that it will tell you if Blackbox has been unable to successfully reload its configuration for a while; the equivalent check has saved us with the main Prometheus daemon. However, another reason that we monitor the Blackbox metrics endpoint is that scraping Blackbox's own metrics gives us a simple check of whether or not it's up, with its own 'up' metric that's convenient to alert on.
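As a rough illustration of the configuration reload check, the following Python sketch pulls the reload-related samples out of the text that Blackbox's /metrics endpoint returns. The two metric names are the ones I believe current Blackbox versions expose; treat them as an assumption and check the output of your own exporter. The function name is hypothetical.

```python
# Metric names assumed to be exposed by Blackbox's own /metrics endpoint.
RELOAD_METRICS = {
    "blackbox_exporter_config_last_reload_successful",
    "blackbox_exporter_config_last_reload_success_timestamp_seconds",
}


def reload_status(metrics_text: str) -> dict:
    """Extract the reload-related samples from Prometheus exposition text."""
    found = {}
    for line in metrics_text.splitlines():
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] in RELOAD_METRICS:
            found[parts[0]] = float(parts[1])
    return found
```

In practice you would alert on these in Prometheus itself rather than in a script; the sketch only shows what information the endpoint carries.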

Of course, you can use the 'up' metrics you get from scraping targets through Blackbox, but if you do you have some decisions to make. Do you pick a single probe and target combination that you expect to always be present in your configuration and alert if its 'up' is 0? Do you alert if a sufficient number or percentage of 'up' metrics for Blackbox probes go to zero? If you're using more than one Blackbox exporter for whatever reason, do you have labels set that will tell your alerting rule what Blackbox exporter was used for a particular scrape?
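To show what the percentage option involves, here is a small Python sketch that decides which Blackbox exporters look down from a set of per-probe 'up' values. It assumes each sample carries a label naming the Blackbox exporter it went through (the 'bb1' style names are made up), and the 90% threshold is an arbitrary illustrative choice.

```python
from collections import defaultdict


def down_blackboxes(samples, threshold=0.9):
    """Return the exporters whose probe scrapes are mostly failing.

    samples is an iterable of (exporter_name, up_value) pairs, where
    exporter_name is a hypothetical label identifying the Blackbox
    exporter and up_value is the 0 or 1 'up' metric of one probe scrape.
    """
    by_exporter = defaultdict(list)
    for exporter, up in samples:
        by_exporter[exporter].append(up)

    down = []
    for exporter, ups in by_exporter.items():
        failed = sum(1 for u in ups if u == 0)
        # Flag the exporter if at least `threshold` of its scrapes failed.
        if failed / len(ups) >= threshold:
            down.append(exporter)
    return sorted(down)
```

Even this simple version needs a threshold and a per-exporter label, which are exactly the decisions raised above.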

(It turns out that our Blackbox label rewriting doesn't pass through this information. It's not normally important, which is probably why the stock example doesn't preserve it, but it becomes potentially quite relevant if you're using the 'up' metrics from Blackbox checks as a health check on Blackbox itself.)

Adding a separate scrape of the Blackbox /metrics endpoint is the simple way out. It gives you a scrape that doesn't depend on what things you're checking through Blackbox, that scrape will definitely have labels that tell you which Blackbox you're talking to, and the extra Blackbox health metrics are potentially useful.

