Three ways to expose script-created metrics in Prometheus

January 4, 2020

In our Prometheus environment, we've wound up wanting (and creating) a bunch of custom metrics that are most naturally created through a variety of scripts. Some of these are general things that are simply not implemented to our tastes in existing scripts and exporters, such as SMART disk metrics, and some of these are completely custom metrics for our environment, such as our per-user, per-filesystem disk space usage information or information from our machine room temperature sensors (which come from an assortment of vendors and have an assortment of ways of extracting information from them). When you're generating metrics in scripts, you need to figure out how to get these metrics from the script into Prometheus. I know of three different ways to do this, and we've used all three.

The first and most obvious way is to have the script publish the metrics to Pushgateway. This requires very little from the host that the script is running on; it has to be able to talk to your Pushgateway host and it needs an HTTP client like curl or wget. This makes Pushgateway publication the easiest approach when you're running as little as possible on the script host. It has various drawbacks that can be boiled down to 'you're using Pushgateway', such as having to manually check for metrics going stale because the script that generates them is now failing.
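As a sketch of what this looks like in practice (the metric, job name, and Pushgateway host here are all made up for illustration; Pushgateway accepts text-format metrics POSTed to a /metrics/job/<job>/instance/<instance> URL):

```shell
#!/bin/sh
# Sketch of pushing a script-generated metric to Pushgateway.
# The metric name, labels, and job are illustrative, not our real ones.
payload='# TYPE our_roomtemp_celsius gauge
our_roomtemp_celsius{room="machineroom"} 21.5'

# Only push if we've been told where the Pushgateway is,
# e.g. PUSHGATEWAY=http://pushgw.example.com:9091
if [ -n "${PUSHGATEWAY:-}" ]; then
    printf '%s\n' "$payload" |
        curl --silent --show-error --data-binary @- \
            "$PUSHGATEWAY/metrics/job/roomtemp/instance/sensor1"
fi
```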

On servers where you're running node_exporter, the Prometheus host agent, the simplest approach is usually to have scripts expose their metrics through the textfile collector, where they write a text file of metrics into a particular directory. We wrote a general wrapper script to support this, which handles locking, writing the script's output to a temporary file, and so on, so that our metrics generation scripts only have to write everything to standard output and exit with a success status.

(If a script fails, our wrapper script removes that particular metrics text file to make the metrics go stale. Now that I'm writing this entry, I've realized that we should also write a script status metric for the script's exit code, so we can track and alert on that.)
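A minimal sketch of such a wrapper might look like the following. The directory, function name, and metric names are illustrative, and the real wrapper also handles locking, which is omitted here; the exit-status metric is the one I just said we should add.

```shell
#!/bin/sh
# run_metrics_script NAME CMD [ARGS...]
# Run CMD, writing its stdout into the textfile collector directory as
# NAME.prom on success; on failure, remove the metrics file so the
# metrics go stale. Always write an exit-status metric.
# (A real wrapper would also take a lock to avoid concurrent runs.)
run_metrics_script() {
    dir="${TEXTFILE_DIR:-/var/lib/node_exporter/textfile}"
    name="$1"; shift
    out="$dir/$name.prom"
    tmp="$out.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$out"
        status=0
    else
        status=$?
        rm -f "$tmp" "$out"    # make the metrics go stale on failure
    fi
    # Exit-status metric so we can track and alert on failing scripts.
    printf 'script_exit_status{script="%s"} %d\n' "$name" "$status" > "$tmp"
    mv "$tmp" "$dir/$name.status.prom"
    return "$status"
}
```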

Both of these methods generally run the scripts through cron, which generally means that you generate metrics at most once a minute and they'll be generated at the start of any minute that the scripts run on. If you scrape your Pushgateway and your host agents frequently, Prometheus will see updated metrics pretty soon after they're generated (a typical host agent scrape interval is 15 seconds).
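For example, a once-a-minute crontab entry for such a script might look something like this (the wrapper and script names are made up):

```
# Generate disk usage metrics once a minute via the textfile wrapper.
* * * * *  root  /usr/local/sbin/prom-textfile-wrap diskusage /usr/local/sbin/gen-diskusage-metrics
```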

The final way we expose metrics from scripts is through the third party script_exporter daemon. To quote its GitHub summary, it's a 'Prometheus exporter to execute scripts and collect metrics from the output or the exit status'. Essentially it's like the Blackbox exporter, except that instead of a limited and hard-coded set of probes you have a whole collection of scripts generating whatever metrics you want to write and configure. The script exporter lets these scripts take parameters, for example to select what target to work on (how this works is up to each script to decide).

Unlike the other two methods, which are mostly configured on the machines running the scripts, generating metrics through the script exporter has to be set up in Prometheus by configuring a scrape configuration for it with appropriate targets defined (just like for Blackbox probes). This has various advantages that I'm going to leave for another entry.
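As a rough sketch, such a scrape configuration looks much like a Blackbox one. The exact endpoint, parameter name, and port depend on which script exporter you run, so everything here is illustrative rather than authoritative:

```yaml
scrape_configs:
  - job_name: 'script_diskcheck'
    metrics_path: /probe          # endpoint name varies by exporter
    params:
      script: ['diskcheck']       # assumed parameter selecting the script
    static_configs:
      - targets: ['host1.example.com', 'host2.example.com']
    relabel_configs:
      # Pass the nominal target to the script as a parameter, keep it as
      # the instance label, and actually scrape the script exporter host.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'scriptexporter.example.com:9469'
```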

Because you have to set up an additional daemon for the script exporter, I think it works best for scripts that you don't want to run on multiple hosts (they can target multiple hosts, though). In this it's much like the Blackbox exporter; you normally run one Blackbox exporter and use it to check on everything (or a few of them if you need to check from multiple vantage points or check things that are only reachable from some hosts). You certainly could run a script exporter on each machine and doing so has some advantages over the other two ways, but it's likely to be more work compared to using the textfile collector or publishing to Pushgateway.

(It also has a different set of security issues, since the script exporter has to be exposed to scraping from at least your Prometheus servers. The other two approaches don't take outside input in any way; the script exporter minimally allows the outside to trigger specific scripts.)

PS: All of these methods assume that your metrics are 'the state of things right now' style metrics, where it's harmless and desired to overwrite old data with new data. If you need to accumulate metrics over time that are generated by scripts, see using the statsd exporter to let scripts update metrics.

Comments on this page:

By gmuslera at 2020-01-05 15:01:09:

Have you thought of using InfluxData's Telegraf as a Prometheus exporter? Or just pushing the data when it's collected to any of the supported databases, which can be queried by Grafana?

It may be less expensive in resources than launching a cron job and could collect a lot of different kinds of local and remote metrics.

By cks at 2020-01-05 15:15:47:

I don't think very many (or any) of the metrics we're gathering through scripts could be gathered through Telegraf. A lot of them are quite custom local metrics that require things like parsing the output of status reporting programs for Exim, SLURM, and OpenBSD's various VPNs. I believe that all of them are pretty lightweight, and when they aren't the most expensive portion is the external programs they have to run to get the information.

(Some of the 'scripts' are compiled programs themselves, written in Go.)
