Wandering Thoughts archives


Using Prometheus's statsd exporter to let scripts make metrics updates

One of the things that's very useful about Prometheus is that it's pretty easy to write little ad-hoc scripts or programs that generate metrics and then publish them. You can either use the host agent's 'textfile' collector, which is a good fit for host-specific metrics on a host where you're already running the host agent, or have your program publish them through Pushgateway, including by just having your script pipe its output to an appropriate curl command. However, there is one situation that this doesn't cover, which is when your scripts want to update a metric instead of generating it from scratch. For example, you might have a cron job that runs periodically to process some variable number of things, and you want a running count of how many things you have processed (instead of a gauge of how many were processed by the last job). To use jargon, your script or program has stateless observations (eg 'I processed three things this time') and you want to convert them into ongoing metrics, which are necessarily stateful.

In short, you want to be able to update metrics, not just create or re-create them from scratch. Ideally you want these updates to be more or less atomic, so that you don't have to worry about 'read modify write' races if you have several instances of your script or program running at once, all trying to make an update.

The good news is that the Prometheus statsd exporter can do this for you, and it is actually very convenient to use. The statsd protocol itself is focused around exactly this sort of incremental update to metrics, and the statsd exporter will turn those updates into Prometheus metrics for us. The protocol is text-based, so we don't need any special client (especially since the Prometheus exporter speaks a TCP-based version). For extra convenience, the Prometheus statsd exporter supports an extended statsd format with tags that will let us directly attach labels and label values, rather than having to configure the statsd exporter to turn some portions of the statsd metric names into Prometheus label values.

The basic use is pretty straightforward. With the statsd exporter running on localhost, you can do:

echo 'our.counter:3|c|#lbl1:val1,lbl2:val2' | nc localhost 9125
echo 'our.counter:2|c|#lbl1:val1,lbl2:val2' | nc localhost 9125

This will create or update a Prometheus counter metric, with the '.'s in the name turned into '_'s and the labels we asked for:

our_counter {lbl1="val1", lbl2="val2"} 5

(You can also use '+3' instead of plain '3' to make things more obvious.)
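
As a sketch of how a cron job might wrap this, the shell functions below build the statsd line and send it the same way as the nc commands above. The function names and the STATSD_HOST and STATSD_PORT variables are made up for illustration; the only things taken from the examples above are the line format and port 9125.

```shell
#!/bin/sh
# Hypothetical helpers; the names and environment variables are illustrative.

# Build a statsd line: NAME:VALUE|TYPE, plus '|#tags' if tags are given.
# $1 = metric name, $2 = value, $3 = statsd type (c, g, ms, ...),
# $4 = optional comma-separated tags such as 'lbl1:val1,lbl2:val2'
statsd_line() {
    if [ -n "${4:-}" ]; then
        printf '%s:%s|%s|#%s\n' "$1" "$2" "$3" "$4"
    else
        printf '%s:%s|%s\n' "$1" "$2" "$3"
    fi
}

# Send one update over TCP, as in the nc examples above.
statsd_send() {
    statsd_line "$@" | nc "${STATSD_HOST:-localhost}" "${STATSD_PORT:-9125}"
}

# Example use at the end of a cron job that processed $count things:
#   statsd_send our.counter "$count" c lbl1:val1,lbl2:val2
```

Keeping the formatting separate from the sending makes it easy to print the line and check it by eye before pointing the script at a real exporter.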

For gauges, there is a magic trick, which is that '+<N>' increases the gauge and '-<N>' decreases it, while a plain number just sets the value:

echo 'our.gauge:3|g|#label:val' | nc localhost 9125
echo 'our.gauge:-5|g|#label:val' | nc localhost 9125
echo 'our.gauge:+4|g|#label:val' | nc localhost 9125

The result is a gauge metric:

our_gauge {label="val"} 2
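
To make the arithmetic explicit, here is a small shell model of the gauge rules described above. It is only an illustration of the semantics (integers only), not code from the exporter, and the function name is made up. One consequence of the sign rule is that a single update can't set a gauge directly to a negative value, because the leading '-' is read as a decrease; the usual statsd workaround is to set the gauge to 0 and then decrease it.

```shell
#!/bin/sh
# Illustrative model of statsd gauge updates (not exporter code).
# $1 = current gauge value, $2 = the value field from a statsd 'g' line.
gauge_apply() {
    case "$2" in
        +*) printf '%s\n' "$(( $1 + ${2#+} ))" ;;   # '+N' increases the gauge
        -*) printf '%s\n' "$(( $1 - ${2#-} ))" ;;   # '-N' decreases the gauge
        *)  printf '%s\n' "$2" ;;                   # plain N sets the gauge
    esac
}

# Replay the three updates from the example: set 3, minus 5, plus 4.
g=0
for update in 3 -5 +4; do
    g=$(gauge_apply "$g" "$update")
done
echo "$g"    # prints 2, matching the gauge value in the example
```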

As you can see, gauges can go negative. As is the Prometheus practice, counters can never decrease; the statsd exporter will reject attempts to do so (ie, statsd updates with negative values). These rejections are normally silent, but you can get the exporter to report them at log level 'debug'.

(Since there's no easy way to change the type of a metric after it's created, you want to be a bit careful about what you make something. If you send in a statsd metric with the wrong type, it's rejected.)
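
Because rejected counter updates are silent by default, a script may want to check its value before sending it. The following is a hypothetical client-side guard; it is deliberately stricter than whatever the exporter itself accepts, and it only allows a plain non-negative decimal number with an optional leading '+'.

```shell
#!/bin/sh
# Hypothetical client-side guard; deliberately conservative.
# Succeeds only for a non-negative decimal number, optionally with '+'.
counter_value_ok() {
    case "${1#+}" in
        ''|.|*[!0-9.]*|*.*.*) return 1 ;;   # empty, bare dot, stray chars, two dots
        *) return 0 ;;
    esac
}

if counter_value_ok "-3"; then
    echo "would send"
else
    echo "refusing to send negative or malformed counter value" >&2
fi
```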

The Prometheus statsd exporter can also generate quantiles and histograms from raw observations, which statsd generally calls 'timers'. Due to the statsd protocol, your numbers are assumed to be in milliseconds and the exporter divides the value by a thousand to create seconds-based metrics, as is the usual Prometheus custom. You'll have to scale your numbers appropriately if you don't actually have milliseconds. As covered in the exporter's documentation on statsd timers, the default result without any configuration is a summary with 0.5, 0.9, and 0.99 quantiles; currently these have acceptable error settings of 0.05, 0.01, and 0.001 respectively, although that's not documented and might change. Anything else requires some degree of configuration of the statsd exporter.

(As far as I can tell, you don't need any statsd exporter configuration here unless you either want some histograms or you want to change the quantiles. The statsd exporter supports a TTL for metrics, where they go away if they haven't been updated in long enough, but the not entirely documented default is that there is no TTL and all metrics live forever, as with Pushgateway. See the section on configuring global defaults.)

An example of sending timer observations is:

echo 'our.summary:500|h|#label:val' | nc localhost 9125
echo 'our.summary:200|h|#label:val' | nc localhost 9125
echo 'our.summary:50|h|#label:val' | nc localhost 9125

With the default configuration, this results in the following Prometheus metrics:

our_summary {label="val", quantile="0.5"} 0.2
our_summary {label="val", quantile="0.9"} 0.5
our_summary {label="val", quantile="0.99"} 0.5
our_summary_sum {label="val"} 0.75
our_summary_count {label="val"} 3

According to the documentation, you can use any of the 'ms', 'h', and 'd' statsd metric types for this. If I was doing this with times, I would probably use 'ms' to try to remind myself that the raw numbers I output had to be in milliseconds instead of seconds. Otherwise I would probably use 'h' and put a comment in the script about why I was multiplying everything by 1000.
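
If a script measures durations in seconds, the scaling can be kept in one place. The sketch below converts seconds to the milliseconds that statsd timers expect and builds an 'ms' line. The function names are hypothetical, and it leans on awk for the floating point arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers for emitting timer observations measured in seconds.

# Convert seconds (possibly fractional) to whole milliseconds.
seconds_to_ms() {
    awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

# Build a statsd timer line. The exporter divides the value by 1000
# again, so the resulting Prometheus metric is back in seconds.
# $1 = metric name, $2 = duration in seconds, $3 = tags
timer_line() {
    printf '%s:%s|ms|#%s\n' "$1" "$(seconds_to_ms "$2")" "$3"
}

timer_line our.summary 0.5 label:val    # prints our.summary:500|ms|#label:val
# To actually send it, pipe the line to nc as in the examples above:
#   timer_line our.summary 0.5 label:val | nc localhost 9125
```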

So far I'm assuming that we'll use the statsd exporter purely as a way of updating Prometheus metrics. If I wanted to both import genuine statsd metrics into Prometheus and use the statsd exporter as a way for scripts to update Prometheus metrics, I think I'd run two instances and configure them independently. You could probably mix the two uses in one instance, but keeping them separate just seems simpler and more straightforward.

(This is where I admit that I haven't actually used the statsd exporter for real yet, since I just discovered this today. But I think we have some things that would benefit from this, and so I'm tempted to start running the statsd exporter even with no metrics so that it's easy to add metrics updates to random scripts and programs as I touch them.)

sysadmin/PrometheusStatsdForMetricsUpdates written at 22:42:37
