Using Dovecot 2.3's 'events' system to create Prometheus metrics

December 3, 2022

Last time around I covered using Dovecot 2.3's events to generate log messages. This is actually the less interesting thing (to us) that you can do with them; the more interesting thing is that you can have Dovecot itself expose an OpenMetrics exporter for statistics, which Prometheus can scrape directly (the OpenMetrics metrics format is more or less the Prometheus one, and Prometheus can deal with it these days). However, actually generating useful metrics and understanding what you get is a little bit complicated.

(You'll need a service definition to expose your metrics for scraping, per the basic configuration, which you can just copy as is.)

In Prometheus terms, Dovecot statistics can give you counters or histograms (either exponential or linear). Histograms are the simpler thing so I'll cover them first. You create a histogram by defining a metric whose group_by includes a field with either an exponential or a linear set of histogram buckets. For example, a duration histogram:

metric imap_command_time {
  filter = event=imap_command_finished AND \
           tagged_reply_state=OK
  group_by = cmd_name user \
             duration:exponential:1:30:2
}

The remaining group_by fields become histogram labels; in other words, we're creating a group of histograms by IMAP command and user (which is potentially a lot of histograms). These histograms are of the command duration, and have thirty power-of-two bucket boundaries that run from 2 microseconds up to 2^30 microseconds, which is 1073.7 seconds (17.9 minutes) and should be enough of a range. That the duration is in microseconds is covered in Global Fields, but fortunately, for duration Dovecot will convert this to the standard Prometheus version of seconds for you. These histograms also have the standard Prometheus histogram metrics of *_sum and *_count, which is handy for reasons we'll come back to later.
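(To check the bucket arithmetic for myself, here is a small Python sketch. It assumes that 'exponential:min:max:base' puts a bucket boundary at base**k for every k from min to max, which is my reading of the Dovecot documentation and matches the 1073.7 second upper end; the function name is made up for illustration and is not anything Dovecot provides.)

```python
def exponential_bounds(min_exp, max_exp, base):
    """Finite bucket boundaries, ASSUMING Dovecot uses base**k for k = min..max."""
    return [base ** k for k in range(min_exp, max_exp + 1)]


# duration:exponential:1:30:2, where Dovecot's duration field is in microseconds.
bounds_us = exponential_bounds(1, 30, 2)

# The exporter reports durations in seconds, so convert the boundaries too.
bounds_s = [b / 1_000_000 for b in bounds_us]

print(len(bounds_us), "finite boundaries")
print("smallest:", bounds_us[0], "microseconds")
print("largest:", round(bounds_s[-1], 1), "seconds,",
      round(bounds_s[-1] / 60, 1), "minutes")
```

If that range is too generous for you, shrinking the middle number shrinks the number of bucket time series that every command and user combination generates.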

The other kind of metric Dovecot will create is counters, which Dovecot calls discrete statistics. However, these metrics have a major limitation, which is that they can only count how many times something happened (ie, Dovecot 'events'), not additional data associated with those Dovecot events. These Dovecot statistics create Prometheus metrics for the count itself and for the 'duration' associated with the event. You cannot use these 'discrete' statistics to count, for example, the number of bytes output for IMAP commands; you can only count how many IMAP commands there were, and along with that the sum of their durations. The group_by clause also behaves peculiarly (from a Prometheus perspective) for counter metrics. So let's start with a counter metric definition and then talk about what happens:

metric imap_command {
  filter = event=imap_command_finished
  group_by = cmd_name user tagged_reply_state
}

This creates two groups of Prometheus metrics, dovecot_imap_command_total (the count of them) and dovecot_imap_command_duration_seconds_total (the total duration in seconds). However, in each you get not just a single set of labels (the way you do with histograms), but a hierarchy, as shown in the example in the exporter documentation. Here, that would create a set of labels that look like this:

{cmd_name="LIST"}
{cmd_name="LIST", user="tstuser"}
{cmd_name="LIST", user="tstuser", tagged_reply_state="OK"} 

Prometheus can consume these metrics but the result may be confusing (and voluminous). You may also want to consider carefully the order of group_by, because it will influence which aggregate stats are readily at hand (here, count and duration by command) versus which aren't so easy (count and duration by user).
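(One way to picture the cascade is that every event is counted once for each leading prefix of the group_by list. The Python below is only my model of that behavior, based on the label sets shown above; it is not how Dovecot implements it, and label_hierarchy is a made-up helper.)

```python
def label_hierarchy(group_by, event):
    """One label set per leading prefix of group_by, shortest first."""
    return [
        {field: event[field] for field in group_by[:depth]}
        for depth in range(1, len(group_by) + 1)
    ]


# A single finished IMAP command, using the example values from above.
event = {"cmd_name": "LIST", "user": "tstuser", "tagged_reply_state": "OK"}

levels = label_hierarchy(["cmd_name", "user", "tagged_reply_state"], event)
for labels in levels:
    print(labels)

# Putting 'user' first makes the per-user totals the top level instead.
by_user_first = label_hierarchy(["user", "cmd_name", "tagged_reply_state"], event)
print(by_user_first[0])
```

With cmd_name first, the top level of the hierarchy is already the per-command total. To get a per-user total you would have to sum over commands at the second level in Prometheus, and be careful not to also sum in the other levels.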

Although the Dovecot Statistics documentation talks about using the 'fields' setting to specify "a list of fields that are included in the metrics", as of Dovecot 2.3.16 this doesn't actually do anything for Prometheus metrics (although it does for Dovecot's internal statistics that you can access through the 'doveadm' command). It would be nice if it did at some point in the future, because it would allow us to easily obtain Prometheus metrics of, say, the total bytes output by IMAP commands (broken down as above). Instead we have to reach for a hack to generate such a thing.

If you want a Prometheus counter metric of, for example, bytes output by IMAP command and user, then the solution to this limitation of Dovecot 'discrete' statistics is to use the world's smallest linear histogram:

metric imap_command_out_bytes {
  filter = event=imap_command_finished AND \
           tagged_reply_state=OK
  group_by = cmd_name user \
             bytes_out:linear:1:2:1
}

We don't actually care about the histogram itself (and could have Prometheus drop it from the scrape results); what we care about are the associated *_count and *_sum metrics, which will give us the running sum of bytes out and the count of how many commands we've had.
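(Here is a toy Python model of why this works. It assumes that 'linear:1:2:1' gives bucket boundaries at 1 and 2 plus the usual overflow bucket; the byte counts are made up, and the class is purely illustrative. The point is that the sum and count are tracked separately from the buckets, so useless buckets don't hurt them.)

```python
class TinyHistogram:
    """Toy histogram: per-bucket counts plus a running sum and count."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One slot per finite boundary, plus a final overflow (+Inf) slot.
        # These are per-bucket counts, not Prometheus's cumulative ones.
        self.buckets = [0] * (len(self.bounds) + 1)
        self.sum = 0
        self.count = 0

    def observe(self, value):
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1
                break
        else:
            self.buckets[-1] += 1
        # The sum and count are updated regardless of which bucket was hit.
        self.sum += value
        self.count += 1


hist = TinyHistogram([1, 2])        # assumed boundaries for bytes_out:linear:1:2:1
for bytes_out in (120, 4096, 37):   # made-up output sizes for three commands
    hist.observe(bytes_out)

print("buckets:", hist.buckets)     # everything lands in the overflow bucket
print("sum:", hist.sum, "count:", hist.count)
```

Basically every real command outputs more than two bytes, so the buckets tell you nothing, but the sum is exactly the total bytes out and the count is exactly the number of commands.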

You can similarly use the world's smallest histogram to eliminate the usual cascade from counter metrics. Simply make a histogram metric where the tiny histogram is of duration:

metric imap_command {
  filter = event=imap_command_finished
  group_by = cmd_name user tagged_reply_state \
             duration:linear:1:2:1
}

However, this will probably generate more metrics in total than you would get with a regular Dovecot discrete metric, although you can drop all the histogram buckets on ingestion to cut that down.

If you have Dovecot 'metrics' which exist only to log events, you can exclude them from Dovecot's exposed Prometheus metrics by giving them names that are invalid as OpenMetrics metrics, for example by putting one or more '/' in them. Dovecot will complain on startup, but so what.

Written on 03 December 2022.

Last modified: Sat Dec 3 23:24:34 2022