What Prometheus Alertmanager's group_interval setting means

April 2, 2024

One of the settings for routes in a Prometheus Alertmanager configuration is the alert group interval, the 'group_interval' setting. The Alertmanager configuration documentation describes it this way:

How long to wait before sending a notification about new alerts that are added to a group of alerts for which an initial notification has already been sent.

As has come up more than once before, this isn't actually accurate. The group interval is not a (minimum) delay; instead it is a timer that fires at regular intervals (a ticker). If you have group_interval set to five minutes, Alertmanager will only potentially send another notification at each five-minute mark after the first notification (each of these marks is what I'll call a tick). If the initial notification happened at 12:10, the first re-notification might happen at 12:15, then at 12:20, then at 12:25, and so on.
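As a rough illustration, here is a minimal Python sketch of that tick schedule. It assumes a simplified model in which ticks fall at exact multiples of group_interval after the first notification; real ticks may drift slightly from these exact times. The function name group_interval_ticks is hypothetical and not part of Alertmanager.

```python
from datetime import datetime, timedelta

def group_interval_ticks(first_notification: datetime,
                         group_interval: timedelta,
                         count: int) -> list[datetime]:
    """Return the first `count` ticks after the initial notification.

    Simplified model (an assumption): tick k happens at
    first_notification + k * group_interval, for k = 1, 2, 3, ...
    """
    return [first_notification + k * group_interval for k in range(1, count + 1)]

first = datetime(2024, 4, 2, 12, 10)  # initial notification at 12:10
for tick in group_interval_ticks(first, timedelta(minutes=5), 3):
    print(tick.strftime("%H:%M"))  # 12:15, then 12:20, then 12:25
```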

(The timing of these ticks is based purely on when the first notification for an alert group is sent, so usually they will not be so neatly lined up with the clock.)

If a new alert (or a resolved alert) misses the group_interval tick by even a second, a notification including it won't go out until the next tick. If the initial alert group notification happened at 12:10 and then nothing changed until a new alert was raised at 12:31, Alertmanager will not send another notification until the group_interval tick at 12:35, even though it's been much more than five minutes since the last notification.
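The 'round up to the next tick' arithmetic can be sketched in Python under the same simplified model. The function name next_tick_after is hypothetical, and treating a change that lands exactly on a tick as making that tick is an assumption; this sketch doesn't try to model what happens at that exact boundary in real Alertmanager.

```python
from datetime import datetime, timedelta

def next_tick_after(first_notification: datetime,
                    group_interval: timedelta,
                    change_time: datetime) -> datetime:
    """Return the tick at which a change at `change_time` would first be reported.

    Simplified model (an assumption): ticks fall at
    first_notification + k * group_interval for k >= 1, and a change landing
    exactly on a tick is assumed to make that tick.
    """
    # divmod on two timedeltas gives (whole intervals elapsed, leftover time).
    whole, remainder = divmod(change_time - first_notification, group_interval)
    # Round up to the next whole interval; k is at least 1 because "tick 0"
    # is the initial notification itself.
    k = max(1, whole + (1 if remainder else 0))
    return first_notification + k * group_interval

first = datetime(2024, 4, 2, 12, 10)
interval = timedelta(minutes=5)
# A new alert at 12:31 waits for the 12:35 tick.
print(next_tick_after(first, interval, datetime(2024, 4, 2, 12, 31)).strftime("%H:%M"))
# Missing the 12:15 tick by one second means waiting until 12:20.
print(next_tick_after(first, interval, datetime(2024, 4, 2, 12, 15, 1)).strftime("%H:%M"))
```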

This gives you an unfortunate tradeoff between prompt notification of additional alerts in an alert group (or of alerts being resolved) and not receiving a horde of notifications. If you want prompt notifications, you need a short group_interval, but then you can receive a stream of notifications as alert after alert after alert pops up one by one. It would be nicer if Alertmanager treated group_interval as a minimum delay between successive notifications instead of as a ticker, but I don't expect Alertmanager to change at this point.
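To make the tradeoff concrete, here is a rough Python simulation comparing the ticker behavior with the minimum-delay behavior described as preferable above. Everything here is an illustrative assumption: the function names, the fixed example day, the burst of nine alerts one minute apart, and the batching rule for the minimum-delay model, which Alertmanager does not implement. It also ignores repeat_interval and other Alertmanager settings entirely.

```python
from datetime import datetime, timedelta

def at(hour: int, minute: int) -> datetime:
    # Hypothetical helper: a wall-clock time on an arbitrary fixed day.
    return datetime(2024, 4, 2, hour, minute)

def next_tick(first: datetime, interval: timedelta, change: datetime) -> datetime:
    # Ticker model (simplified): earliest tick at or after `change`.
    whole, remainder = divmod(change - first, interval)
    return first + max(1, whole + (1 if remainder else 0)) * interval

def ticker_notifications(first, interval, changes):
    # All changes falling into the same tick window are batched into one notification.
    return sorted({next_tick(first, interval, c) for c in changes})

def min_delay_notifications(first, interval, changes):
    # Hypothetical alternative, NOT Alertmanager's behavior: notify as soon as a
    # change happens, but never sooner than `interval` after the last notification.
    sent, last = [], first
    for c in sorted(changes):
        if sent and c <= sent[-1]:
            continue  # already covered by the notification scheduled at sent[-1]
        send_at = max(c, last + interval)
        sent.append(send_at)
        last = send_at
    return sent

first = at(12, 10)                           # initial notification
burst = [at(12, m) for m in range(11, 20)]   # nine alerts, one per minute
lone = [at(12, 31)]                          # one alert after a quiet period
five = timedelta(minutes=5)

for minutes in (1, 5):
    iv = timedelta(minutes=minutes)
    count = len(ticker_notifications(first, iv, burst))
    print(f"group_interval={minutes}m: burst -> {count} notifications")

print("ticker:", ticker_notifications(first, five, lone)[0].strftime("%H:%M"))
print("min-delay:", min_delay_notifications(first, five, lone)[0].strftime("%H:%M"))
```

In this model a one-minute interval turns the burst into nine notifications, while five minutes batches it into two (12:15 and 12:20). Both models batch the burst the same way at five minutes, but only the ticker holds the lone 12:31 alert until 12:35; the minimum-delay variant sends it at 12:31.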

(I've written all of this down before in various entries, so this is mostly to have a single entry I can link to in the future when group_interval comes up.)

Written on 02 April 2024.

Last modified: Tue Apr 2 20:43:46 2024