What Prometheus Alertmanager's group_interval setting means
One of the configuration settings in Prometheus Alertmanager for
'routes' is the alert group interval, the 'group_interval' setting.
The Alertmanager configuration documentation describes the setting
this way:
How long to wait before sending a notification about new alerts that are added to a group of alerts for which an initial notification has already been sent.
As has come up before more than
once, this is not actually accurate. The group interval is not a
(minimum) delay; it is instead a timer that ticks every so often
(a ticker). If you have group_interval set to five minutes,
Alertmanager will potentially send another notification only at
every five minute interval after the first notification (what I'll
call a tick). If the initial notification happened at 12:10, the
first re-notification might happen at 12:15, and then at 12:20, and
then at 12:25, and so on.
(The timing of these ticks is based purely on when the first notification for an alert group is sent, so usually they will not be so neatly lined up with the clock.)
If a new alert (or a resolved alert) misses the group_interval
tick by even a second, a notification including it won't go out
until the next tick. If the initial alert group notification happened
at 12:10 and then nothing changed until a new alert was raised at
12:31, Alertmanager will not send another notification until the
group_interval tick at 12:35, even though it's been much more
than five minutes since the last notification.
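To make the arithmetic concrete, here is a small Python sketch of the ticker behavior as described above. It is a model of the description in this entry, not Alertmanager's actual code; the function name next_tick is made up for illustration, and treating a change that lands exactly on a tick as going out on that tick is my assumption.

```python
from datetime import datetime, timedelta


def next_tick(first_notification, change_time, group_interval):
    """Return the first group_interval tick at or after change_time.

    Hypothetical helper: ticks are counted from the first notification
    for the alert group, not from the most recent notification.
    """
    elapsed = change_time - first_notification
    # Ceiling division on timedeltas; at least one tick must pass.
    ticks = max(1, -(-elapsed // group_interval))
    return first_notification + ticks * group_interval


first = datetime(2024, 1, 1, 12, 10)
new_alert = datetime(2024, 1, 1, 12, 31)
print(next_tick(first, new_alert, timedelta(minutes=5)))
```

With the first notification at 12:10 and a new alert at 12:31, this gives 12:35, matching the example above.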
This gives you an unfortunate tradeoff between prompt notification
of additional alerts in an alert group (or of alerts being resolved)
and not receiving a horde of notifications. If you want to receive
a prompt notification, you need a short group_interval, but
then you can receive a stream of notifications as alert after alert
after alert pops up one by one. It would be nicer if Alertmanager
didn't have this group_interval
tick behavior but would instead
treat it as a minimum delay between successive notifications, but I
don't expect Alertmanager to change at this point.
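As a rough illustration of the difference, the following Python sketch compares when notifications would go out under the ticker behavior and under a hypothetical minimum-delay behavior. Both functions are simplified models with made-up names; the minimum-delay version is only my reading of what such a setting could do, and Alertmanager does not work that way.

```python
from datetime import datetime, timedelta


def ticker_notifications(first, changes, interval):
    """Model of the ticker: changes wait for the next tick after 'first'."""
    sent = []
    for change in sorted(changes):
        ticks = max(1, -(-(change - first) // interval))
        when = first + ticks * interval
        if not sent or sent[-1] != when:
            sent.append(when)
    return sent


def min_delay_notifications(first, changes, interval):
    """Hypothetical model: at least 'interval' between notifications."""
    sent = []
    last = first
    for change in sorted(changes):
        if sent and change <= sent[-1]:
            continue  # covered by a notification already scheduled
        when = max(change, last + interval)
        sent.append(when)
        last = when
    return sent


first = datetime(2024, 1, 1, 12, 10)
changes = [datetime(2024, 1, 1, 12, m) for m in (31, 32, 33)]
five = timedelta(minutes=5)
print(ticker_notifications(first, changes, five))
print(min_delay_notifications(first, changes, five))
```

In this example the ticker model sends a single notification at 12:35, while the minimum-delay model sends one immediately at 12:31 and then a second at 12:36 that covers the two later alerts.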
(I've written all of this down before in various entries, so this
is mostly to have a single entry I can link to in the future when
group_interval comes up.)