My life has been improved by my quiet Prometheus alert status monitor

November 28, 2024

I recently created a setup to provide a backup for our email-based Prometheus alerts. The basic result is that if our current Prometheus alerts change, a window with a brief summary of the current alerts appears out of the way on my (X) desktop. Our alerts are delivered through email, and when I set up this system I imagined it as a backup, in case email delivery had problems that stopped me from seeing alerts. I didn't entirely realize that in the process, I'd created a simple, terse alert status monitor and summary display.

(This wasn't entirely a given. I could have done something more clever when the status of alerts changed, like only displaying new alerts or alerts that had been resolved. Redisplaying everything was just the easiest approach that minimized maintaining and checking state.)
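
To make the "redisplay everything on change" approach concrete, here is a minimal sketch in Python. It is not my actual code. It assumes a Prometheus server at a made-up local URL, uses Prometheus's HTTP API endpoint for current alerts (/api/v1/alerts), and stands in a plain print() for popping up a window. The function names are invented for illustration. The only state kept between polls is a fingerprint of the last set of alerts.

```python
import json
import time
import urllib.request

# Assumed URL for illustration; substitute your own Prometheus server.
PROM_ALERTS_URL = "http://localhost:9090/api/v1/alerts"


def fetch_alerts(url=PROM_ALERTS_URL):
    """Return the list of current (pending and firing) alerts."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return body["data"]["alerts"]


def fingerprint(alerts):
    """Reduce the alerts to one comparable value: a set of (labels, state)."""
    return frozenset(
        (tuple(sorted(a["labels"].items())), a["state"]) for a in alerts
    )


def render(alerts):
    """Produce the full terse summary, one line per alert."""
    lines = []
    for a in alerts:
        labels = a["labels"]
        name = labels.get("alertname", "?")
        inst = labels.get("instance", "?")
        lines.append(f"{a['state']} {name} {inst}")
    return "\n".join(sorted(lines)) or "no current alerts"


def watch(show=print, interval=60):
    """Poll forever, redisplaying everything whenever anything changed."""
    last = None
    while True:
        alerts = fetch_alerts()
        current = fingerprint(alerts)
        if current != last:
            # No diffing of new versus resolved alerts; just show it all.
            show(render(alerts))
            last = current
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```

Showing only new or resolved alerts would mean keeping the previous alert set around and computing differences against it. Here the previous fingerprint is only ever compared for equality, which is about as little state as you can get away with.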

After using my new setup for several days, I've ended up feeling that I'm more aware of our general status on an ongoing and global basis than I was before. Being more on top of things this way is a reassuring feeling in general. I know I'm not going to accidentally miss something or overlook something that's still ongoing, and I get early warning of situations before they trigger emails. To put it in trendy jargon, I feel like I have more situational awareness. At the same time this is a passive and unintrusive thing that I don't have to pay attention to if I'm busy (or pay much attention to in general, because it's easy to scan).

Part of this comes from how my new setup doesn't require me to do anything or remember to check anything, but does just enough to catch my eye if the alert situation is changing. Part of this comes from how it puts information about all current alerts into one spot, in a terse form that's easy to scan in the usual case. We have Grafana dashboards that present the same information (and a lot more), but there it's more spread out (partly because my code is able to do some relatively complex transformations and summarizations that condense things).
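
As an illustration of the sort of summarization that makes a terse display possible, here is a small Python sketch. It is a guess at the general idea, not the transformations I actually do. It assumes alerts shaped like the entries from Prometheus's /api/v1/alerts endpoint (each with a "labels" dictionary), assumes hosts are identified by the conventional "instance" label, and uses a made-up summarize() function with a made-up max_hosts cutoff.

```python
from collections import defaultdict


def summarize(alerts, max_hosts=3):
    """Collapse alerts into one line per alert name.

    Hosts are listed by name when there are only a few of them and
    replaced with a count when there are many.
    """
    by_name = defaultdict(set)
    for a in alerts:
        labels = a["labels"]
        by_name[labels.get("alertname", "?")].add(labels.get("instance", "?"))

    lines = []
    for name in sorted(by_name):
        hosts = sorted(by_name[name])
        if len(hosts) > max_hosts:
            lines.append(f"{name}: {len(hosts)} hosts")
        else:
            lines.append(f"{name}: {' '.join(hosts)}")
    return lines


if __name__ == "__main__":
    sample = [
        {"labels": {"alertname": "DiskFull", "instance": "h1"}},
        {"labels": {"alertname": "DiskFull", "instance": "h2"}},
    ]
    print("\n".join(summarize(sample)))
```

The point of collapsing by alert name is that a widespread problem takes up one line instead of dozens, so the usual case stays short enough to take in at a glance.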

My primary source for real alerts is still our email messages about alerts, which have gone through additional Alertmanager processing and which carry much more information than is in my terse monitor (in several ways, including explicitly noting resolved alerts). But our email is in a sense optimized for notification, not for giving me a clear picture of the current status, especially since we normally group alert notifications on a per-host basis.

(This is part of what makes having this status monitor nice; it's an alternate view of alerts from the email message view.)

Last modified: Thu Nov 28 23:48:48 2024