Alerts should be actionable (and the three sorts of 'alerts')
One of my pet peeves with alerting systems, which I've touched on before, is bad alerts, or more exactly a specific sort of bad alert. It's my very strong opinion that all of your alerts should be actionable.
In fact, let's split alerts up into three categories:
- alerts that your sysadmins can and should take immediate action on;
these are actionable alerts.
There is something to do right away in response to them.
- alerts where the sysadmins need to think about and plan out what
they'll do in response to the issue. These are developing situations
that need considered responses, not red alerts that need to be
dealt with immediately. Steadily shrinking disk space is one example.
- alerts that the sysadmins can't do anything about either immediately or in the future.
(I'm using a broad view of 'alert' here. Alerts may send email or page your phone, but they may also turn an indicator red on your dashboard. Broadly, an alert is anything that is hopping up and down going 'pay attention to me!')
Partly because people seem to like alerting on everything that moves, a lot of alerting systems seem to start with most of their alerts being the third sort. This is bad for various reasons, including that it trains people to ignore alerts because there is too much noise.
My strong view is that you should never create an alert without asking yourself what people are going to do about the alert. If you can't answer the question or the answer is 'well, nothing', what you have is probably the third sort of alert and you should not generate it at all.
(Sometimes there are cases where you know there is a bad problem and somebody should do something but you don't know who and what. If you hit one of these while creating alerts, now is the time to figure out the answers. This may well require management decisions or approval.)
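To make the 'what are people going to do about this alert?' question concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `AlertRule` and `AlertKind` names, the fields, and the example rules are assumptions, not part of any real alerting system. The idea is simply that every proposed alert has to state its response up front, and anything that can't state one is classified as the third sort and dropped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlertKind(Enum):
    ACT_NOW = "act now"          # first sort: there is something to do right away
    PLAN = "plan a response"     # second sort: a developing situation
    NO_ACTION = "nothing to do"  # third sort: should not be generated at all


@dataclass
class AlertRule:
    # Hypothetical description of a proposed alert.
    name: str
    immediate_action: Optional[str] = None  # what sysadmins do right away, if anything
    planned_response: Optional[str] = None  # what they think about and plan out


def classify(rule: AlertRule) -> AlertKind:
    """Sort a proposed alert into one of the three categories."""
    if rule.immediate_action:
        return AlertKind.ACT_NOW
    if rule.planned_response:
        return AlertKind.PLAN
    return AlertKind.NO_ACTION


def should_generate(rule: AlertRule) -> bool:
    """An alert is only worth generating if someone can do something about it."""
    return classify(rule) is not AlertKind.NO_ACTION


# Illustrative rules; the names and responses are made up.
rules = [
    AlertRule("fileserver_down", immediate_action="fail over to the standby"),
    AlertRule("disk_space_shrinking", planned_response="plan for more disk space"),
    AlertRule("cpu_busy"),  # nobody can say what to do about this one
]

kept = [rule.name for rule in rules if should_generate(rule)]
```

With these example rules, `kept` holds only the first two names; the rule with no stated response never becomes an alert. A real system would also need a place for the 'somebody should do something, but we don't know who or what' case, which this sketch deliberately leaves out.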
Okay, honesty compels me to add a fourth type of alert: alerts that you can't do anything about but that you're forced to generate for political reasons, often so that when the alert triggers you can say with a straight face that you knew about the situation and were doing your best to deal with it (when the best is often 'we can't really do anything at all'). I suspect that in some organizations a lot of the alerts are like this.