Alerts should be actionable (and the three sorts of 'alerts')

December 16, 2012

One of my pet peeves with alerting systems, which I've touched on before, is bad alerts, or more exactly a specific sort of bad alert. It's my very strong opinion that all of your alerts should be actionable.

In fact, let's split alerts up into three categories:

  • alerts that your sysadmins can and should take immediate action on; these are actionable alerts. There is something to do right away in response to them.

  • alerts where the sysadmins need to think about and plan out what they'll do in response to the issue. These are developing situations that need considered responses, not red alerts that need to be dealt with immediately. Steadily shrinking disk space is one classic example.

  • alerts that the sysadmins can't do anything about either immediately or in the future.

(I'm using a broad view of 'alert' here. Alerts may send email or page your phone, but they may also turn an indicator red on your dashboard. Broadly, an alert is anything that is hopping up and down going 'pay attention to me!')

Partly because people seem to like alerting on everything that moves, a lot of alerting systems seem to start with most of their alerts being the third sort. This is bad for various reasons, including that it trains people to ignore alerts because there is too much noise.

My strong view is that you should never create an alert without asking yourself what people are going to do about the alert. If you can't answer the question or the answer is 'well, nothing', what you have is probably the third sort of alert and you should not generate it at all.
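As a rough illustration of this rule, here is a minimal Python sketch. Everything in it is hypothetical (the `AlertDefinition` class, its `response` and `urgent` fields, and the `AlertKind` names are invented for this example and don't come from any real alerting system). It only shows the idea of refusing to generate an alert that has no answer to 'what will people do about this?'.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlertKind(Enum):
    IMMEDIATE = "immediate"   # first sort: act on it right away
    PLANNED = "planned"       # second sort: think and plan a response
    NO_ACTION = "no_action"   # third sort: nothing anyone can do


@dataclass
class AlertDefinition:
    # All fields are assumptions made for this sketch.
    name: str
    response: Optional[str] = None  # what people will do; None means "well, nothing"
    urgent: bool = False            # does the response have to happen right now?


def classify(alert: AlertDefinition) -> AlertKind:
    """Sort an alert into one of the three categories."""
    if alert.response is None or not alert.response.strip():
        return AlertKind.NO_ACTION
    return AlertKind.IMMEDIATE if alert.urgent else AlertKind.PLANNED


def should_generate(alert: AlertDefinition) -> bool:
    """Only generate alerts that someone can act on, now or later."""
    return classify(alert) is not AlertKind.NO_ACTION


alerts = [
    AlertDefinition("fileserver down", response="fail over to the spare", urgent=True),
    AlertDefinition("disk space shrinking", response="plan a storage expansion"),
    AlertDefinition("upstream network is slow"),  # nobody here can fix this
]
kept = [a.name for a in alerts if should_generate(a)]
```

Here `kept` ends up holding only the first two alerts. The check is only as good as the honesty of whoever fills in `response`; the point is that the question gets asked when the alert is created, not after it has been paging people for a month.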

(Sometimes there are cases where you know there is a bad problem and somebody should do something, but you don't know who or what. If you hit one of these while creating alerts, now is the time to figure out the answers. This may well require management decisions or approval.)

Okay, honesty compels me to add a fourth type of alert: alerts that you can't do anything about but that you're forced to generate for political reasons, often so that when the alert triggers you can say with a straight face that you knew about the situation and were doing your best to deal with it (when the best is often 'we can't really do anything at all'). I suspect that in some organizations a lot of the alerts are like this.

Written on 16 December 2012.


Last modified: Sun Dec 16 00:19:20 2012