Alerts should be actionable (and the three sorts of 'alerts')

December 16, 2012

One of my pet peeves with alerting systems, which I've touched on before, is bad alerts, or more exactly a specific sort of bad alert. It's my very strong opinion that all of your alerts should be actionable.

In fact, let's split alerts up into three categories:

  • alerts that your sysadmins can and should take immediate action on; these are actionable alerts. There is something to do right away in response to them.

  • alerts where the sysadmins need to think about and plan out what they'll do in response to the issue. These are developing situations that need considered responses, not red alerts that need to be dealt with immediately. Steadily shrinking disk space is one classic example.

  • alerts that the sysadmins can't do anything about either immediately or in the future.

(I'm using a broad view of 'alert' here. Alerts may send email or page your phone, but they may also turn an indicator red on your dashboard. Broadly, an alert is anything that is hopping up and down going 'pay attention to me!')

Partly because people seem to like alerting on everything that moves, a lot of alerting systems seem to start with most of their alerts being the third sort. This is bad for various reasons, including that it trains people to ignore alerts because there is too much noise.

My strong view is that you should never create an alert without asking yourself what people are going to do about the alert. If you can't answer the question or the answer is 'well, nothing', what you have is probably the third sort of alert and you should not generate it at all.

(Sometimes there are cases where you know there is a bad problem and somebody should do something but you don't know who and what. If you hit one of these while creating alerts, now is the time to figure out the answers. This may well require management decisions or approval.)
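One way to make the "what will people do about it?" question hard to skip is to require every alert definition to carry an answer. The following is a minimal Python sketch under my own assumptions; `AlertKind`, `AlertDefinition` and `check_alert` are hypothetical names, not part of any real alerting system. It rejects third-sort alerts outright and flags any other alert whose expected action is missing or blank.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AlertKind(Enum):
    # Hypothetical labels for the three sorts of alerts described above.
    IMMEDIATE = "immediate"  # first sort: act on it right away
    PLANNED = "planned"      # second sort: needs a considered, planned response
    NOTHING = "nothing"      # third sort: nobody can do anything about it


@dataclass
class AlertDefinition:
    name: str
    kind: AlertKind
    # What a sysadmin is expected to do when this fires.
    # None (or blank) means nobody has answered the question yet.
    action: Optional[str] = None


def check_alert(defn: AlertDefinition) -> List[str]:
    """Return a list of problems; an empty list means the alert is acceptable."""
    problems: List[str] = []
    if defn.kind is AlertKind.NOTHING:
        problems.append(f"{defn.name}: nobody can act on this; don't generate it")
    elif not (defn.action and defn.action.strip()):
        problems.append(f"{defn.name}: no answer to 'what will people do about it?'")
    return problems


if __name__ == "__main__":
    # Hypothetical example definitions.
    alerts = [
        AlertDefinition("disk-full-now", AlertKind.IMMEDIATE, "free space or move data"),
        AlertDefinition("disk-shrinking", AlertKind.PLANNED, "plan a storage expansion"),
        AlertDefinition("cpu-blip", AlertKind.NOTHING),
    ]
    for alert in alerts:
        for problem in check_alert(alert):
            print(problem)
```

Run as a script, this prints a single complaint about `cpu-blip`. A check like this doesn't make anyone think harder, of course; it only makes it awkward to add an alert without writing down an answer.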

Okay, honesty compels me to add a fourth type of alert: alerts that you can't do anything about but that you're forced to generate for political reasons, often so that when the alert triggers you can say with a straight face that you knew about the situation and were doing your best to deal with it (when the best is often 'we can't really do anything at all'). I suspect that in some organizations a lot of the alerts are like this.

Comments on this page:

From at 2012-12-16 17:11:04:

Darn valid point, your three types of alerts. If one has only the first two types, I call such a situation a "green screen", where every possible alert is either acknowledged (somebody is working on the problem) or is not there at all.

I used to think of this strategy as "monitor as little as you can", but it turned out that people mean two different things by "monitoring": detecting events (this generates alerts) and collecting data for trends (you've called them "metrics" in some other posts; quite a nice name). My statement "monitor as little as you can" is only valid for detecting events.

From at 2012-12-17 07:13:50:

Actually I have (a very few) examples of a variant of your fourth category, namely alerts you can't do anything about but that you have to be able to tell apart from actionable alerts.

Specifically, we used to have a team which ran its own servers. The team was dissolved a few months ago and we inherited the remaining servers. One of them runs a known-broken app we have to live with until we can get a contractor to fix it. We do get alerts[1] when this app blows up, just so we know the app is down but the server itself (the part we can do something about) still works.

-- Arnaud Gomes

[1] of the "red light in a dashboard" kind -- we have totally dropped mail/pager alerts as we too work in a "working hours only" situation where mail alerts are just noise.

Here is a reasonable-looking rebuttal: How to Square the Circle, Achieve Perpetual Motion, and Tune Your Alert Emails Just Right

tl;dr: “those categories are meaningless even though they sound reasonable because they are not actually distinguishable”; it then goes on to suggest that the way to make the effect of a choice distinguishable is to cast it in economic terms.

(One thing not explicit in the article that seems important to me is that the economic model must take into account not only the cost of outcomes of an individual alert, but also its cognitive costs – maybe most easily modelled as a non-linear opportunity cost on attention.)

Written on 16 December 2012.

Last modified: Sun Dec 16 00:19:20 2012