There are two different purposes of monitoring systems

June 11, 2009

It's worth saying this explicitly: monitoring systems have two different purposes, one of which looks like a subset of the other (but isn't necessarily).

The first purpose of monitoring, and what many people initially install a system for, is alerting, letting you know when there are problems. The second purpose of monitoring is tracking, gathering ongoing data for historical analysis; this is part of the vital work of getting statistics. Put this way, it's clear that these two overlap (sometimes badly); it is useful to track what you alert on (even if it is just whether or not there was an alert), and it is all too common to alert on everything that you track.

It's tempting to say that alerting is a subset of tracking, but I maintain that this is a mistake. Alerting with history needs fundamentally different features than tracking that also tells people when a tracked value is out of range; for example, if you take alerting seriously you want some way of sending each alert only once, instead of again on every check while the problem persists.
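To make the difference concrete, here is a minimal Python sketch rather than anything a real monitoring system does. The class names (`Tracker`, `OnceAlerter`), the 90.0 threshold, the sample values, and the extra "RECOVERED" notice are all assumptions made up for illustration. The tracker keeps every sample; the alerter has to remember state so that it speaks up only when things change.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Tracking: keep every sample for later analysis."""
    history: list = field(default_factory=list)

    def record(self, timestamp, value):
        self.history.append((timestamp, value))


@dataclass
class OnceAlerter:
    """Alerting: notify on state changes, not on every bad sample."""
    limit: float            # hypothetical threshold
    alerting: bool = False  # state the tracker never needs
    sent: list = field(default_factory=list)

    def check(self, timestamp, value):
        bad = value > self.limit
        if bad and not self.alerting:
            # First bad sample of an incident: send exactly one alert.
            self.sent.append((timestamp, "ALERT"))
        elif not bad and self.alerting:
            # Assumption: we also want one notice when the problem clears.
            self.sent.append((timestamp, "RECOVERED"))
        self.alerting = bad


def run_demo():
    tracker = Tracker()
    alerter = OnceAlerter(limit=90.0)
    # Made-up samples: three consecutive readings over the limit.
    samples = [(0, 50.0), (1, 95.0), (2, 97.0), (3, 96.0), (4, 60.0)]
    for ts, value in samples:
        tracker.record(ts, value)
        alerter.check(ts, value)
    return tracker.history, alerter.sent
```

With these samples the tracker ends up with five data points while the alerter sends two messages, one when the value first goes over the limit and one when it comes back. A naive "tracking plus out-of-range check" would have sent three alerts for the same incident.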

(And this is the tip of the iceberg. Alerting is difficult to do well. To be fair, so is tracking.)

It follows that when you decide to monitor something, you should decide why you're monitoring it; do you want to track it, to alert on it, or both? Not everything makes sense to alert on, and not everything makes sense to track in detail.
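One way to force that decision is to write it down next to each thing you monitor. The following is only a sketch under assumed names: the `Purpose` flag, the `CHECKS` table, and the three example checks are hypothetical and do not come from any particular monitoring system.

```python
from enum import Flag, auto


class Purpose(Flag):
    TRACK = auto()
    ALERT = auto()


# Hypothetical checks, each with an explicit reason for existing.
CHECKS = {
    "disk_space_used": Purpose.TRACK | Purpose.ALERT,  # both
    "nfs_ops_per_second": Purpose.TRACK,               # history only
    "mail_daemon_running": Purpose.ALERT,              # alert only
}


def checks_for(purpose):
    """Return the names of the checks declared for the given purpose."""
    return sorted(name for name, p in CHECKS.items() if purpose in p)
```

Here `checks_for(Purpose.ALERT)` picks out only the checks that should ever page anyone, so adding something to track does not silently add something to alert on.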

Written on 11 June 2009.

Last modified: Thu Jun 11 00:13:15 2009