On (not) logging calculated statistics

July 12, 2010

The more I look at the statistics logged by various systems and programs, the more I've come to a conclusion: logging calculated stats as well as the raw stats is almost always a waste of time. Log analysis programs can just as easily convert units, compute (nominal) averages per time interval, and so on. In the meantime, logging both just clutters up my logs, and it sometimes reduces clarity, often when it is not obvious that a stat is a calculated one instead of a real one.
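As a rough illustration of how little work this is for an analysis program, here is a minimal sketch in Python. It assumes the log records a raw cumulative counter with a timestamp; the sample data, the function name, and the choice of megabytes per second are all made up for the example, and it makes no attempt to handle counter resets.

```python
def rates_per_second(samples):
    """Turn (timestamp_seconds, cumulative_counter) pairs into
    (timestamp, average rate over the preceding interval) pairs."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            # Skip duplicate or out-of-order samples rather than divide by zero.
            continue
        rates.append((t1, (c1 - c0) / elapsed))
    return rates


# Hypothetical raw log: cumulative bytes written, sampled every 10 seconds.
raw = [(0, 0), (10, 5_000_000), (20, 5_000_000), (30, 25_000_000)]

rates = rates_per_second(raw)

# Changing units is a one-liner done at analysis time, not at logging time.
mb_per_sec = [(t, r / 1_000_000) for t, r in rates]
```

Because the raw counter is what got logged, the same data can later be re-averaged over a different interval or converted to different units without touching the program that produced the log.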

This is not a completely hard rule. Sometimes the calculated stat is both immediately useful enough for people glancing at the unprocessed logs and hard enough to work out by hand in one's head that calculating and logging it is warranted. But my gut feeling is that these cases are pretty rare.

(If your system logs only the calculated stats and doesn't record the raw information, especially if it aggregates the raw information together, you're probably annoying me. Hiding the raw data just makes it harder for me to diagnose problems that you didn't think of, where I really want the unprocessed information so that I can try to extract as much from it as possible.)

This doesn't apply to programs that just present data instead of logging it; for this sort of thing you want the information to be in as friendly a format as possible, so turning unwieldy raw stats into nice friendly calculated ones is a good thing. But watch out. Sometimes the only way of getting useful data to log is to capture the output of that 'friendly' presentation program, and then sysadmins are going to want the stats in as close to a raw format as possible.

(Note that one problem with friendly calculated stats is that the same formatting that makes them attractive to humans makes them harder for programs to parse. As a person, I like seeing things like '10 GB'; as a programmer who now has to parse that field back to some value so I can sort it or compare it with other fields, I like it a lot less.)
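To make the parsing cost concrete, here is a sketch of what getting a number back out of a field like '10 GB' can look like in Python. The unit table and the `parse_size` name are assumptions for the example; in particular it guesses that 'GB' means 2^30 bytes, which is exactly the kind of thing a friendly format leaves ambiguous.

```python
import re

# Assumed binary multiples; a real program may mean powers of 1000 instead.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

_SIZE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([KMGT]?B)\s*$", re.IGNORECASE)


def parse_size(text):
    """Convert a human-friendly size such as '10 GB' to a byte count."""
    m = _SIZE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit.upper()])


# Sorting the friendly strings directly would give the wrong order;
# they have to be parsed back into numbers first.
fields = ["10 GB", "900 MB", "2 TB"]
by_size = sorted(fields, key=parse_size)
```

Even when the parsing works, the result is only as precise as the rounded number that was printed; whatever the program threw away to produce '10 GB' is not recoverable.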

Written on 12 July 2010.
