What you're looking for with a Grafana dashboard affects its settings

August 20, 2020

Recently I wrote about how we chose our time intervals in dashboards, where the answer is that we mostly use $__interval because for our purposes this is the best option. But this raises the question of what our purpose is with our dashboards. Put another way, why do we not care about seeing brief spikes in our dashboards?

Broadly speaking, I think that dashboards can be there to look for signs of obvious issues, to look for signs of subtle issues, or to diagnose problems in detail (when you already know there's an issue and you're trying to understand what's going on). Pretty much all of our dashboards are for some combination of the first and the last, and we don't normally go looking for subtle issues.

(The flipside of looking for signs of obvious issues is reassuring you that there are no obvious issues right now. From a cynical perspective, this may be the purpose of a lot of overview dashboards.)

When you're looking for obvious issues, broad overviews are generally fine. If you have periodic very short usage spikes but nothing else is noticeable on a larger scale, you almost certainly don't have an obvious issue. Similarly, showing very short usage spikes on a broad overview graph isn't necessarily useful unless you believe that these spikes are the sign of a larger issue. As a result, you might as well use $__interval even though it makes short term spikes disappear when you're looking at longer time periods.
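To make the disappearing spikes concrete, here is a small sketch in Python rather than in any real query language. It assumes the dashboard query averages samples over each interval, which is common but not the only option, and the data is entirely made up: one hour of per-second usage readings with a single ten-second spike.

```python
def window_averages(samples, window):
    """Average consecutive, non-overlapping chunks of `window` samples."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical data: one hour of per-second usage at 5%,
# with a single 10-second spike to 100% in the middle.
samples = [5.0] * 3600
for i in range(1800, 1810):
    samples[i] = 100.0

# Stand-ins for a short interval (15 seconds) and a long one (10 minutes).
fine = window_averages(samples, 15)
coarse = window_averages(samples, 600)

print(f"peak at 15s interval:  {max(fine):.1f}%")
print(f"peak at 10m interval:  {max(coarse):.1f}%")
```

With the short interval the spike still stands out as a peak of roughly 68%. With the long interval it is averaged down to under 7%, which on an overview graph is hard to tell apart from the 5% baseline.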

When you're trying to diagnose problems in detail you already know something is going on and you're probably looking at fine time scales around specific times of interest. At fine time scales, a properly set up Grafana dashboard will show you all of the information available, including fine grained spikes, because it's using a very short $__interval since it covers only a small time range. This is certainly my experience with our dashboards, where I often wind up looking at only five or ten minute time windows in order to try to really understand what was going on at some point.
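As a rough illustration of why a small time range gives a short $__interval, the sketch below approximates the interval as the time range divided by the number of data points the panel can show, with a floor at a minimum interval. This is a simplification and not Grafana's actual code (Grafana also rounds the result to convenient values), and the 1000 data points and 15 second minimum are assumed numbers, not our real settings.

```python
def approx_interval_seconds(range_seconds, max_data_points=1000, min_interval=15):
    """Simplified guess at $__interval: range / data points, never below the minimum.

    The default values here are assumptions for illustration only.
    """
    return max(range_seconds / max_data_points, min_interval)


ten_minutes = 10 * 60
one_week = 7 * 24 * 60 * 60

# A ten minute window hits the minimum interval, so every sample is visible.
print(approx_interval_seconds(ten_minutes))

# A week-long window uses an interval of about ten minutes,
# which is long enough to swallow a brief spike.
print(approx_interval_seconds(one_week))
```

Under these assumptions the ten minute window is limited only by the minimum interval, while the week-long window ends up with an interval of a little over ten minutes.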

Looking for subtle issues is an interesting challenge in dashboard design. I suspect it's hard to do without knowing a fair bit about how your environment is supposed to behave (or at least believing that you do). At this point it's not something that I'm doing very much of in our dashboard design (although I've sort of done some of it).

(See also the problem of paying too much attention to our dashboards.)

