Why selecting times is still useful even for dashboards that are about right now

April 7, 2019

In the aftermath of our power outage, one of the things that I did was put together a Grafana dashboard focused on dealing with large scale issues, specifically a lot of machines being down or having problems. In this sort of situation, we don't need to see elaborate status displays and state information; basically we want a list of down machines and a list of other alerts, and very little else to get in the way.

(We have an existing overview dashboard, but it's designed with the tacit assumption that only a few or no machines are down and we want to see a lot of other state information. This is true in our normal situation, but not if we're going through a power shutdown or other large scale event.)

This dashboard will likely only ever be used in production displaying the current time, because 'what is (still) wrong right now' is its entire purpose. Yet when I built it, I found that I not only wanted to leave in the normal Grafana time setting options but specifically build in a panel that would let me easily narrow in on a specific (end) time. This is because setting the time to a specific point is extremely useful for development, testing, and demos of your dashboard. In my case, I could set my in-development dashboard back to a point during our large scale power outage issues and ask myself whether what I was seeing was useful and complete, or whether it was annoying and missing things we'd want to know.

(And also test that the queries and Grafana panel configurations and so on were producing the results that I expected and needed.)

This is especially useful for dashboards that are only interesting in exceptional conditions, conditions that you hopefully don't see all the time and can't produce on demand. We don't have large scale issues all that often, so if I want to see and test my dashboard during one before the next issue happens, I need to rewind time and set it to a point where the last crisis was happening.

(Now that I've written this down it all feels obvious, but it initially wasn't when I was staring at my dashboard at the current time, showing nothing because nothing was down, and wondering how I was going to test it.)

Sidebar: My best time-selection option in Grafana

In my experience, the best way to select a time range or a time endpoint in Grafana is through a graph panel that shows something over time. What you show doesn't matter, although you might as well try to make it useful; what you really care about is the time scale at the bottom that lets you swipe and drag to pick the end and start points of the time range. The Grafana time selector at the top right is good for the times that it gives fast access to, but it is slow and annoying if you want, say, '8:30 am yesterday'. It is much faster to use the time selector to get your graph so that it includes the time point you care about, then select it off the graph.
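
(A different route to the same end, separate from the graph-drag approach above: Grafana dashboard URLs accept 'from' and 'to' query parameters given as epoch milliseconds, so a specific end time can also be pinned by building the URL directly. What follows is a minimal sketch of that, not anything from my actual setup. The function name, the host, and the dashboard path are all made up for illustration, and it assumes you already know the end time and how far back you want to look.)

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def dashboard_url_at(base_url: str, end: datetime, window: timedelta) -> str:
    """Build a dashboard URL whose time range ends at `end` and spans `window`.

    `base_url` is the dashboard's URL without a query string. The name of
    this helper and the example URL below are hypothetical.
    """
    if end.tzinfo is None:
        # A naive datetime would be interpreted in the local timezone,
        # which is an easy way to land on the wrong hour.
        raise ValueError("end must be timezone-aware")
    start = end - window
    params = {
        # Grafana takes absolute times as milliseconds since the Unix epoch.
        "from": int(start.timestamp() * 1000),
        "to": int(end.timestamp() * 1000),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and dashboard path; substitute your own.
    base = "https://grafana.example.org/d/abc123/large-scale-issues"
    end = datetime(2019, 4, 7, 8, 30, tzinfo=timezone.utc)
    print(dashboard_url_at(base, end, timedelta(hours=1)))
```

A URL like this is handy to keep around for demos, since it reopens the dashboard at the same moment every time instead of requiring you to re-select the range by hand.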


Last modified: Sun Apr 7 22:45:30 2019