How I try out and test new versions of Grafana

September 3, 2021

For much of the time that we've been running our Prometheus and Grafana setup, upgrading our Grafana to keep up with the latest version has been unexciting. Then Grafana embarked on a steady process of deprecating and eventually removing old panel types, replacing them with new implementations that are only theoretically equivalent and fully functional. I've generally not been a fan of this, even before Grafana 8.0, and it created a need to actually try out and test new versions of Grafana so I would have some idea if they made our dashboards explode. Fortunately this is fairly easy in our current environment through some lucky setup choices we made.

(If you're considering such an upgrade and you have old SingleStat panels, you probably want to install the 'grafana-singlestat-panel' plugin, which keeps it available as a plugin. This extremely helpful tip comes from matt abrams on Twitter.)

We run both Prometheus and Grafana on the same server, with both behind an Apache reverse proxy, and we do our alerts through Prometheus Alertmanager, not Grafana. When I first configured Grafana, I could have set it to talk directly to Prometheus as 'localhost:9090', but for various reasons I set it up to go through the Apache reverse proxy using the machine's official web name. One useful effect of this is that I can easily bring up another instance of our Grafana setup on a second machine; if I copy our configuration and Grafana database, it has our normal dashboards and will automatically pull data from our live Prometheus. I can then readily see if everything looks right and directly compare the appearance of our production Grafana and the different version.

(We have standard install instructions for our core metrics server; I do the relevant Apache and Grafana sections on a test virtual machine. If I'm smart, I snapshot the initial post-installation state of the VM before I start playing around with a new Grafana version, so I can revert to a clean setup without reinstalling from scratch.)

With a separate grafana.db I can then experiment with updating panels and dashboards for the new version. If it doesn't work out I can revert to the initial setup by taking another copy of our live grafana.db (and under some circumstances consider copying it the other way to save work). All sorts of experiments are possible.
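
As a rough illustration of "taking another copy of our live grafana.db", here is a minimal Python sketch that uses SQLite's online backup API, so the copy should be consistent even if Grafana happens to be writing at the time. It assumes Grafana is using its SQLite backend; the function name and any paths you pass in are illustrative, not taken from our setup.

```python
import sqlite3
from pathlib import Path


def snapshot_grafana_db(live_db: str, test_db: str) -> None:
    """Copy a SQLite database file using SQLite's online backup API.

    Assumption: Grafana is using its SQLite backend (a grafana.db file).
    Any existing contents of test_db are replaced by the copy.
    """
    Path(test_db).parent.mkdir(parents=True, exist_ok=True)
    # Open the source read-only so the live database is never modified.
    src_uri = Path(live_db).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(src_uri, uri=True)
    dst = sqlite3.connect(test_db)
    try:
        src.backup(dst)  # copy every page of src into dst
    finally:
        dst.close()
        src.close()
```

You would run this with the test Grafana stopped, passing the path of a local copy of (or a mount of) the production database and the path where the test instance expects its grafana.db. A plain file copy made while Grafana is shut down would work too; the backup API is only a safer choice when the source may be in use.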

I could still do a version of this if I had set our Grafana to talk directly to Prometheus (provided that our Prometheus was accessible from outside the machine, which it is); I'd just have to edit the Grafana datasource. I don't think this currently has any other effects on your dashboards, and that's probably not going to change in the future.
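
Editing the datasource is normally a quick job in the Grafana web interface, but it could also be scripted against the test copy of grafana.db. The following is only a sketch: it assumes the SQLite backend and that datasources are stored in a `data_source` table with `type` and `url` columns, which you should verify against your Grafana version's actual schema. The function name is made up for illustration.

```python
import sqlite3


def repoint_prometheus(test_db: str, new_url: str) -> int:
    """Set the URL of every Prometheus datasource in a TEST copy of grafana.db.

    Assumption: datasources live in a `data_source` table that has `type`
    and `url` columns. Returns the number of datasources that were changed.
    """
    conn = sqlite3.connect(test_db)
    try:
        with conn:  # commits on success, rolls back on an exception
            cur = conn.execute(
                "UPDATE data_source SET url = ? WHERE type = 'prometheus'",
                (new_url,),
            )
            return cur.rowcount
    finally:
        conn.close()
```

This should only ever be pointed at the test instance's database, with that Grafana stopped, for example `repoint_prometheus("grafana.db", "http://metrics.example.org:9090")` where the hostname is a placeholder for wherever your Prometheus is reachable.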

If we did alerts through Grafana, I would have to disable them in the test Grafana to avoid potential duplicate alerts. In my view, this is a good reason to have your alerts handled in a separate component that you can selectively disable or omit. Alerts have observable side effects, so you have to be careful when testing them; dashboards generally don't, so you have an easier time.

Written on 03 September 2021.

Last modified: Fri Sep 3 22:51:35 2021