Notes on Grafana 'value groups' for dashboard variables

March 30, 2020

Suppose, not hypothetically, that you have some sort of Grafana overview dashboard that can show you multiple hosts at once in some way. In many situations, you're going to want to use a Grafana dashboard variable to let you pick some or all of your hosts. If you're getting the data for what hosts should be in your list from Prometheus, often you'll want to use label_values() to extract the data you want. For example, suppose that you have a label field called 'cshost' that is your local short host name for a host. Then a plausible Grafana query for 'all of our hosts' for a dashboard variable would be:

label_values( node_load1, cshost )

(Pretty much every Unix that the Prometheus host agent runs on will supply a load average, although they may not supply other metrics.)

However, if you have a lot of hosts, this list can be overwhelming. You may also have sub-groupings of hosts, such as all SLURM nodes, that you want to make it convenient to narrow down to. To support this, Grafana has a dashboard variable feature called value groups or just 'tags'. Value groups are a bit confusing and aren't as well documented as dashboard variables as a whole.

There are two parts to setting up a value group; you need a query that will give Grafana the names of all of the different groups (aka tags), and then a second query that will tell Grafana which hosts are in a particular group. Suppose that we have a metric to designate which classes a particular host is in:

cslab_class{ cshost="cpunode2", class="comps" }    1
cslab_class{ cshost="cpunode2", class="slurmcpu" } 1
cslab_class{ cshost="cpunode2", class="c6220" }    1

We can use this metric for both value group queries. The first query is to get all the tags, which are all the values of class:

label_values( cslab_class, class )

Note that we don't have to de-duplicate the result; Grafana will do that for us (although we could do it ourselves if we wanted to make a slightly more complex query).
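
To make the de-duplication concrete, here is a small Python model of what this first query conceptually hands back. It's only a sketch under assumptions: the label_values() function below is a hypothetical stand-in written for illustration, not Grafana's implementation, and cpunode3 is invented sample data. The sorting is there only to make the output stable, not because Grafana is known to order tags this way.

```python
# Hypothetical in-memory stand-in for the series of the cslab_class metric.
# cpunode3 is made-up sample data for illustration.
SERIES = [
    {"cshost": "cpunode2", "class": "comps"},
    {"cshost": "cpunode2", "class": "slurmcpu"},
    {"cshost": "cpunode2", "class": "c6220"},
    {"cshost": "cpunode3", "class": "comps"},
    {"cshost": "cpunode3", "class": "slurmcpu"},
]

def label_values(series, label):
    """Return the distinct values of one label (sorted for stable output)."""
    return sorted({s[label] for s in series if label in s})

# Five series, but only three distinct classes, so three tags.
tags = label_values(SERIES, "class")
print(tags)
```

Even though 'comps' and 'slurmcpu' each appear on two hosts, each one shows up only once as a tag.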

The second query is to get all of the values for a particular group (or tag), which is to say the hosts for a specific class. In this query, we have a special Grafana-provided $tag variable that refers to the current class, so our query is now for the cshost label for things with that class:

label_values( cslab_class{ class="$tag" }, cshost )

It's entirely okay for this query to return some additional hosts (values) that aren't in our actual dashboard variable; Grafana will quietly ignore them for the most part.
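
As a rough illustration of that behavior, here is a Python sketch of how group membership ends up being limited to the values the dashboard variable actually has. Everything in it is an assumption for the example: the host names, the hosts_by_tag table standing in for the results of the second query, and the hosts_selected_by_tag() helper are all hypothetical, and this models the effect described above rather than how Grafana implements it.

```python
# Values of the dashboard variable itself (hypothetical host list).
variable_values = ["apps0", "cpunode2", "cpunode3"]

# Hypothetical results of the second query for each tag. "oldnode1" is in
# a class but is not one of the dashboard variable's values.
hosts_by_tag = {
    "slurmcpu": ["cpunode2", "cpunode3", "oldnode1"],
    "comps": ["apps0", "cpunode2"],
}

def hosts_selected_by_tag(tag):
    """Keep only the group members that are real values of the variable."""
    members = set(hosts_by_tag.get(tag, []))
    return [h for h in variable_values if h in members]

print(hosts_selected_by_tag("slurmcpu"))
```

Here 'oldnode1' simply drops out, because it isn't a value of the variable to start with.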

Although you'll often want to use the same metric in both queries, it's not required. Both queries can be arbitrary and don't have to be particularly related to each other. However, for a host to actually show up in a group, the second query has to return it in exactly the form it has as a value of the dashboard variable itself. Unfortunately you don't have regexp rewriting for your results the way you do for the main dashboard variable query, so with Prometheus you may need to do some rewriting in the query itself using label_replace(). Also, there's no preview of what value groups (tags) your query generates, or what values are in what groups; you have to go play around with the dashboard to see what you get.

Written on 30 March 2020.

Last modified: Mon Mar 30 00:49:43 2020