Wandering Thoughts archives

2020-03-30

It's worth documenting the obvious (before it stops being obvious)

I often feel a little bit silly when I write entries about things like making bar graphs in Grafana or tags for Grafana dashboard variables because when I write them up it's all pretty straightforward and even obvious. This is an illusion. It's all straightforward and obvious to me right now because I've been in the middle of doing this with Grafana, and so I have a lot of context and contextual knowledge. Not only do I know how to do things, I also know what they're called and roughly where to find information about them in Grafana's official documentation. All of this is going to fade away over time, as I stop making and updating our Grafana dashboards.

Writing down these obvious things has two uses. First and foremost, I'll have specific documentation for when I want to do this again in six months or a year or whatever (provided that I can remember that I wrote some entries on this and that I haven't left out crucial context, which I've done in the past). Second, actually writing down my own documentation forces me to understand things more thoroughly and hopefully helps fix them more solidly in my mind, so perhaps I won't even need my entries (or at least not need them so soon).

There's a lot of obvious things and obvious context that we don't document explicitly (in our worklog system or otherwise), which I've noticed before. Some of those obvious things don't really need to be documented because we do them all of the time, but I'm sure there's other things I'm dealing with right now that I won't be in six months. And even for the things that we do all the time, maybe it wouldn't hurt to explicitly write them up once (or every so often, or at least re-check the standard 'how we do X' documentation every so often).

(Also, just because we do something all the time right now doesn't mean we always will. What we do routinely can shift over time, and we won't even necessarily directly notice the shift; it may just slowly be more and more of this and less of that. Or perhaps we'll introduce a system that automates a lot of something we used to do by hand.)

The other side of this, and part of why I'm writing this entry, is that I shouldn't feel silly about documenting the obvious, or at least I shouldn't let that feeling stop me from doing it. There's value in doing it even if the obvious remains obvious to me, and I should keep on doing a certain amount of it.

(Telling myself not to feel things is probably mostly futile. Humans are not rational robots, no matter how much we tell ourselves that we are.)

sysadmin/DocumentTheObvious written at 21:37:13

Notes on Grafana 'value groups' for dashboard variables

Suppose, not hypothetically, that you have some sort of Grafana overview dashboard that can show you multiple hosts at once in some way. In many situations, you're going to want to use a Grafana dashboard variable to let you pick some or all of your hosts. If you're getting the data for what hosts should be in your list from Prometheus, often you'll want to use label_values() to extract the data you want. For example, suppose that you have a label field called 'cshost' that is your local short host name for a host. Then a plausible Grafana query for 'all of our hosts' for a dashboard variable would be:

label_values( node_load1, cshost )

(Pretty much every Unix that the Prometheus host agent runs on will supply a load average, although they may not supply other metrics.)
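As a rough illustration of what this query gives you, here is a small Python model of label_values() over an in-memory list of series. This is only a sketch of the semantics, not how Grafana or Prometheus implement it; the series data and every host name other than cpunode2 are made up for the example.

```python
# Hypothetical stand-in for the series Prometheus knows about: each entry is
# (metric name, labels). Host names other than cpunode2 are invented.
SERIES = [
    ("node_load1", {"cshost": "cpunode2", "instance": "cpunode2:9100"}),
    ("node_load1", {"cshost": "apps0", "instance": "apps0:9100"}),
    ("node_load1", {"instance": "stray:9100"}),  # no cshost label at all
    ("up", {"cshost": "pingonly", "job": "blackbox"}),  # a different metric
]


def label_values(series, metric, label):
    """Return the distinct values of `label` across all series of `metric`."""
    values = {
        labels[label]
        for name, labels in series
        if name == metric and label in labels
    }
    # Sorted only so that this sketch gives a deterministic result.
    return sorted(values)


hosts = label_values(SERIES, "node_load1", "cshost")
print(hosts)
```

Series that lack the label, and series for other metrics, simply contribute nothing to the list.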

However, if you have a lot of hosts, this list can be overwhelming, and you may also have sub-groupings of hosts (such as all SLURM nodes) that you want to be able to narrow down to conveniently. To support this, Grafana has a dashboard variable feature called value groups, or just 'tags'. Value groups are a bit confusing and aren't as well documented as dashboard variables as a whole.

There are two parts to setting up a value group; you need a query that will give Grafana the names of all of the different groups (aka tags), and then a second query that will tell Grafana which hosts are in a particular group. Suppose that we have a metric to designate which classes a particular host is in:

cslab_class{ cshost="cpunode2", class="comps" }    1
cslab_class{ cshost="cpunode2", class="slurmcpu" } 1
cslab_class{ cshost="cpunode2", class="c6220" }    1

We can use this metric for both value group queries. The first query is to get all the tags, which are all the values of class:

label_values( cslab_class, class )

Note that we don't have to de-duplicate the result; Grafana will do that for us (although we could do it ourselves if we wanted to make a slightly more complex query).

The second query is to get all of the values for a particular group (or tag), which is to say the hosts for a specific class. In this query, we have a special Grafana-provided $tag variable that refers to the current class, so our query is now for the cshost label for things with that class:

label_values( cslab_class{ class="$tag" }, cshost )

It's entirely okay for this query to return some additional hosts (values) that aren't in our actual dashboard variable; Grafana will quietly ignore them for the most part.
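To make the two queries concrete, here is a minimal Python sketch of how they relate to each other and to the main dashboard variable. It models the behavior described above rather than Grafana's actual code; the cpunode3, apps0, and oldhost entries and all of the function names are hypothetical.

```python
# Labels of the cslab_class series. The cpunode2 entries are the ones shown
# above; cpunode3 and oldhost are invented for this sketch.
CLASS_SERIES = [
    {"cshost": "cpunode2", "class": "comps"},
    {"cshost": "cpunode2", "class": "slurmcpu"},
    {"cshost": "cpunode2", "class": "c6220"},
    {"cshost": "cpunode3", "class": "comps"},
    {"cshost": "cpunode3", "class": "slurmcpu"},
    {"cshost": "oldhost", "class": "comps"},  # not in the dashboard variable
]

# Hypothetical values of the main dashboard variable (the 'all hosts' query).
VARIABLE_VALUES = ["apps0", "cpunode2", "cpunode3"]


def tag_names(series):
    """Model of the first query: label_values( cslab_class, class )."""
    # Building a set de-duplicates the repeated class values.
    return sorted({s["class"] for s in series})


def tag_members(series, tag):
    """Model of the second query, with `tag` standing in for $tag."""
    return sorted({s["cshost"] for s in series if s["class"] == tag})


def usable_members(series, tag, variable_values):
    """Keep only the members that are also values of the main variable."""
    known = set(variable_values)
    return [host for host in tag_members(series, tag) if host in known]


print(tag_names(CLASS_SERIES))
print(tag_members(CLASS_SERIES, "comps"))
print(usable_members(CLASS_SERIES, "comps", VARIABLE_VALUES))
```

Here oldhost comes back from the second query for the 'comps' tag but is dropped in the last step, because it isn't one of the dashboard variable's values.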

Although you'll often want to use the same metric in both queries, it's not required. Both queries can be arbitrary and don't have to be particularly related to each other. The one constraint is that for the results of the second query to be useful, they have to exactly match the values you have in the dashboard variable itself. Unfortunately you don't have regexp rewriting for your results the way you do for the main dashboard variable query, so with Prometheus you may need to do some rewriting in the query itself using label_replace(). Also, there's no preview of what value groups (tags) your query generates, or what values are in what groups; you have to go play around with the dashboard to see what you get.
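As an illustration of the sort of rewriting label_replace() does, here is a rough Python model of it applied to the labels of a single series. It assumes a hypothetical situation where the group metric only carries a full 'instance' label while the dashboard variable uses short host names; the example labels and regular expression are made up, and the model only handles '$1'-style references in the replacement.

```python
import re


def label_replace(labels, dst_label, replacement, src_label, regex):
    """Rough model of PromQL's label_replace() for one series' labels.

    The regex must match the whole source label value. If it does not
    match, the labels come back unchanged.
    """
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        return dict(labels)
    # Turn PromQL-style "$1" references into Python's "\g<1>" form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[dst_label] = match.expand(template)
    return result


# Hypothetical series labels: a full instance name but no short host name.
series = {"instance": "cpunode2.example.org:9100", "class": "comps"}
rewritten = label_replace(series, "cshost", "$1", "instance", r"([^.:]+).*")
print(rewritten["cshost"])
```

The rewritten series gains a cshost label holding just the short host name, which is the form that would have to match the values in the dashboard variable.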

sysadmin/GrafanaVariableGroups written at 00:49:43


