The temptation of smartctl's JSON output format given NVMe SSDs
Over on the Fediverse, I said something:
I have a real temptation to combine smartctl's (new) JSON output with jq to generate Prometheus metrics from SMART data (instead of my current pile of awk of non-JSON smartctl output). On the other hand, using jq for this feels like a Turing tarpit; it feels like the right answer is having a Python/etc program ingest the JSON and do all the reformatting and gathering in a real programming language that I'll be able to read and follow in a few months.
We believe in putting data from SMART into our metrics system so that we have it captured
and can do various things with it, now and in the future. Today,
this is done by processing the normal output of 'smartctl -i' and
'smartctl -A' for our SATA and SAS drives using a mix of awk and
other Unix programs in a shell script. The fly in the ointment on
a few machines today (and more machines in the future) is NVMe
SSDs, because NVMe SSDs have health information but not SMART
attributes, so while 'smartctl -A' works
on them it produces output in a completely different format that
my script has no idea how to deal with.
There are three attractions of using smartctl's new-ish JSON output
format with some post-processing step. The first is that I can run
smartctl only once for each drive, because the JSON output format
makes it straightforward to handle the output of '
all at once. The second is that I could probably condense a lot of
the extraction of various fields and the chopping up of various
bits into a single program that runs once, instead of a bunch of
Unix programs that run repeatedly.
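Here is a minimal sketch of the run-it-once approach. It assumes smartctl 7.0 or later, where '-j' selects JSON output and the JSON carries a 'smartctl.exit_status' field. The function names are hypothetical.

```python
import json
import subprocess


def smartctl_command(device):
    """Build one smartctl invocation that gathers both -i and -A data as JSON."""
    return ["smartctl", "-j", "-i", "-A", device]


def parse_smartctl_output(text):
    """Parse smartctl's JSON, rejecting output that smartctl marked as fatal."""
    data = json.loads(text)
    # smartctl's exit status is a bit mask. Bits 0 and 1 mean the command
    # line didn't parse or the device couldn't be opened, so there is no
    # useful data. Other bits (e.g. logged errors) still come with output.
    status = data.get("smartctl", {}).get("exit_status", 0)
    if status & 0b11:
        raise RuntimeError(f"smartctl failed with exit status {status}")
    return data


def read_drive(device):
    # Deliberately no check=True: smartctl returns non-zero for many
    # informational conditions while still printing complete JSON.
    proc = subprocess.run(smartctl_command(device),
                          capture_output=True, text=True)
    return parse_smartctl_output(proc.stdout)
```

Something like read_drive('/dev/nvme0') then returns a single dictionary holding the identity information along with either the SMART attributes or the NVMe health log.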
The third and biggest is that I could unify processing of SMART
attributes and NVMe health information and handle it all in the
same processing of the JSON output. The processing would simply
look for SMART attributes and NVMe health information in the JSON
and output whatever it found, rather than having to tell the two
apart from how the input was formatted.
(In other words, the JSON output comes conveniently pre-labeled.)
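The following sketch shows the "pre-labeled" processing. It looks for the 'ata_smart_attributes' table and the 'nvme_smart_health_information_log' object (the section names used in smartctl 7.x JSON output) and emits whatever it finds. The metric names ('smart_attribute_*', 'nvme_health') and the label layout are hypothetical.

```python
def collect_samples(data):
    """Yield (metric, labels, value) tuples from whichever sections exist."""
    device = data.get("device", {}).get("name", "unknown")

    # ATA SMART attributes, present for SATA drives.
    table = data.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        labels = {"disk": device, "id": str(attr["id"]), "name": attr["name"]}
        for field in ("value", "worst", "thresh"):
            if field in attr:
                yield f"smart_attribute_{field}", labels, attr[field]
        raw = attr.get("raw", {}).get("value")
        if raw is not None:
            yield "smart_attribute_raw", labels, raw

    # NVMe health information log, present for NVMe drives.
    nvme = data.get("nvme_smart_health_information_log", {})
    for field, value in nvme.items():
        # Keep plain numbers; skip lists such as temperature_sensors.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            yield "nvme_health", {"disk": device, "field": field}, value
```

Because the code never inspects how the input was formatted, a drive that somehow reported both sections would simply produce both sets of samples.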
Using smartctl's JSON output format doesn't solve all of the problems presented by NVMe SSDs, because the health information presented by NVMe SSDs doesn't map exactly onto SMART attributes. If I wanted to be honest, I would generate different Prometheus metrics for them that didn't pretend to have, for example, a SMART attribute ID number. But if I did that, I would make it harder to do metrics queries like 'show us the most heavily written to drives' across all of our drives regardless of their type.
(Or, more likely, 'show us all of the drive temperatures', since how things like power-on hours and write volume are represented in SMART varies a lot between different drives.)
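Temperature is the easy case for a cross-drive metric. smartctl's JSON normally includes a top-level 'temperature' object with a 'current' value in Celsius for both ATA and NVMe drives. The sketch below assumes that layout and falls back to the NVMe health log's own 'temperature' field. The 'smart_temperature_celsius' metric name is hypothetical.

```python
def drive_temperature(data):
    """Return the drive's current temperature in Celsius, or None."""
    # Assumed smartctl 7.x layout: a top-level temperature section that
    # is filled in for both ATA and NVMe drives.
    current = data.get("temperature", {}).get("current")
    if current is not None:
        return current
    # Fallback: the NVMe health log's composite temperature field.
    return data.get("nvme_smart_health_information_log", {}).get("temperature")


def temperature_line(data):
    """Render one Prometheus-style sample line, or None if no temperature."""
    temp = drive_temperature(data)
    if temp is None:
        return None
    disk = data.get("device", {}).get("name", "unknown")
    return f'smart_temperature_celsius{{disk="{disk}"}} {temp}'
```

With one metric name for every drive type, 'show us all of the drive temperatures' becomes a query for that single metric.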
The usual tool for processing JSON in shell scripts is jq. In theory jq might be able to do all of the selection and processing of smartctl's JSON output that's needed for this. In practice, I suspect I will be much happier doing this in Python, because the logic of what is extracted and reported (and how it's mangled) will be much clearer in a programming language than in jq's terse filtering and formatting mini-language.
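To make the readability argument concrete, the following sketch shows how a Python program might keep "what is extracted and how it's mangled" in one visible table. The JSON field names follow the NVMe health log in smartctl 7.x output, and the metric names are hypothetical. The byte conversion reflects the NVMe specification, which counts data units in thousands of 512-byte blocks, so one unit is 512,000 bytes.

```python
# One NVMe "data unit" is 1000 * 512 bytes, per the NVMe specification.
NVME_BYTES_PER_DATA_UNIT = 512 * 1000

# (JSON field, metric name, transform). Metric names are hypothetical.
NVME_FIELDS = [
    ("power_on_hours", "nvme_power_on_hours", lambda v: v),
    ("percentage_used", "nvme_percentage_used", lambda v: v),
    ("available_spare", "nvme_available_spare_percent", lambda v: v),
    ("data_units_read", "nvme_read_bytes",
     lambda v: v * NVME_BYTES_PER_DATA_UNIT),
    ("data_units_written", "nvme_written_bytes",
     lambda v: v * NVME_BYTES_PER_DATA_UNIT),
]


def nvme_metrics(health_log):
    """Map an NVMe health log dict to {metric name: converted value}."""
    out = {}
    for field, metric, transform in NVME_FIELDS:
        if field in health_log:
            out[metric] = transform(health_log[field])
    return out
```

With this structure, adding or dropping a reported field is a one-line change to the table.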