We won't be sending systemd logs to Grafana Loki in JSON format

July 19, 2022

In yesterday's entry on a Grafana Loki cardinality issue with the metadata you can harvest from systemd log messages, I said that I thought we'd wind up dealing with the issue by sending systemd journal messages to Loki in the journal's JSON format. This JSON format preserves all of the available fields and Loki can process it to extract them when you access the logs. Having actually tried doing this, I've now given up on the idea.

In theory, sending JSON to Loki is the correct approach. JSON is the standard modern way to send a collection of information together, such as a log message and the metadata associated with it, and Loki has tools for parsing JSON, extracting the fields, and so on. In a perfect world we'd ship JSON and everything would work nicely.

In practice, trying to use JSON-formatted systemd logs today with Loki and Grafana gave me a succession of papercuts and irritations. I'm currently mostly using Grafana's Explore feature to poke through Loki logs, and this had various issues. For example, Explore gets unhappy if there are errors in the log processing, such as when you have a mixture of JSON and non-JSON log lines so that JSON decoding fails some of the time. Explore has a handy 'show context' feature that shows you related log lines around the selected log, but with JSON logs these lines are shown as raw JSON, because that's the native log message. Naturally, JSON blobs are harder to skim (and take up more screen space) than the actual log messages.
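As a rough illustration of the mixed-format problem (not of how Loki or Grafana actually implement JSON parsing), here is a minimal Python sketch. The function name `decode_lines` and the sample log lines are made up for the example; the field names `MESSAGE` and `_SYSTEMD_UNIT` are standard journal fields.

```python
import json

def decode_lines(lines):
    """Try to JSON-decode each line; return (decoded, failed) lists."""
    decoded, failed = [], []
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # A plain text log line ends up here.
            failed.append(line)
            continue
        # A bare number or string is valid JSON but not a log record.
        if isinstance(obj, dict):
            decoded.append(obj)
        else:
            failed.append(line)
    return decoded, failed

# Hypothetical sample: one journal-style JSON line, one plain syslog-style line.
sample = [
    '{"MESSAGE": "Started Session 42 of user root.", "_SYSTEMD_UNIT": "systemd-logind.service"}',
    'Jul 19 10:00:01 host cron[123]: (root) CMD (run-parts /etc/cron.hourly)',
]
decoded, failed = decode_lines(sample)
```

Any stream that mixes the two formats always has some lines in the `failed` bucket, which is the sort of processing error that Explore complains about.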

It would be vaguely nice to have the extra systemd metadata that the JSON format gives us, mostly because it's generally good to save metadata instead of throwing it away. But I would clearly be paying a significant price in usability and little irritations and Grafana features not working if we kept using JSON. The tradeoff right now isn't worth it.

(This generalizes to more than systemd log messages. If I have a choice between sending Loki a JSON or a non-JSON format, I'm going to pick the non-JSON format unless I'd lose data that's clearly important to us.)

This is ultimately a user interface problem, or in another view an information presentation problem. Grafana and Loki currently lack a way to tell one or the other that given a JSON blob, most of it is metadata and one particular field is the log message and should be treated as such.

(You can reformat the log line, as covered yesterday, but this doesn't persuade Grafana's Explore that all is fine, tell it to show just the specific JSON field in 'show context', or fix various other papercuts.)
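The split that's missing can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Grafana or Loki provide; `split_record` is a hypothetical helper, and it assumes the journal's `MESSAGE` field holds the log text as a string (the journal can export a non-UTF-8 message as an array of bytes, which this sketch doesn't handle).

```python
import json

def split_record(raw, message_field="MESSAGE"):
    """Split one JSON log record into (message, metadata).

    Everything except message_field is treated as metadata.
    """
    record = json.loads(raw)
    message = record.get(message_field, "")
    metadata = {k: v for k, v in record.items() if k != message_field}
    return message, metadata

# Hypothetical journal-style record, similar in shape to 'journalctl -o json' output.
raw = json.dumps({
    "MESSAGE": "Accepted publickey for root from 192.0.2.10",
    "_SYSTEMD_UNIT": "ssh.service",
    "SYSLOG_IDENTIFIER": "sshd",
    "PRIORITY": "6",
})
message, metadata = split_record(raw)
```

With this kind of split, an interface could show `message` as the log line everywhere (including in 'show context') and keep `metadata` available for filtering, instead of displaying the whole blob.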

An unfortunate aspect of this is that because the format of existing saved data has an immense amount of gravity, the log format decisions we make now will likely be what we stick with even if Grafana later improves the handling of this sort of JSON format data, or if we move almost entirely to interacting with Loki through dashboards with canned queries (which I think can more easily handle these issues than the ad-hoc Explore environment). And once we build dashboards that expect non-JSON systemd log messages, the gravity will increase.

Written on 19 July 2022.
