Some notes on Grafana Loki's new "structured metadata" (as of 3.0.x)

May 27, 2024

Grafana Loki somewhat bills itself as "Prometheus for logs", and so it's unsurprising that it started with a data model much like Prometheus. Log lines in Loki are stored with some amount of metadata in labels, just as Prometheus metric values have labels (including the name of the metric, which is sort of a label). Unfortunately Loki made implementation choices that caused this data model to be relatively catastrophic for system logs (either syslog logs or the systemd journal). Unlike Prometheus, Loki stores each set of label values separately and it never compacts its log storage, so high-cardinality labels leave you with a great many small streams that stay that way. Your choices are to throw away a great deal of valuable system log metadata in order to keep label cardinality down, to contaminate your log lines with metadata (making them hard to use), or to run Loki in a way that causes periodic explosions and is in general very far outside the configurations that Grafana Inc develops Loki for.

Eventually Grafana Inc worked out that this was less than ideal and sort of did something about it, by introducing "structured metadata". The simple way to describe Loki's structured metadata is that it is labels (and label values) that are not used to separate out log line storage. In theory this is just what I've wanted for some time, but in practice as of Loki 3.0.0, structured metadata is undercooked and not something we can use. That said, if you're starting a new greenfield deployment to ingest system logs, you probably do want to use it (via Promtail's somewhat underdocumented support for it), although I can't recommend that you use Loki at all, at least in simple configurations.
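To make the distinction concrete, here is a minimal sketch of what a client sends in Loki's JSON push format, as I understand it: real labels go in the per-stream 'stream' object, while structured metadata rides along as an optional third element of each log entry. The function name, the label names, and the sample values are all made up for illustration, and nothing here actually talks to a Loki server.

```python
import json
import time


def build_push_payload(stream_labels, line, metadata, ts_ns=None):
    """Build a JSON body for Loki's push endpoint for a single log line.

    stream_labels: real labels; each distinct set is its own stream.
    metadata: structured metadata; attached to the entry, not the stream.
    """
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Each entry is [timestamp in ns as a string, log line, structured metadata].
    # Metadata values are sent as strings, so non-string values are converted.
    entry = [str(ts_ns), line, {k: str(v) for k, v in metadata.items()}]
    return {"streams": [{"stream": dict(stream_labels), "values": [entry]}]}


# Hypothetical example: low-cardinality things stay labels, while the
# high-cardinality journal fields become structured metadata.
payload = build_push_payload(
    {"job": "systemd-journal", "host": "web1"},
    "Accepted publickey for fred from 192.0.2.10",
    {"unit": "ssh.service", "pid": 4242},
    ts_ns=1716848920000000000,
)
print(json.dumps(payload, indent=2))
```

The point of the split is that only the 'stream' object determines how log lines are separated in storage; the per-entry metadata can vary freely without creating new streams.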

The first problem is that structured metadata labels are not actual labels as Loki treats them. If you have a structured metadata label 'fred', you cannot write a LogQL query of '{...,fred="value"}'. Instead you must write this as '{...} | fred="value"'. This means that all of your queries care deeply about whether a particular thing is a Loki label or merely a structured metadata label. I feel strongly that your queries should not depend on the details of your database schema, partly because it makes changing your database schema harder. Loki tools are inconsistent about this distinction; for example 'logcli query' will mostly print structured metadata labels as if they were real labels.
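One way to keep this schema knowledge out of individual queries is to generate them from a single description of which names are real labels. The following is only a sketch of that idea; the STREAM_LABELS set and the helper names are hypothetical, and it handles nothing beyond simple equality matches.

```python
# Hypothetical schema: names that are real (stream) labels. Anything else
# is assumed to be structured metadata.
STREAM_LABELS = {"job", "host"}


def _quote(value):
    # Double-quoted LogQL strings use backslash escapes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(matchers, stream_labels=STREAM_LABELS):
    """Turn {name: value} equality matches into a LogQL query string."""
    selector = []  # goes inside {...}
    filters = []   # goes after the selector as '| name="value"'
    for name, value in matchers.items():
        term = f"{name}={_quote(value)}"
        if name in stream_labels:
            selector.append(term)
        else:
            filters.append(term)
    if not selector:
        # A stream selector needs at least one real label matcher.
        raise ValueError("need at least one real label in the query")
    query = "{" + ",".join(selector) + "}"
    for term in filters:
        query += " | " + term
    return query


print(build_query({"job": "systemd-journal", "fred": "value"}))
# If 'fred' were a real label instead, only the schema set would change:
print(build_query({"job": "systemd-journal", "fred": "value"},
                  stream_labels={"job", "fred"}))
```

This doesn't help with queries that people type by hand or that are embedded in dashboards and alert rules, which is where most of ours live.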

Speaking of changing your database schema, that is the other large piece of bad news about structured metadata. If you have an existing Loki environment from before structured metadata, complete with lots of real labels because that's how you had to capture log metadata, there is no obvious way to switch over to using structured metadata for that log metadata. There are some interesting ways to fail to do so, because the current Loki will accept a client submitting 'structured metadata' that the Loki server thinks should be actual labels. If you add some new, higher cardinality structured metadata alongside the labels you'd like to convert, I've seen this add that high cardinality structured metadata as actual labels (the result wasn't pretty). If you want to switch, the easiest way is to stop Loki, delete all of your existing log data, and start from scratch with all clients sending all of the log metadata you care about as structured metadata instead of labels.

I haven't tested what happens in a greenfield configuration if most clients send some client-side labels as structured metadata but one client fumbles things and sends them as labels. I would like to think that Loki rejects this, rather than accepting it and silently converting the structured metadata labels from other clients into real labels (possibly high cardinality ones). Unfortunately this isn't a theoretical mistake, because of an implementation choice in the current (3.0.x) version of Promtail. In Promtail, in order to send syslog or systemd journal fields as structured metadata, you must first materialize them as regular labels (via relabeling) and then convert them to structured metadata in the 'structured_metadata' pipeline stage. If you do the first but not the second, your Promtail configuration will send that metadata to Loki as actual labels, possibly to your deep regret.
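Since this mistake is easy to make, it may be worth checking Promtail configurations mechanically. Here is a rough sketch of such a check. It assumes the scrape config has already been loaded from YAML into a plain Python dict, that relabel rules use 'target_label', and that each key in a 'structured_metadata' stage names the metadata field with an optional source label as its value (defaulting to the key itself). The leaked_labels function and the sample job are hypothetical, and the check ignores everything else that a real pipeline can do to labels.

```python
def leaked_labels(scrape_config, intended_labels):
    """Return relabel targets that would reach Loki as real labels
    even though they are not on the list of intended real labels."""
    # Labels materialized through relabeling. Targets starting with "__"
    # are internal and are not sent to Loki.
    materialized = {
        rule["target_label"]
        for rule in scrape_config.get("relabel_configs", [])
        if "target_label" in rule and not rule["target_label"].startswith("__")
    }
    # Labels that some structured_metadata stage converts.
    converted = set()
    for stage in scrape_config.get("pipeline_stages", []):
        for name, source in (stage.get("structured_metadata") or {}).items():
            converted.add(source or name)
    return sorted(materialized - converted - set(intended_labels))


# Hypothetical journal job, written as the dict its YAML would load into.
journal_job = {
    "job_name": "journal",
    "journal": {"labels": {"job": "systemd-journal"}},
    "relabel_configs": [
        {"source_labels": ["__journal__systemd_unit"], "target_label": "unit"},
        {"source_labels": ["__journal__hostname"], "target_label": "host"},
        {"source_labels": ["__journal_syslog_identifier"],
         "target_label": "identifier"},
    ],
    "pipeline_stages": [
        # 'unit' is converted, but 'identifier' was forgotten.
        {"structured_metadata": {"unit": None}},
    ],
}

print(leaked_labels(journal_job, intended_labels={"host"}))
```

Here 'identifier' is materialized by relabeling but never converted, so it would be sent as a real label; 'host' is left alone because it's on the list of labels we actually want.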

I was initially hopeful that structured metadata would let us change our Loki configuration to something closer to a mainstream one. Unfortunately, my investigation has ruled this out for now. We would need to change too many existing queries, and there are too many uncertainties over whether we would be able to do it without deleting all of our existing log data (and then living in fear of a cardinality explosion due to an outdated or misconfigured client). Maybe in Loki 4.0.
