Grafana Loki doesn't duplicate a central syslog server (or vice versa)

September 14, 2022

We've had a central syslog server for a long time, and recently we've set up a Grafana Loki server as well, where we're sending pretty much a duplicate of the logs that go to the syslog server. After using Loki for a while, I've come to the conclusion that the two serve different purposes and neither makes the other unnecessary.

(Grafana Loki is sometimes summarized as "Prometheus for logs", or to quote its website, it's "a log aggregation system designed to store and query logs from all your applications and infrastructure". You can see how this might sound like it duplicates a central syslog server.)

Our central syslog server is a central source of accessible, text-based truth. Very little infrastructure has to be working on a machine for it to send logs, and very little on the server for it to accept them. While the logs are unstructured and generally carry little metadata, they are plain text and so are extremely accessible. Anything can read, search, and process text, and it's straightforward to back up and otherwise deal with. It's also quite compact when compressed. We have logs from most of our systems going back to late 2017, and /var/log on the central syslog server takes up less than 150 GB.
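To illustrate how little is needed to work with plain text logs, here is a minimal Python sketch that searches a directory of plain and gzip-compressed log files with a regular expression. The function names and the flat one-directory layout are assumptions made for the example, not a description of our actual setup.

```python
import gzip
import re
from pathlib import Path
from typing import Iterator


def open_log(path: Path):
    """Open a log file as text, transparently handling gzip compression."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def search_logs(log_dir: Path, pattern: str) -> Iterator[tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    # Hypothetical layout: all log files sit directly in log_dir.
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue
        with open_log(path) as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield path.name, lineno, line.rstrip("\n")
```

Something like `search_logs(Path("/var/log"), r"sshd.*Failed password")` would then walk every file in that directory. In practice grep and zgrep do the same job, which is rather the point.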

Grafana Loki stores logs in its own format, which is generally less space efficient than a few giant compressed text files, and it requires a much more complicated set of programs and systems to be running in order to accept logs. While Loki can store more metadata more readily than syslog text files can, it's fragile if you feed it too much metadata. Loki also has no track record of longevity or of storage durability, and it is a project almost entirely developed by a single VC-funded corporation.

However, Loki has its own features. It lets you readily capture more metadata than syslog does. It lets you integrate multiple log sources together, since the Promtail agent can read from log files as well as the systemd journal. It integrates well with Grafana dashboards, so information from your logs is more accessible (letting you see things that were always there but were previously too tedious to look at). And in practice it's faster for small-scale queries, although really long time ranges or big results are probably still best handled with your text syslogs. Once you learn enough LogQL, it's also possibly easier to make somewhat complex log queries.

(Having more metadata about even syslog-level logs is useful both for narrowing searches down and for discovering additional things about log messages, like what systemd unit or executable is associated with them. It actually makes syslog priorities somewhat useful, in contrast to my usual feelings about them. Effectively Loki throws all messages into one place and lets you sort them out later.)
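As a rough sketch of how metadata narrows a search, the Python below builds a LogQL stream selector with a line filter and turns it into a URL for Loki's `query_range` HTTP endpoint. The label names (`host`, `unit`) and the server address are hypothetical. Which labels exist depends entirely on how Promtail is configured, and the helper does no escaping of label values.

```python
from urllib.parse import urlencode


def logql_selector(labels: dict[str, str], contains: str = "") -> str:
    """Build a LogQL stream selector with an optional 'line contains' filter.

    Label values are used as-is; a real version would need to escape quotes.
    """
    matchers = ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    query = "{" + matchers + "}"
    if contains:
        query += f' |= "{contains}"'
    return query


def query_range_url(base: str, query: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical labels: 'host' and 'unit' are assumptions for illustration.
query = logql_selector({"host": "apps0", "unit": "ssh.service"},
                       contains="Failed password")
url = query_range_url("http://loki.example.org:3100", query,
                      start_ns=1663192800_000000000,
                      end_ns=1663196400_000000000)
```

The equivalent search against text syslog files would need to recover the unit from the message text, if it's there at all.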

So what I've come to feel is that our central syslog server is there to be our ultimate source of truth, while Grafana Loki is there to make logs accessible. We may not go to the central syslog server very often, but we definitely want it to be there (and we actually do drive some things from its logs). Meanwhile, Loki's accessibility through Grafana makes it more likely to be used.

(In theory we could get the same usefulness from our central syslog server if there were a sidecar system that indexed some basic information from the logs (and saved the file names and offsets of where to read the raw text) and then provided an API to it that Grafana could use. In practice, as far as I know there's no such thing; you can feed your logs to various things to index them, but just like Loki, these things want to keep their own copy of the logs in their own format.)
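To make the sidecar idea concrete, here is a toy Python sketch of the indexing half: it records, for each (host, program) pair, the file name and byte offset of every matching line, then reads the raw text back by seeking. Everything here is hypothetical, including the regular expression for the traditional syslog line format, and it only handles uncompressed files, since seeking to an offset inside a gzip file is not cheap. It also does nothing about the harder part, which is the API that Grafana would talk to.

```python
import re
from collections import defaultdict

# Traditional syslog line: "Sep 14 22:12:40 host program[pid]: message".
# Captures the host name and the program name (without any [pid]).
SYSLOG_RE = re.compile(rb"^\w{3}\s+\d+\s[\d:]{8}\s(\S+)\s([^\s:\[]+)")


def build_index(paths):
    """Map (host, program) to a list of (file name, byte offset) entries."""
    index = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                m = SYSLOG_RE.match(line)
                if m:
                    key = (m.group(1).decode("utf-8", "replace"),
                           m.group(2).decode("utf-8", "replace"))
                    index[key].append((str(path), offset))
                # Binary mode, so len(line) is the exact byte length.
                offset += len(line)
    return index


def read_lines(index, host, program):
    """Yield the raw log lines for one (host, program) by seeking to them."""
    for name, offset in index.get((host, program), []):
        with open(name, "rb") as f:
            f.seek(offset)
            yield f.readline().decode("utf-8", "replace").rstrip("\n")
```

The index itself stays small because it holds no message text; the text files remain the only copy of the logs.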

Written on 14 September 2022.