Plaintext is not a great format for (system) logs

June 29, 2024

Recently I saw some grumpiness on the Fediverse about systemd's journal not using 'plain text' for storing logs. I have various feelings here, and one of the probably controversial ones is that, in general, plain text is not a great format for logs, especially system logs. This is independent of systemd's journal or of anything else; in fact, looking back, I can see signs of it in my own experiences long before the systemd journal showed up (for instance, it's part of why I gave up on syslog priorities).

The core problem is that log messages themselves almost invariably come with additional metadata, often fairly rich metadata, but if you store things in plain text it's difficult to handle that metadata. You have more or less three things you can do with any particular piece of metadata:

  • You can augment the log message with the metadata in some (text) format. For example, the traditional syslog 'plain text' format augments the basic syslog message with the timestamp, the host name, the program, and possibly the process ID. The downside of this is that it makes log messages themselves harder to pick out and process; the more metadata you add, the more the log message itself becomes obscured.

    (One can see this in syslog messages from certain sorts of modern programs, which augment their log messages with a bunch of internal metadata that they put in the syslog log message as a series of 'key=value' text.)

  • You can store the metadata by implication, for example by writing log messages to separate files based on the metadata. Syslog, for instance, is often configured to use metadata (such as the syslog facility and the log level) to control which files a log message is written to. One drawback of storing metadata by implication is that it separates out log messages, making it harder to get a global picture of what was going on. Another drawback is that it's hard to store very many different pieces of metadata this way.

  • You can discard the metadata. Once again, the traditional syslog log format is an example, because it normally discards the syslog facility and the syslog log level (unless they're stored by implication).
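To make the three options concrete, here is a minimal Python sketch of what each one does to a single record. The record contents, host name, user, and the log file naming scheme are all hypothetical, and the text format only approximates traditional syslog output.

```python
from datetime import datetime, timezone

# A hypothetical log record: one message plus the metadata that came with it.
record = {
    "timestamp": datetime(2024, 6, 29, 22, 32, 46, tzinfo=timezone.utc),
    "host": "host1",
    "program": "sshd",
    "pid": 1234,
    "facility": "auth",
    "level": "info",
    "message": "Accepted publickey for alice",
}

def augment(rec):
    """Option 1: prepend metadata as text, roughly in the traditional syslog style.
    Like that format, this silently drops the facility and level."""
    ts = rec["timestamp"].strftime("%b %d %H:%M:%S")
    return f'{ts} {rec["host"]} {rec["program"]}[{rec["pid"]}]: {rec["message"]}'

def file_for(rec):
    """Option 2: store facility and level by implication, via which file is used.
    The '/var/log/<facility>.<level>.log' scheme is made up for illustration."""
    return f'/var/log/{rec["facility"]}.{rec["level"]}.log'

def discard(rec):
    """Option 3: keep only the message; all of the metadata is lost."""
    return rec["message"]

for render in (augment, file_for, discard):
    print(render(record))
```

Note that even the 'augment' option can only carry so much before the message starts to disappear into its prefix.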

The more metadata you have, the worse this problem is. Perhaps unsurprisingly, modern systems can often attach rich metadata to log messages, and this metadata can be quite useful for searching and monitoring. But if you write your logs out in plain text, either you get clutter and complexity or you lose metadata.

Of course, if you have standard formats for attaching metadata to log messages, you can write tools that strip or manipulate this metadata in order to give you (just) the log messages. But the more you do this and rely on it, the less your logs are really plain text and the more they are 'structured logs stored in a somewhat readable text format'.
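A tool of this sort can be sketched in a few lines of Python. This assumes a hypothetical logfmt-style convention where metadata is written as `key=value` pairs (with double quotes around values containing spaces) and the human-readable message lives under a `msg` key; real programs vary, and this is not a parser for any particular one.

```python
import shlex

def parse_kv(text):
    """Parse 'key=value' tokens; shlex handles double-quoted values with spaces."""
    fields = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep:  # ignore bare words that have no '='
            fields[key] = value
    return fields

def message_only(text, key="msg"):
    """Return just the message field, or the whole line if there isn't one."""
    return parse_kv(text).get(key, text)

# A hypothetical log line in the assumed format.
line = 'time=2024-06-29T22:32:46Z level=info msg="connection closed" peer=192.0.2.10'
print(parse_kv(line))
print(message_only(line))
```

One design wrinkle: `shlex.split` raises `ValueError` on unbalanced quotes, so a real tool would have to decide what to do with malformed lines.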

(The ultimate version of this is giving up on readability in the raw and writing everything out as JSON. This is technically just text, but it's not usefully plain text.)
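For comparison, here is a hypothetical record written as one JSON object per line with Python's standard `json` module. Nothing is lost and every field can be recovered exactly, but the message is now just one quoted value among many. The field names are illustrative, not any standard schema.

```python
import json

# A hypothetical record; the field names are illustrative only.
record = {
    "timestamp": "2024-06-29T22:32:46Z",
    "host": "host1",
    "program": "sshd",
    "pid": 1234,
    "facility": "auth",
    "level": "info",
    "message": "Accepted publickey for alice",
}

# One JSON object per line: every field survives, but the message is buried.
line = json.dumps(record)
print(line)

# Getting the message back needs a JSON parser, not just eyes or grep.
message = json.loads(line)["message"]
print(message)
```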

Written on 29 June 2024.
Last modified: Sat Jun 29 22:32:46 2024