Plaintext is not a great format for (system) logs
Recently I saw some grumpiness on the Fediverse about systemd's journal not using 'plain text' for storing logs. I have various feelings here, but one of the probably controversial ones is that in general, plain text is not a great format for logs, especially system logs. This is independent of systemd's journal or of anything else, and in fact looking back I can see signs of this in my own experiences long before the systemd journal showed up (for instance, it's part of giving up on syslog priorities).
The core problem is that log messages themselves almost invariably come with additional metadata, often fairly rich metadata, but if you store things in plain text it's difficult to handle that metadata. You have more or less three things you can do with any particular piece of metadata:
- You can augment the log message with the metadata in some (text)
format. For example, the traditional syslog 'plain text' format
augments the basic syslog message with the timestamp, the host
name, the program, and possibly the process ID. The downside of
this is that it makes log messages themselves harder to pick out
and process; the more metadata you add, the more the log message
itself becomes obscured.
(One can see this in syslog messages from certain sorts of modern programs, which augment their log messages with a bunch of internal metadata that they put in the syslog log message as a series of 'key=value' text.)
- You can store the metadata by implication, for example by writing
log messages to separate files based on the metadata. For example,
syslog is often configured to use metadata (such as the syslog
facility and the log level) to control which files a log message
is written to. One of the drawbacks of storing metadata by
implication is that it separates out log messages, making it
harder to get a global picture of what was going on. Another
drawback is that it's hard to store very many different pieces
of metadata this way.
- You can discard the metadata. Once again, the traditional syslog log format is an example, because it normally discards the syslog facility and the syslog log level (unless they're stored by implication).
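As a rough illustration of these three options, here is a minimal Python sketch. The `Record` fields, the sample values, and the `/var/log/<facility>.<level>.log` naming are all made up for the example; real syslog daemons are configured in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical in-memory form of one log event and its metadata.
    timestamp: str
    host: str
    program: str
    pid: int
    facility: str
    level: str
    message: str


def format_line(rec: Record) -> str:
    # Option 1: augment the message with some metadata as a text prefix.
    # Option 3 happens here too: facility and level are simply dropped.
    return f"{rec.timestamp} {rec.host} {rec.program}[{rec.pid}]: {rec.message}"


def choose_file(rec: Record) -> str:
    # Option 2: store facility and level by implication, in the file name.
    # The path layout is an assumption for illustration only.
    return f"/var/log/{rec.facility}.{rec.level}.log"


rec = Record("Mar  3 10:15:02", "web1", "sshd", 4242,
             "auth", "info", "Accepted publickey for alice")
print(choose_file(rec))
print(format_line(rec))
```

Once the line is written, nothing in it says `auth` or `info`; that information survives only in which file the line landed in.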
The more metadata you have, the worse this problem is. Perhaps unsurprisingly, modern systems can often attach rich metadata to log messages, and this metadata can be quite useful for searching and monitoring. But if you write your logs out in plain text, either you get clutter and complexity or you lose metadata.
Of course if you have standard formats for attaching metadata to log messages, you can write tools that strip or manipulate this metadata in order to give you (just) the log messages. But the more you do this and rely on it, the less your logs are really plain text and the more they are 'structured logs stored in a somewhat readable text format'.
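To make this concrete, here is a small sketch of such a tool in Python. It assumes the traditional 'timestamp host program[pid]: message' layout and treats any whitespace-separated `key=value` token in the message as metadata. The regular expression, the `split_line` name, and the sample line are my own illustration, not a standard parser, and the splitting is deliberately naive (a value containing spaces would break it).

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message".
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def split_line(line):
    """Return (prefix metadata, key=value fields, bare message text)."""
    m = SYSLOG_RE.match(line)
    if m is None:
        # Not in the assumed layout; hand the line back untouched.
        return {}, {}, line
    meta = m.groupdict()
    words, fields = [], {}
    for tok in meta.pop("message").split():
        key, sep, value = tok.partition("=")
        if sep and key:
            fields[key] = value   # in-message metadata
        else:
            words.append(tok)     # part of the human-readable message
    return meta, fields, " ".join(words)


meta, fields, text = split_line(
    "Mar  3 10:15:02 web1 myapp[99]: request done status=200 dur=12ms"
)
print(text)
print(fields)
```

A tool like this only works because the 'plain text' has a structure that both the writer and the reader agree on.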
(The ultimate version of this is giving up on readability in the raw and writing everything out as JSON. This is technically just text, but it's not usefully plain text.)
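For a sense of what that looks like, here is a short Python sketch that writes one event as a line of JSON and reads it back. The field names and values are invented for the example and do not follow any particular logging schema.

```python
import json

# Hypothetical event; every key here is an assumption for illustration.
event = {
    "timestamp": "2024-03-03T10:15:02Z",
    "host": "web1",
    "program": "myapp",
    "pid": 99,
    "facility": "daemon",
    "level": "info",
    "message": "request done",
    "status": 200,
}

line = json.dumps(event)      # one log record, one line of text
restored = json.loads(line)   # all metadata survives the round trip

print(line)
print(restored["message"])
```

Nothing is lost, but the actual message is now one field among eight, and you need a tool to pick it out comfortably.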