Why logging to syslog is different than logging to standard error
In the previous entry I wrote that
logger is not
a good solution for converting a program that logs to standard error
into a system that logs to syslog. In fact, the issue is more general
logger; logging to standard error is significantly different
from logging to syslog that it is hard to turn the former into the
The issue is that syslog handles a bunch of important things for you that you have to handle yourself in logging to standard error. Syslog automatically attaches a timestamp, the program name, and the process ID (if you ask it to). Since all of these are important pieces of information, when you log to standard error you should be supplying them yourself in order to provide complete logging. If something then simply echoes your full log lines to syslog, you get at least duplicate information; the logs will have timestamps from both syslog and your own program.
(As far as I know there is no way to tell syslog to use your timestamps instead of its own.)
There are a number of solutions, all of them not very great:
- you can have a
loggerequivalent that knows about the format of your messages and so strips off the redundant information before feeding it to syslog. It's clearly specific to your program.
- your program can skip putting timestamps (and perhaps other information)
on log messages, leaving that to another program to do if necessary.
You now intrinsically require a second program in order to get decent
logs, even if you are only logging to a file. Better supply at least
a pointer to such a program in your documentation, and expect sysadmins
to be unhappy about your system no longer being a self-contained thing.
- you can just have your program stutter, putting redundant information into its syslog messages. This makes syslog a second class citizen in your logging world.
The net result of all of this is that it is much easier to start with a program designed to log to syslog and extend it to log to standard error than to take a program designed to log to standard error and bolt on an extension to divert the straight log messages to syslog.
(If you can change the program itself, you can do better. But the kind of people who have their program log to standard error and nothing else generally aren't interested in taking patches to add syslog logging. If you're lucky they might take patches to add an option to turn off timestamps, although now the main program configuration becomes dependent on how you're doing logging.)
Sidebar: the problem abstracted
What this boils down to is that log lines are not simply plain text. Instead they are data (the log message) plus some metadata (the time, the program name, the process ID, etc). Many variant logging systems will want to do custom things with various pieces of the metadata; for example, if you're pushing logs into a SQL database you probably want to put the bits of metadata into separate database fields.
The response to this view is that there is more useful metadata than this in typical log messages, so much metadata (sometimes) that there is no sensible way for the programs generating messages to split them up for you. Since you need some custom parsing no matter what, you might as well bite the bullet and do custom parsing of plain text to extract all of the metadata, even generic metadata like the timestamp.
I don't really like this response, but I think it's probably true in practice; I suspect that everyone who is routing log messages from standard programs into custom monitoring solutions has probably written their own parsing and analysis systems that are specific to the exact programs they're dealing with.
(Well, specific overall. I'm sure people write generic components such as a generic log message parsing program that is driven by pattern matching, but it uses program specific patterns that you write.)