Why logging to syslog is different than logging to standard error
April 4, 2011
In the previous entry I wrote that
The issue is that syslog handles a bunch of important things for you that you have to handle yourself when logging to standard error. Syslog automatically attaches a timestamp, the program name, and (if you ask it to) the process ID. Since all of these are important pieces of information, when you log to standard error you should supply them yourself in order to provide complete logging. If something then simply echoes your full log lines to syslog, you get duplicate information at the very least; the logs will have timestamps from both syslog and your own program.
(As far as I know there is no way to tell syslog to use your timestamps instead of its own.)
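To make the duplication concrete, here is a minimal sketch of a program supplying the metadata itself when it logs to standard error. The function names and the exact line layout are assumptions for illustration (the timestamp uses a zero-padded day, which is close to but not identical to the traditional syslog style).

```python
import os
import sys
import time


def format_stderr_line(message, program=None, now=None, pid=None):
    """Build a complete log line: timestamp, program name, PID, message.

    These are the pieces syslog would otherwise attach for us.
    """
    if program is None:
        program = os.path.basename(sys.argv[0]) or "unknown"
    if pid is None:
        pid = os.getpid()
    # time.localtime(None) means "the current time".
    stamp = time.strftime("%b %d %H:%M:%S", time.localtime(now))
    return f"{stamp} {program}[{pid}]: {message}"


def log_stderr(message):
    """Write one fully decorated log line to standard error."""
    print(format_stderr_line(message), file=sys.stderr)
```

If whatever collects standard error hands such a line to syslog unchanged, syslog puts its own timestamp, program name, and PID in front of a message that already contains all three.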
There are a number of solutions, none of them very good:
The net result of all of this is that it is much easier to start with a program designed to log to syslog and extend it to log to standard error than to take a program designed to log to standard error and bolt on an extension to divert the straight log messages to syslog.
(If you can change the program itself, you can do better. But the kind of people who have their program log to standard error and nothing else generally aren't interested in taking patches to add syslog logging. If you're lucky they might take patches to add an option to turn off timestamps, although now the main program configuration becomes dependent on how you're doing logging.)
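As a rough sketch of why starting from syslog is the easier direction: a program built around syslog keeps the bare message separate from the metadata, so adding a standard error mode only means decorating the message on the way out. The `Logger` class, its `to_stderr` flag, and the timestamp format are hypothetical; the `syslog` calls are from Python's standard library module.

```python
import os
import sys
import syslog
import time


class Logger:
    """Syslog-first logger with an optional standard error mode."""

    def __init__(self, program, to_stderr=False):
        self.program = program
        self.to_stderr = to_stderr
        if not to_stderr:
            # LOG_PID asks syslog to attach the process ID for us.
            syslog.openlog(program, syslog.LOG_PID, syslog.LOG_DAEMON)

    def stderr_line(self, message):
        """Add the metadata that syslog would otherwise supply."""
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        return f"{stamp} {self.program}[{os.getpid()}]: {message}"

    def log(self, message):
        if self.to_stderr:
            print(self.stderr_line(message), file=sys.stderr)
        else:
            # Syslog gets only the bare message; it adds the rest itself.
            syslog.syslog(syslog.LOG_INFO, message)
```

Going the other way is harder because a program that only ever builds the full decorated line has nowhere to intercept the undecorated message.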
Sidebar: the problem abstracted
What this boils down to is that log lines are not simply plain text. Instead they are data (the log message) plus some metadata (the time, the program name, the process ID, etc). Many variant logging systems will want to do custom things with various pieces of the metadata; for example, if you're pushing logs into a SQL database you probably want to put the bits of metadata into separate database fields.
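As an illustration of treating a log line as data plus metadata, here is a small sketch that keeps the pieces separate and stores them in separate database fields. The `LogRecord` type, the table layout, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One log event: the message plus its generic metadata."""
    timestamp: float  # seconds since the epoch
    program: str
    pid: int
    message: str


def open_log_db(path=":memory:"):
    """Open (or create) a database with one column per piece of metadata."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS logs "
        "(timestamp REAL, program TEXT, pid INTEGER, message TEXT)"
    )
    return db


def store(db, rec):
    """Insert one record; the metadata never gets flattened into text."""
    db.execute(
        "INSERT INTO logs VALUES (?, ?, ?, ?)",
        (rec.timestamp, rec.program, rec.pid, rec.message),
    )
    db.commit()
```

A logging backend that only ever receives a finished text line has to parse these fields back out before it can do anything like this.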
The response to this view is that there is more useful metadata than this in typical log messages, so much metadata (sometimes) that there is no sensible way for the programs generating messages to split them up for you. Since you need some custom parsing no matter what, you might as well bite the bullet and do custom parsing of plain text to extract all of the metadata, even generic metadata like the timestamp.
I don't really like this response, but I think it's probably true in practice; I suspect that everyone who routes log messages from standard programs into custom monitoring solutions has written their own parsing and analysis systems that are specific to the exact programs they're dealing with.
(Well, specific overall. I'm sure people write generic components such as a generic log message parsing program that is driven by pattern matching, but it uses program-specific patterns that you write.)
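A minimal sketch of that split between a generic component and program-specific patterns might look like the following. The `PATTERNS` table, the event names, and the `parse_message` function are made up for illustration, and the sshd patterns are only approximations of real messages.

```python
import re

# Program-specific knowledge lives here (hypothetical, approximate patterns).
PATTERNS = {
    "sshd": [
        ("login_ok", re.compile(
            r"Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<host>\S+)")),
        ("login_failed", re.compile(
            r"Failed (?P<method>\S+) for (?P<user>\S+) from (?P<host>\S+)")),
    ],
}


def parse_message(program, message, patterns=PATTERNS):
    """Generic part: try each pattern for this program, first match wins.

    Returns a dict of extracted fields, or None if nothing matched.
    """
    for event, regex in patterns.get(program, []):
        match = regex.search(message)
        if match:
            return {"event": event, **match.groupdict()}
    return None
```

The parser itself knows nothing about any particular program; supporting a new program means writing new patterns, which is exactly the program-specific work that cannot be avoided.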
Written on 04 April 2011.