I'm basically giving up on syslog priorities

December 1, 2017

I was recently writing a program where I was logging things to syslog, because that's our default way of collecting and handling logs. For reasons beyond the scope of this entry I was writing my program in Go, and unfortunately Go's standard syslog package makes it relatively awkward to deal with varying syslog priorities. My first pass at the program dutifully slogged through the messy hoops to send various different messages with different priorities, going from info for routine events, to err for reporting significant but expected issues, and ending up at alert for things like 'a configuration file is broken and I can't do anything'. After staring at the resulting code for a while with increasingly unhappy feelings, I ripped all of it out in favour of a much simpler use of basic Go logging that syslogged everything at priority info.
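As a rough illustration of the simpler approach, here is a minimal sketch of what 'basic Go logging that syslogs everything at priority info' can look like with the standard log and log/syslog packages. This is not the actual program; the daemon facility, the tag myprog, the sample messages, and the fallback to standard error when no syslog daemon is reachable are all assumptions made for the example.

```go
package main

import (
	"io"
	"log"
	"log/syslog"
	"os"
)

// newInfoLogger returns a standard *log.Logger whose every message goes
// to syslog at priority info. The daemon facility and the stderr
// fallback are assumptions of this sketch, not anything from the post.
func newInfoLogger(tag string) *log.Logger {
	var out io.Writer
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_DAEMON, tag)
	if err != nil {
		// No reachable syslog daemon (for example in a container),
		// so write to standard error instead.
		out = os.Stderr
	} else {
		out = w
	}
	// Flags are 0 because syslog adds its own timestamp.
	return log.New(out, "", 0)
}

func main() {
	logger := newInfoLogger("myprog") // hypothetical program tag

	// Routine events and serious problems take the same path; the
	// importance lives in the message text, not in the priority.
	logger.Printf("did my thing with client machine %s", "hostA")
	logger.Printf("configuration file %s is broken, cannot continue", "/etc/myprog.conf")
}
```

The alternative is to hold on to the *syslog.Writer and call its per-priority methods (Info, Err, Alert and so on) at each logging site, which means the rest of the program can't simply use an ordinary log.Logger.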

At a theoretical level, this is clearly morally wrong. Syslog priorities have meanings and the various sorts of messages my program can generate are definitely of different importance to us; for example, we care far more about 'a configuration file is broken' than 'I did my thing with client machine <X>'. At a practical level, though, syslog priorities have become irrelevant and thus unimportant. For a start, we make almost no attempt to have our central syslog server split messages up based on their priority. The most we ever look at is different syslog facilities, and that's only because it helps reduce the number of messages to sift through. We have one file that just gets everything (we call it allmessages), and often we just go look or search there for whatever we're interested in.

In my view there are two pragmatic reasons we've wound up in this situation. First, the priority that a particular message of interest is logged at is something we'd have to actively remember in order for it to be of use. Carefully separating out the priorities into different files only actually helps us if we can remember that we want to look at, say, all.alert for important messages from our programs. In practice we can barely remember which syslog facility most things use, which is one reason we often just look at allmessages.

More importantly, we're mostly looking at syslog messages from software we didn't write, and it turns out that which syslog priorities get used is both unpredictable and fairly random. Some programs dump things we want to know all the way down at priority debug; others spray unimportant issues (or what we consider unimportant) over nominally high priorities like err or even crit. This effectively contaminates most syslog priorities with a mixture of messages we care about and messages we don't, and also makes it very hard to predict what priority we should look at. We're basically down to trying to remember that program <X> probably logs the things we care about at priority <Y>. There are a bunch of program <X>s and in practice it's not worth trying to remember how they all behave (and they can change their minds from version to version, and we may have both versions on our servers on different OSes).

(There is a similar but somewhat smaller issue with syslog facilities, which is one reason we use allmessages so much. A good illustration of this is trying to predict or remember which messages from which programs will wind up in facility auth and which wind up in authpriv.)

This whole muddle of syslog priority usage is unfortunate but probably inevitable. The end result is that syslog priorities have become relatively meaningless and so there's no real harm in me giving up on them and logging everything at one level. It's much more important to capture useful information that we'll want for troubleshooting than to worry about what exact priority it should be recorded at.

(There's also an argument that fine-grained priority levels are the wrong approach anyway and you have maybe three or four real priority levels at most. Some people would say even fewer, but I'm a sysadmin and biased.)
