What sysadmins want out of logging means that it can't be too simple
Dave Cheney recently wrote Let's talk about logging about (Go) logging packages, where he advocates, well, I'm going to quote him directly:
I believe that there are only two things you should log:
- Things that developers care about when they are developing or debugging software.
- Things that users care about when using your software.
Obviously these are debug and info levels, respectively.
log.Infoshould simply write that line to the log output. There should not be an option to turn it off as the user should only be told things which are useful for them. [...]
My reaction is that this is too simple for real use. Ignoring things like (web) activity logs (which Dave Cheney agrees are a different case), there are clear divisions between what sysadmins need at different times and in different situations.
First, let's agree that programs should always be able to log their basic actions. If you're a web server, this is HTTP requests; if you're a mail server, this is email traffic; and so on. This tells sysadmins whether or not the system is doing anything, and if it's doing something what it's doing and how fast. Sysadmins will use this to do monitoring, to check if something happened (such as an email arriving or a request being processed), and so on.
Systems not infrequently encounter internal issues that are not fatal errors. They may experience timeouts, request errors, and so on. If we say that errors are fatal, these are all 'warnings' (even if some terminate the processing of the current whatever that the system is handling). They mark odd things that should not normally happen. Sysadmins like to have a record of these for obvious reasons.
Finally, when sysadmins are working to diagnose problems with services we want to be able to get detailed activity traces of exactly how the system processed requests. What did it look at? What did it find or not find as it stepped through things? Here we're looking for a description of why the system is acting as it is. This level of information is too voluminous to be logged routinely, and often it needs to be segmented up so that we can look only at certain aspects (because otherwise we'll drown in probably irrelevant information).
It's tempting to say that this level of information is the same as developer debug information, but it's my view that it's not. Developer debug information is internally focused and aimed at people who know the code and are making code changes. Sysadmin activity traces are externally focused and aimed at people who do not know the code and are not changing it. As a sysadmin, I don't care about internal state in the code; I'm going to assume there's no code bug and instead that I have either a misconfiguration or a malfunction somewhere in the overall system environment. I want to find that.
You can in theory run all of this through a simple
interface. But if you do so there are two problems. First, you need
to create internal standards in your program for formatting messages
so that sysadmins can tell the different sorts of messages apart
from each other. Second, you are spewing massive amounts of information
out all the time (since you're always dumping all activity traces),
which is not very friendly. My view is that a good logging package
should be able to do this for you. A too-simple logging package
throws both program authors and sysadmins to the wolves of ad-hoc
logging and log filtering.
This is why real programs grow features to control what gets logged and to log different sorts of things in different places. Apache does not have separate request logs and error logs for arbitrary reasons, for example; real people wanted that separation because they find it quite useful.