2023-08-08
Programs shouldn't commit to fixed and predictable log messages
When I wrote that monitoring your logs is mostly a tarpit, I said that logs are functionally unstructured. I didn't mean this just in the sense that many program logs are either unstructured text messages or custom formats; I meant it in a deeper sense that the log messages and other information are unpredictable. As peculiar as it might be for a system administrator to say this, I think that this unpredictability is a good thing and programs should make at most very limited promises otherwise.
The simple problem with making promises about what your program's logs will contain is that promises create official APIs. Everything you promise about your logs becomes part of your program's functional API, something you've told people that they can confidently use and rely on. If you promise that you'll log certain messages in certain situations and you don't always, you've made this into a bug by definition; if you change what messages you emit in these circumstances in a future version, you've created an API incompatibility.
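To make the API point concrete, here is a minimal sketch of what happens on the consuming side once someone relies on a message. The pattern, the function name, and both log lines are invented for illustration; they don't come from any real program.

```python
import re

# Hypothetical monitoring rule written against the wording of "version 1".
FAILED_LOGIN = re.compile(r"^login failed for user (\S+)$")

def extract_user(line: str):
    """Return the user name if the line matches the promised message, else None."""
    m = FAILED_LOGIN.match(line)
    return m.group(1) if m else None

# Hypothetical messages from two versions of the same program.
v1_line = "login failed for user alice"
v2_line = "authentication failed: user=alice method=password"

print(extract_user(v1_line))
print(extract_user(v2_line))
```

The second message is arguably more informative, but the rule written against the first one no longer matches it. If the first wording was promised, that reword is an API break; if it wasn't, it's just a better log message.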
One of the things we've found out (over and over again) is that the larger a system's API (or set of APIs), the more complicated it is to maintain and evolve it. The fewer APIs you give a system, the better. Making promises about some or many of your log messages creates what is often a rather large new API, one that may significantly constrain how you can evolve your program (or require you to lie in order to preserve the API, emitting messages that become increasingly inaccurate over time). Often the payoff for adding this new API is quite low, making it an especially bad idea.
(Such a large and non-obvious API is also error-prone when people change the program in the future. Log messages look like something you can freely reword or add to in order to cover new cases, but if they're part of your API this is probably not the case.)
If there are important things that you need to reliably surface for observability reasons, then my view is that you should make these into metrics. Having to specifically create and describe metrics is likely to make you more aware that you're creating an API and also limit how many of them you add.
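As a rough sketch of the split I have in mind, assume a tiny in-process metrics registry (the names METRICS, DECLARED_METRICS, count and record_delivery_failure are all hypothetical, and a real program would use whatever metrics library it already has). The metric is declared and described up front, so it's visibly an API; the log message next to it is free to change.

```python
import logging
from collections import Counter

log = logging.getLogger("mailer")

# Hypothetical registry: every metric must be declared and described here,
# which is what makes it an explicit, deliberately small API.
DECLARED_METRICS = {
    "deliveries_failed_total": "Messages that could not be delivered.",
}
METRICS = Counter()

def count(name: str) -> None:
    """Increment a declared metric; undeclared names are rejected."""
    if name not in DECLARED_METRICS:
        raise KeyError(f"undeclared metric: {name}")
    METRICS[name] += 1

def record_delivery_failure(recipient: str, reason: str) -> None:
    # The stable signal that monitoring can rely on across versions.
    count("deliveries_failed_total")
    # Free-form detail for humans; the wording carries no promise.
    log.warning("could not deliver to %s: %s", recipient, reason)
```

Having to add an entry to the declaration table before you can count anything is the point of the exercise: it's a small speed bump that a new log message doesn't have.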
In addition to this being less work for the programmers, let me assure you that as a system administrator, I would much rather have log messages that accurately describe the state and operation of the current version of the system than log messages that match what the last version produced. Of course it would be great to have both, but inevitably the program will change exactly how it operates from version to version.
PS: I'm not against structured logging; I think it's a fine idea and useful for being unambiguous. I'm against making promises about exactly what specific messages and other contents your structured log messages will contain.