The operational differences between notifications and logs

November 17, 2019

In a comment on my entry on how systemd timer units hide errors, rlaager raised an interesting issue:

The emphasis on emails feels like status quo bias, though. Imagine the situation was reversed: that everything was using systemd timers and then someone wrote cron and people started switching to that. In that case, there is a similar operational change. You'd switch from having a centralized status (e.g. systemctl list-units --failed) and centralized logging (the journal, which also defaults to forwarding to syslog) to crond sending emails. Is that an improvement, a step backwards, neither or both?

My answer is 'both', because in their normal state, emails from cron are fundamentally different from systemd journal entries. Emails from cron are notifications, while log entries of all sorts are, well, logs. A switch from notifications to logs or vice versa is a deep switch with real operational impacts because you get different things from each of them.

Logs give you a history. You can look back through your logs to see what happened when, and with merged logs (or just multiple logs) you can try to correlate this with other things happening at the time. Notifications let you know that something happened (or is happening, although cron only sends email when the cron job finishes), but they don't provide history unless you capture and save each separate email (in order).

(You can create one from the other with additional work, of course. With notifications, you save the notifications in a log, and with logs you have something watch the logs and send you notifications. But you have to go to that additional work, and if you don't do it you're going to miss something.)
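As a rough illustration of that additional work, here is a minimal Python sketch of both conversions. Everything in it is hypothetical: the `Entry` record, the `failed` flag, and the two function names are made up for this example and are not how cron or the systemd journal actually represent things. It only shows the shape of the glue you would have to write.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass
class Entry:
    """A hypothetical record standing in for either an email or a log line."""
    when: datetime
    source: str
    message: str
    failed: bool


def notifications_to_log(notifications: Iterable[Entry]) -> List[Entry]:
    """Build a history by saving every notification, ordered by time."""
    return sorted(notifications, key=lambda e: e.when)


def log_to_notifications(log: Iterable[Entry],
                         notify: Callable[[Entry], None]) -> int:
    """Watch a log and call notify() for each entry marked as a failure.

    Returns the number of notifications sent.
    """
    sent = 0
    for entry in log:
        if entry.failed:
            notify(entry)
            sent += 1
    return sent
```

In real use, `notify` would be whatever sends the email or page and the log would come from wherever your logs live; the point is only that neither direction happens unless someone writes and runs something like this.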

On an operational level, switching from one to the other is potentially dangerous because in each case you lose something that you were probably counting on. If you move from a system that gives you notifications (such as cron jobs sending email on failure) to one that gives you logs (such as systemd timer units logging their failures to the journal), you lose the notifications that you're expecting and that you're using to discover problems. If you move from logs to notifications, you lose history and you may get spammed with notifications that you don't actually care about. And of course the most dangerous switches are the ones where you don't realize that you're actually switching (or that the software you use has quietly switched for you, for example by moving from cron jobs to systemd timer units).

(You may also have built your systems differently in the first place. In a log-based world, it's perfectly sensible to have things emit a lot of messages (and then to drive notifications from only a subset of them, if you drive notifications from them at all). If you move to a world where emitting messages triggers notifications, you will suddenly be getting a lot of notifications that you don't want.)
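To make the difference in volume concrete, here is a small Python sketch that runs the same stream of messages through both policies. The function names and the threshold are assumptions for illustration; the numeric severities follow the usual syslog convention where lower numbers are more severe.

```python
from typing import List, Tuple

# (severity, text) pairs; lower numbers are more severe, as in syslog.
Message = Tuple[int, str]

ERR = 3   # syslog "err" severity
INFO = 6  # syslog "info" severity


def notify_on_subset(messages: List[Message], threshold: int = ERR) -> List[str]:
    """Log-based world: everything is logged, only severe messages notify."""
    return [text for severity, text in messages if severity <= threshold]


def notify_on_everything(messages: List[Message]) -> List[str]:
    """Notification-based world: every emitted message becomes a notification."""
    return [text for _, text in messages]


# A chatty job that is perfectly reasonable when its output only goes to a log.
job_output = [
    (INFO, "started"),
    (INFO, "processed 10 files"),
    (ERR, "disk full"),
    (INFO, "finished"),
]
```

With this sample, `notify_on_subset(job_output)` yields one notification while `notify_on_everything(job_output)` yields four, which is the spam problem in miniature.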

Last modified: Sun Nov 17 00:43:44 2019