== Some notes on getting email when your systemd timer services fail

Suppose, [[not hypothetically SystemdTimersAndErrors]], that you have some things that are implemented through systemd timers instead of traditional _cron.d_ jobs, and you would like to get email if and when they fail. The lack of this email by default is one of the known issues with turning _cron.d_ entries into systemd timers, and people have already come up with ways to do this with systemd tricks. For full details I will refer you to [[the Arch Wiki section on this https://wiki.archlinux.org/index.php/Systemd/Timers#Caveats]] (brought to my attention by keur's comment on [[my initial entry SystemdTimersAndErrors]]) and [[this serverfault question and its answers https://serverfault.com/questions/876233/how-to-send-an-email-if-a-systemd-service-is-restarted/876249]] ([[via @tvannahl on Twitter https://twitter.com/tvannahl/status/1191989285873967104]]). This entry is my additional notes from having set this up for our Certbot systemd timers.

Systemd timers come in two parts: a _.timer_ unit that controls timing and a _.service_ unit that does the work. What we generally really care about is the _.service_ unit failing. To detect this and get email about it, we add an _OnFailure=_ to the timer's _.service_ unit that triggers a specific instance of a template _.service_ that sends email. So if we have _certbot.timer_ and _certbot.service_, we add a .conf file in /etc/systemd/system/certbot.service.d that contains, say:

.pn prewrap on

> [Unit]
> OnFailure=cslab-status-email@%n.service

Due to the use of '_%n_', this is generic; the stanza will be the same for anything we want to trigger email from on failure. The '_%n_' will expand to the full name of the service, eg '_certbot.service_', and be available in the _cslab-status-email@.service_ template unit.
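
As a rough sketch of installing such a drop-in from the shell, something like the following might do. The function name, the _email-on-failure.conf_ file name, and the optional root directory argument are all made up for illustration; systemd only cares that the file ends in .conf and lives in the unit's drop-in directory.

```shell
#!/bin/sh
# Hypothetical helper: write an OnFailure= drop-in for one unit.
# $1 is the unit name (eg certbot.service).
# $2 is an optional root directory, an assumption added so this can be
# tried out somewhere harmless; leave it off on a real system.
add_failure_email() {
    unit="$1"
    dir="${2:-}/etc/systemd/system/${unit}.d"
    mkdir -p "$dir" || return 1
    # The quoted 'EOF' stops the shell from touching the text, so the
    # %n reaches systemd unchanged. The file name is hypothetical.
    cat >"$dir/email-on-failure.conf" <<'EOF'
[Unit]
OnFailure=cslab-status-email@%n.service
EOF
}
```

After something like '_add_failure_email certbot.service_' you would still need a '_systemctl daemon-reload_' so that systemd notices the new drop-in.
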
My view is that you should always use '_%n_' here even if you're only doing this for one service, because it automatically gets the unit name right for you (and why risk errors when you don't have to).

In the _cslab-status-email@.service_ unit, the full name of the unit triggering it will be available as '_%i_', as shown in the Arch Wiki's example. Here that will be '_certbot.service_'.

(With probably excessive cleverness you could encode the local address to email to into what the template service will get as '_%i_' by triggering, eg, cslab-status-email@root-%n.service. We just hard code '_root_' all through.)

The Arch Wiki's example script uses '_systemctl status --full_' on the failed unit. Unfortunately this falls into the trap that by default systemd truncates the log output at the most recent ten lines. We found that we definitely wanted more; our script currently uses '_systemctl status --full -n 50_' (and also contains a warning postscript that the output may be incomplete and to see _journalctl_ on the system for full details). Having a large value here is harmless as far as I can tell, because systemd seems to only show the log output from the most recent activation attempt, even if that is (much) less than your 50 lines or whatever.

(Unfortunately as far as I can see there is no easy way to get just the log output without the framing 'systemctl status' information about the unit, much of which is not particularly useful. We live with this.)

As with the Arch Wiki's example script, you definitely want to put the hostname into the email message if you have a fleet. We also embed more information into the Subject and From, and add a _MIME-Version_ and the other MIME headers:

> From: $HOSTNAME root
> Subject: $1 systemd unit failed on $HOSTNAME
> MIME-Version: 1.0
> Content-Transfer-Encoding: 8bit
> Content-Type: text/plain; charset=UTF-8

You definitely want to label the email as UTF-8, as '_systemctl status_' puts a UTF-8 '_●_' in its output.
The subject could be incorrect (we can't be sure the template unit was triggered through an '_OnFailure=_', even if that's how it's supposed to be used), but it's much more useful in the case where everything is working as intended. My bias is towards putting as much context as possible into emails like this, because by the time we get one we'll have forgotten all about the issue and we don't want to be wondering why we got this weird email.

The Arch Wiki contains a nice little warning about how systemd may wind up killing child processes that the mail submission program creates ([[as noticed by @lathiat on Twitter https://twitter.com/lathiat/status/1192377748158681094]]). I decided that the easiest way for our script to ward this off was to just sleep for 10 or 15 seconds at the end. Having it exit immediately is not exactly critical, and this is the easy (if brute force) way to hopefully work around any problems.

Finally, as the Arch Wiki kind of notes, this is not quite the same thing as what cron does. Cron will send you email if your job produces any output, whether or not it fails; this will send you the logged output (if any) only if the job fails. If the job succeeds but produces output, that output will go only to the systemd journal and you will get no notification. As far as I know there's no good way to completely duplicate cron's behavior here.

(Also, on failure the journal messages you get will include both actual stuff printed by the service and also, I believe, anything it logged to places like syslog; with cron you only get the former. This is probably a useful feature.)
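
Pulling together the pieces discussed in this entry, a minimal sketch of such a status email script might look like the following. This is an illustration and not our actual script; the function names, the '_To: root_' line, the root@HOST address in the From header, the wording of the postscript, and the /usr/sbin/sendmail path are all assumptions.

```shell
#!/bin/sh
# Hypothetical status email script, run as: this-script UNIT
# (the template unit would pass its %i as UNIT).

# Print the headers for failed unit $1 on host $2, followed by the
# blank line that ends them. The root@HOST address is a placeholder.
mail_headers() {
    cat <<EOF
To: root
From: $2 root <root@$2>
Subject: $1 systemd unit failed on $2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

EOF
}

# Print the whole message: headers, status output, warning postscript.
mail_message() {
    mail_headers "$1" "$2"
    # -n 50 raises the default ten-line limit on the log output shown.
    systemctl status --full -n 50 "$1"
    printf '\nThis may be incomplete; see journalctl on %s for full details.\n' "$2"
}

# Send the message, then linger so that systemd has less chance to
# kill child processes of the mail submission program too early.
send_failure_mail() {
    mail_message "$1" "$(hostname)" | /usr/sbin/sendmail -t
    sleep 15
}

# Only act when given a unit name, so that sourcing this is harmless.
if [ -n "${1:-}" ]; then
    send_failure_mail "$1"
fi
```

Splitting out the header generation is a choice made here so that it can be checked without involving either systemd or a mailer.
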