Wandering Thoughts archives


Some notes on getting email when your systemd timer services fail

Suppose, not hypothetically, that you have some things that are implemented through systemd timers instead of traditional cron.d jobs, and you would like to get email if and when they fail. The lack of this email by default is one of the known issues with turning cron.d entries into systemd timers and people have already come up with ways to do this with systemd tricks, so for full details I will refer you to the Arch Wiki section on this (brought to my attention by keur's comment on my initial entry) and this serverfault question and its answers (via @tvannahl on Twitter). This entry is my additional notes from having set this up for our Certbot systemd timers.

Systemd timers come in two parts; a .timer unit that controls timing and a .service unit that does the work. What we generally really care about is the .service unit failing. To detect this and get email about it, we add an OnFailure= to the timer's .service unit that triggers a specific instance of a template .service that sends email. So if we have certbot.timer and certbot.service, we add a .conf file in /etc/systemd/certbot.service.d that contains, say:


Due to the use of '%n', this is generic; the stanza will be the same for anything we want to trigger email from on failure. The '%n' will expand to the full name of the service, eg 'certbot.service' and be available in the cslab-status-email@.service template unit. My view is that you should always use %n here even if you're only doing this for one service, because it automatically gets the unit name right for you (and why risk errors when you don't have to). In the cslab-status-email@.service unit, the full name of the unit triggering it will be available as '%i', as shown in the Arch Wiki's example. Here that will be 'certbot.service'.

(With probably excessive cleverness you could encode the local address to email to into what the template service will get as %i by triggering, eg, cslab-status-email@root-%n.service. We just hard code 'root' all through.)

The Arch Wiki's example script uses 'systemctl status --full <unit>'. Unfortunately this falls into the trap that by default systemd truncates the log output at the most recent ten lines. We found that we definitely wanted more; our script currently uses 'systemctl status --full -n 50 <unit>' (and also contains a warning postscript that it may be incomplete and to see journalctl on the system for full details). Having a large value here is harmless as far as I can tell, because systemd seems to only show the log output from the most recent activation attempt even if there's (much) less than your 50 lines or whatever.

(Unfortunately as far as I can see there is no easy way to get just the log output without the framing 'systemctl status' information about the unit, much of which is not particularly useful. We live with this.)

As with the Arch Wiki's example script, you definitely want to put the hostname into the email message if you have a fleet. We also embed more information into the Subject and From, and add a MIME-Version:

From: $HOSTNAME root <root@...>
Subject: $1 systemd unit failed on $HOSTNAME
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

You definitely want to label the email as UTF-8, as 'systemctl status' puts a UTF-8 '‚óŹ' in its output. The subject could be incorrect (we can't be sure the template unit was triggered through an 'OnFailure=', even that's how it's supposed to be used), but it's much more useful in the case where everything is working as intended. My bias is towards putting as much context into emails like this, because by the time we get one we'll have forgotten all about the issue and we don't want to be wondering why we got this weird email.

The Arch Wiki contains a nice little warning about how systemd may wind up killing child processes that the mail submission program creates (as noticed by @lathiat on Twitter). I decided that the easiest way for our script to ward off this was to just sleep for 10 or 15 seconds at the end. Having it exit immediately is not exactly critical and this is the easy (if brute force) way to hopefully work around any problems.

Finally, as the Arch Wiki kind of notes, this is not quite the same thing as what cron does. Cron will send you email if your job produces any output, whether or not it fails; this will send you the logged output (if any) if the job fails. If the job succeeds but produces output, that output will go only to the systemd journal and you will get no notification. As far as I know there's no good way to completely duplicate cron's behavior here.

(Also, on failure the journal messages you get will include both actual stuff printed by the service and also, I believe, anything it logged to places like syslog; with cron you only get the former. This is probably a useful feature.)

linux/SystemdTimersMailNotes written at 23:30:42; Add Comment

Realizing that Go constants are always materialized into values

I recently read Global Constant Maps and Slices in Go (via), which starts by noting that Go doesn't let you create const maps or slices and then works around that by having an access function that returns a constant slice (or map):

const rateLimit = 10
func getSupportedNetworks() []string {
    return []string{"facebook", "twitter", "instagram"}

When I read the article, my instinctive reaction was that that's not actually a constant because the caller can always change it (although if you call getSupportedNetworks() again you get a new clean original slice). Then I thought more about real Go constants like rateLimit and realized that they have this behavior too, because any time you use a Go constant, it's materialized into a mutable value.

Obviously if you assign the rateLimit constant to a variable, you can then change the variable later; the same is true of assigning it to a struct field. If you call a function and pass rateLimit as one argument, the function receives it as an argument value and can change it. If a function returns rateLimit, the caller gets back a value and can again change it. This is no different with the slice that getSupportedNetworks() returns.

The difference between using rateLimit and using the return value from getSupportedNetworks is that the latter can be mutated through a second reference without the explicit use of Go pointers:

func main() {
   a := rateLimit
   b := &a
   *b += 10

   c := getSupportedNetworks()
   d := c
   d[1] = "mastodon"

   fmt.Println(a, c)

But this is not a difference between true Go constants and our emulated constant slice, it's a difference in the handling of the types involved. Maps and slices are special this way, but other Go values are not.

(Slices are also mutable at a distance in other ways.)

PS: Go constants can't have their address taken with '&', but they aren't the only sorts of unaddressable values in Go. In theory we could make getSupportedNetworks() return an unaddressable value by making its return value be '[3]string', as we've seen before; in practice you almost certainly don't want to do that for various reasons.

(This seems like an obvious observation now that I've thought about it, but I hadn't really thought about it before reading the article and having my reflexive first reaction.)

programming/GoConstantsAsValues written at 00:13:57; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.