Modern versions of systemd can cause an unmount storm during shutdowns

May 6, 2020

One of my discoveries about Ubuntu 20.04 is that my test machine can trigger the kernel's out-of-memory (OOM) killer during shutdown. My test virtual machine has 4 GB of RAM and 1 GB of swap, but it also has 347 NFS mounts. After some investigation, what appears to be happening is that the 20.04 version of systemd (systemd 245 plus whatever changes Ubuntu has made) now seems to try to run umount for all of those filesystems at once, which also starts a umount.nfs process for each one. On 20.04, this is apparently enough to OOM my test machine.

(My test machine has the same amount of RAM and swap as some of our production machines, although we're not running 20.04 on any of them.)

On the one hand, this is exactly what systemd said it was going to do in general. Systemd will do as much in parallel as possible and these NFS mounts are not nested inside each other, so they can all be unmounted at once. On the other hand, this doesn't scale; there's a certain point where running too many processes at once just thrashes the machine to death even if it doesn't drive it out of memory. And on the third hand, this doesn't happen to us on earlier versions of Ubuntu LTS; either their version of systemd doesn't start as many unmounts at once or their version of umount and umount.nfs requires enough fewer resources that we can get away with it.

Unfortunately, so far I haven't found a way to control this in systemd. There appears to be no way to set limits on how many unmounts systemd will try to do at once (or in general how many units it will try to stop at once, even if that requires running programs). Nor can we readily modify the mount units, because all of our NFS mounts are done through shell scripts by directly calling mount; they don't exist in /etc/fstab or as actual .mount units.

(One workaround would be to set up a new systemd unit that acts before filesystems are unmounted and runs a 'umount -a -t nfs', because that unmounts the filesystems one after another instead of all at once. Getting the ordering right may be a little bit tricky.)
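A minimal sketch of what such a unit might look like follows. It has not been tested against this problem. The unit name, the timeout value, and the choice of unmounting both `nfs` and `nfs4` types are assumptions for illustration. The ordering relies on systemd stopping units in the reverse of their start order and on network mount units being ordered before remote-fs.target by default. Whether that holds for mounts made by scripts calling mount directly would need to be verified.

```ini
# /etc/systemd/system/nfs-serial-umount.service
# (hypothetical unit name; nothing in the original post defines it)
[Unit]
Description=Unmount NFS filesystems one at a time before shutdown
# Start after remote filesystems are up. Because stop order is the
# reverse of start order, this unit's ExecStop should then run before
# systemd begins stopping the individual NFS .mount units.
# (Assumption: the dynamically created .mount units carry the usual
# default ordering against remote-fs.target.)
After=remote-fs.target network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Nothing to do at start; the unit exists only for its stop action.
ExecStart=/bin/true
# umount -a walks the mounted filesystems of the given types in sequence.
ExecStop=/bin/umount -a -t nfs,nfs4
# Assumed value; hundreds of sequential unmounts may need a generous limit.
TimeoutStopSec=300

[Install]
WantedBy=multi-user.target
```

The unit would be enabled and started once (for example with 'systemctl enable --now'), so that it is active and therefore gets stopped at shutdown. Any NFS filesystems it fails to unmount are still left for systemd to handle in its usual way.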


Comments on this page:

I didn't quite find an exact match to that issue. This is probably related: https://github.com/systemd/systemd/pull/6598

From 216.154.41.138 at 2020-05-07 06:45:33:

While parallelism is generally a good thing, you'd think a prudent 'limit' would be proportional to the number of processors (real or imagined/virtual) that are present on the system.

There's no sense kicking off 100 processes when one only has 4 (v)CPUs or hyper-threads: yes, modern chips are very fast nowadays, but there's still a finite amount of silicon present to execute things.

By Matt at 2020-05-07 10:45:40:

I'm not familiar with how your shell scripts do the mounting/unmounting, but if you have them as .mount or .service units, you can use "drop-in" config snippets to append config to them.

For example, suppose you have foo.service and bar.service (unit type does not matter) and you cannot (or don't want to) modify the .service config files. Instead you can add a file at `/etc/systemd/system/foo.service.d/ordering.conf` with the contents

[Unit]
Before=bar.service

and that will make bar.service wait to start until after foo.service has started. Services are stopped in the reverse order.

By cks at 2020-05-07 11:45:21:

Our shell scripts literally run mount and umount. Systemd materializes the .mount units on its own, and as far as I know mount provides no way to make systemd modify the resulting dynamic .mount units (and anyway no modification to an individual mount unit can prevent this).

Written on 06 May 2020.

Last modified: Wed May 6 21:46:24 2020