Modern versions of systemd can cause an unmount storm during shutdowns

May 6, 2020

One of my discoveries about Ubuntu 20.04 is that my test machine can trigger the kernel's out-of-memory killer during shutdown. My test virtual machine has 4 GB of RAM and 1 GB of swap, but it also has 347 NFS mounts. After some investigation, what appears to be happening is that the 20.04 version of systemd (systemd 245 plus whatever changes Ubuntu has made) now seems to try to run umount for all of those filesystems at once, and each umount in turn starts a umount.nfs process. On 20.04, this is apparently enough to OOM my test machine.

(My test machine has the same amount of RAM and swap as some of our production machines, although we're not running 20.04 on any of them.)

On the one hand, this is exactly what systemd said it was going to do in general. Systemd will do as much in parallel as possible, and these NFS mounts are not nested inside each other, so they can all be unmounted at once. On the other hand, this doesn't scale; there's a certain point where running too many processes at once just thrashes the machine to death, even if it doesn't drive it out of memory. And on the third hand, this doesn't happen to us on earlier versions of Ubuntu LTS; either their version of systemd doesn't start as many unmounts at once, or their versions of umount and umount.nfs use sufficiently fewer resources that we can get away with it.

Unfortunately, so far I haven't found a way to control this in systemd. There appears to be no way to set limits on how many unmounts systemd will try to do at once (or in general how many units it will try to stop at once, even if that requires running programs). Nor can we readily modify the mount units, because all of our NFS mounts are done through shell scripts by directly calling mount; they don't exist in /etc/fstab or as actual .mount units.

(One workaround would be to set up a new systemd unit that acts before filesystems are unmounted and runs 'umount -a -t nfs', because that works through the NFS mounts one at a time instead of trying to do all of the unmounts at once. Getting the ordering right may be a little bit tricky.)
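
A minimal sketch of what such a unit might look like. The unit name (nfs-serial-umount.service) is hypothetical, and the sketch assumes that ordering the unit after remote-fs.target is enough to get it stopped before systemd starts on the individual NFS mounts. That ordering is an assumption and has not been tested here, and the stop timeout is a guess. The unit does nothing at boot; all of the work happens in ExecStop at shutdown.

```ini
# /etc/systemd/system/nfs-serial-umount.service  (hypothetical name)
[Unit]
Description=Unmount NFS filesystems one at a time at shutdown (sketch)
# Units are stopped in the reverse of their start order, so being ordered
# After= these targets should mean this unit is stopped before them.
# Assumption: this also places the stop ahead of the NFS mount units.
After=network-online.target remote-fs.target

[Service]
Type=oneshot
# Stay "active" after the no-op start so that there is something to stop.
RemainAfterExit=yes
ExecStart=/bin/true
# umount -a walks the mounted filesystems of the given types serially.
ExecStop=/bin/umount -a -t nfs,nfs4
# Assumption: a few hundred serial unmounts may need more than the default.
TimeoutStopSec=300

[Install]
WantedBy=multi-user.target
```

This would be enabled with 'systemctl enable --now nfs-serial-umount.service', so that the unit is active, and so has something to stop, by the time the machine shuts down.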

Last modified: Wed May 6 21:46:24 2020