Linux's vm.admin_reserve_kbytes sysctl is both not big enough and not sufficient

December 6, 2021

We enable Linux's strict overcommit on some of our servers (mostly compute servers). Every so often people run programs big enough to run the machine out of memory, and some of the time when this happens we get various plaintive reports from cron and other things saying that periodic system processes have failed with out of memory errors. The Linux kernel has a sysctl that's supposed to help with this, vm.admin_reserve_kbytes (documented in vm.txt), but in practice we've found two issues.
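Under strict overcommit (vm.overcommit_memory set to 2), the kernel refuses an allocation once it would push committed memory past the commit limit; both figures appear in /proc/meminfo as Committed_AS and CommitLimit. As I understand the kernel's accounting, a process without admin privileges also has admin_reserve_kbytes taken off that limit, which is the whole point of the reserve. Here is a minimal Python sketch of checking where a machine stands, assuming the usual Linux /proc layout; the function names are made up for illustration, and it ignores the smaller vm.user_reserve_kbytes adjustment.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory; 2 is "strict" overcommit.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "strict"}


def read_sysctl_int(name: str, proc_sys: Path = Path("/proc/sys")) -> int:
    """Read an integer sysctl such as 'vm.admin_reserve_kbytes'."""
    path = proc_sys.joinpath(*name.split("."))
    return int(path.read_text().strip())


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: number}; most fields are in kB."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def unprivileged_headroom_kb(meminfo: dict[str, int], admin_reserve_kb: int) -> int:
    """Approximate kB an unprivileged process can still commit in strict mode.

    Assumption: ordinary processes see CommitLimit minus admin_reserve_kbytes;
    the vm.user_reserve_kbytes adjustment is deliberately ignored here.
    """
    return meminfo["CommitLimit"] - admin_reserve_kb - meminfo["Committed_AS"]
```

On a live machine you would pass it read_sysctl_int("vm.admin_reserve_kbytes") and parse_meminfo(Path("/proc/meminfo").read_text()); a negative or near-zero result is roughly when cron jobs start failing.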

The first is that the default value of admin_reserve_kbytes is set for systems not operating in strict overcommit mode, and in any case the value dates from 2013. The kernel's own documentation suggests raising this to 128 MB for strict overcommit, but I suspect that's not enough for modern programs (a brief check suggests that the total virtual size of sshd, bash, and top on 64-bit x86 Ubuntu 18.04 is at least 190 MB or so, and their combined RSS is over 16 MB). Perhaps 256 MB could be enough in strict overcommit mode. In any case, we need to raise this, and it's hard to know by how much: enough that cron jobs keep running, but not so much that we take too much memory away from people, especially on machines with modest amounts of memory.
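As I understand the kernel's vm documentation, its 128 MB figure comes from a simple recipe for strict overcommit: take the largest virtual size (VSZ) among the programs you need to recover a machine (sshd, a shell, top) and add the sum of their resident sizes (RSS). A minimal Python sketch of redoing that estimate for current programs, assuming the VmSize and VmRSS fields of /proc/<pid>/status (reported in kB); the helper names are hypothetical.

```python
def parse_status(text: str) -> tuple[int, int]:
    """Return (VmSize, VmRSS) in kB from a /proc/<pid>/status dump.

    Kernel threads have no Vm* lines, so this raises KeyError for them.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            values[key] = int(rest.split()[0])
    return values["VmSize"], values["VmRSS"]


def estimate_strict_reserve_kb(statuses: list[str]) -> int:
    """Largest VSZ plus the sum of all RSS (the strict-mode recipe)."""
    sizes = [parse_status(s) for s in statuses]
    return max(vsz for vsz, _ in sizes) + sum(rss for _, rss in sizes)


def sysctl_line(reserve_kb: int) -> str:
    """Format a reserve as a sysctl.conf line (the sysctl is in kilobytes)."""
    return f"vm.admin_reserve_kbytes = {reserve_kb}"
```

On a live machine the status texts would come from Path(f"/proc/{pid}/status").read_text() for each relevant process. For comparison, 256 MB is 262144 in the sysctl's kilobyte units.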

(If we were serious about this, we would look into collecting some sort of memory usage information from cron jobs on at least a test machine. As it is, this is a sufficiently infrequent issue that we don't care enough to do that work.)
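If we did want numbers, one cheap approach would be to wrap cron commands in something that reports their peak memory use. A minimal Python sketch, assuming Linux (where getrusage() reports ru_maxrss in kilobytes); the wrapper is hypothetical. It measures resident memory, which understates what strict overcommit charges, since commit accounting counts writable mappings whether or not their pages are ever touched.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(argv: list[str]) -> tuple[int, int]:
    """Run argv to completion; return (exit status, peak RSS in kB).

    For RUSAGE_CHILDREN, ru_maxrss is the RSS of the largest waited-for
    descendant, so this is only per-command if the wrapper runs one command.
    """
    result = subprocess.run(argv)
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return result.returncode, peak_kb


if __name__ == "__main__" and len(sys.argv) > 1:
    status, peak = run_and_report_peak_rss(sys.argv[1:])
    print(f"peak RSS {peak} kB: {' '.join(sys.argv[1:])}", file=sys.stderr)
    sys.exit(status)
```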

The second is that often, no setting of admin_reserve_kbytes will let you log in to a server that has run out of memory under strict overcommit, because of what I could call the privilege separation problem. Specifically, during login, parts of the SSH server run as privilege-separated, non-privileged users. Because these processes deliberately run as unprivileged UIDs, their memory allocations are not covered by admin_reserve_kbytes. If ordinary users can't allocate memory, you're almost certainly not going to be able to ssh in, even as root. If you could get the SSH daemon to authenticate you, your eventual root sshd and bash processes would be covered by admin_reserve_kbytes, but sadly that authentication has to happen before you get there.
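As far as I can tell from the kernel source, what the reserve actually keys on is whether the allocating process has CAP_SYS_ADMIN (capability bit 21), which root normally has and sshd's privilege-separated children deliberately do not. A minimal Python sketch of checking this from the CapEff line in /proc/<pid>/status; the function name is hypothetical.

```python
CAP_SYS_ADMIN = 21  # bit number from linux/capability.h


def has_cap_sys_admin(status_text: str) -> bool:
    """True if the effective capability mask (CapEff) includes CAP_SYS_ADMIN."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # CapEff is a hex bitmask
            return bool((mask >> CAP_SYS_ADMIN) & 1)
    raise ValueError("no CapEff line in status text")
```

Run against the sshd processes during a login attempt, the pre-authentication child should come back False, which is why raising the reserve doesn't help it.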

(Turning off sshd's privilege separation is a cure far worse than the disease.)

The second issue lowers my motivation to fix the first problem by finding a setting of admin_reserve_kbytes that keeps our administrative cron jobs reliably working. If a machine runs out of memory and stays there, we may not be able to get in to deal with whatever the problem is, and things other than cron jobs may run into issues (we've seen the DBus daemon have problems in the past). Plus, our machines almost never run out of memory to the extent that we get cron email complaints about it.

PS: Someday our Ubuntu LTS machines may run systemd-oomd, which will undoubtedly need its own configuration and tuning. This might even show up in the future Ubuntu 22.04 LTS, which is not all that far away.

Written on 06 December 2021.

Last modified: Mon Dec 6 22:03:44 2021