A Linux machine with a strict overcommit limit can still trigger the OOM killer

May 8, 2019

We've been running our general use compute servers with strict overcommit handling for total virtual memory for years, because on compute servers we feel we have to assume that if you ask for a lot of memory, you're going to use it for your compute job. As we discovered last fall, hitting the strict overcommit limit doesn't trigger the OOM killer, which can be inconvenient; instead, all sorts of random processes start failing because they can't get any more memory. However, we've recently also discovered that our machines with strict overcommit turned on can still sometimes trigger the OOM killer.

At first this made no sense to me and I thought that something was wrong, but then I realized what is probably going on. You see, strict overcommit really has two parts, although we don't often think about the second one; there's the setting itself, i.e. having vm.overcommit_memory be 2, and then what your commit limit is, set by vm.overcommit_ratio as your swap space plus some percentage of RAM. Because we couldn't find an overcommit percentage that worked for us across our disparate fleet of compute servers with very varying amounts of RAM, we set this to '100' some years ago, theoretically allowing our machines with strict overcommit to use all of RAM plus swap space. Of course, this is not actually possible in practice, because the kernel needs some amount of memory to operate itself; how much memory is unpredictable and possibly unknowable.
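The commit limit arithmetic involved can be sketched in a few lines. This is a simplified model, not the kernel's exact code (the real calculation also subtracts hugetlb pages, for instance):

```python
def commit_limit_kb(ram_kb, swap_kb, ratio_percent):
    """Approximate CommitLimit under vm.overcommit_memory=2.

    Simplified model: swap space plus vm.overcommit_ratio
    percent of RAM (hugetlb pages and other details ignored).
    """
    return ram_kb * ratio_percent // 100 + swap_kb

# With overcommit_ratio=100, a 256 GB machine with 16 GB of swap
# gets a commit limit of all of RAM plus swap, i.e. 272 GB.
print(commit_limit_kb(256 * 1024 * 1024, 16 * 1024 * 1024, 100))
```

You can compare the result of a model like this against the CommitLimit line in /proc/meminfo on a real machine.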

This gap between what we set and what's actually possible creates three states the system can wind up in. If you ask for as much memory as you can allocate (or in general enough memory), you run the system into the strict overcommit limit; either your request fails immediately or other processes start failing later when their memory allocation requests fail. If you don't ask for too much memory, everything is happy; what you asked for plus what the kernel needs fits into RAM and swap space. But if you ask for just the right large amount of memory, you push the system into a narrow middle ground; you're under the strict overcommit limit so your allocations succeed, but over what the kernel can actually provide, so when processes start trying to use enough memory, the kernel will trigger the OOM killer.
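The three states can be made concrete with a tiny classifier. The 'usable' figure here is illustrative; as noted, what the kernel actually needs for itself is unpredictable:

```python
def memory_state(request_kb, commit_limit_kb, usable_kb):
    """Classify a large allocation request under strict overcommit.

    usable_kb is RAM plus swap minus whatever the kernel itself
    needs, a number we can't actually know in advance.
    """
    if request_kb > commit_limit_kb:
        return "refused"   # strict overcommit says no; the allocation fails
    if request_kb > usable_kb:
        return "oom"       # allocation succeeds, but using it all triggers the OOM killer
    return "ok"

# 256 GB RAM + 16 GB swap at ratio 100 gives a 272 GB commit limit;
# pretend the kernel needs 2 GB, leaving 270 GB genuinely usable.
limit = 285212672          # kB
usable = limit - 2097152   # kB
print(memory_state(284000000, limit, usable))
```

The narrow middle ground is exactly the band between usable and the commit limit.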

There is probably no good way to avoid this for us, so I suspect we'll just live with the little surprise of the OOM killer triggering every so often and likely terminating a RAM-heavy compute process. I don't think it happens very often, and these days we have a raft of single-user compute servers that avoid the problem.

Sidebar: The problems with attempting to turn down the memory limit

First, we don't have any idea how much memory we'd need to reserve for the kernel to avoid OOM. Being cautious here means that some of the RAM will go idle unless we add a bunch of swap space (and risk death through swap thrashing).

Further, not only would the vm.overcommit_ratio setting be machine specific and have to be derived on the fly from the amount of memory, but it's probably too coarse-grained. 1% of RAM on a 256 GB machine is about 2.5 GB, although I suppose perhaps the kernel might need that much reserved to avoid OOM. We could switch to using the more recent vm.overcommit_kbytes (cf), but since its value is how much RAM to allow instead of how much RAM to reserve for the kernel, we would definitely have to make it machine specific and derived from how much RAM is visible when the machine boots.
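Such a boot-time derivation of vm.overcommit_kbytes would look roughly like this. The 2 GB kernel reserve here is a made-up number, which is exactly the problem:

```python
def overcommit_kbytes_setting(ram_kb, kernel_reserve_kb):
    """vm.overcommit_kbytes is how much RAM to allow for user
    commitments (swap is added on top of it), so a kernel
    reserve comes straight off the visible RAM."""
    return ram_kb - kernel_reserve_kb

# A 256 GB machine, guessing at a 2 GB kernel reserve:
print(overcommit_kbytes_setting(256 * 1024 * 1024, 2 * 1024 * 1024))
```

Something at boot would have to read the machine's visible RAM, apply this, and write the result to the sysctl; it's doable, but it's one more per-machine moving part.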

On the whole, living with the possibility of OOM is easier and less troublesome.
