A Linux machine with a strict overcommit limit can still trigger the OOM killer
We've been running our general-use compute servers with strict overcommit handling for total virtual memory for years, because on compute servers we feel we have to assume that if you ask for a lot of memory, you're going to use it for your compute job. As we discovered last fall, hitting the strict overcommit limit doesn't trigger the OOM killer, which can be inconvenient; instead, all sorts of random processes start failing because they can't get any more memory. However, we've recently also discovered that our machines with strict overcommit turned on can still sometimes trigger the OOM killer.
At first this made no sense to me and I thought that something was
wrong, but then I realized what is probably going on. You see,
strict overcommit really has two parts, although we don't often
think about the second one: there's the setting itself, i.e. having
vm.overcommit_memory be 2, and then how large your commit limit
is, set via vm.overcommit_ratio as your swap space plus some
percentage of RAM. Because we couldn't find an overcommit percentage
that worked for us across our disparate fleet of compute servers
with widely varying amounts of RAM, we set this to '100' some years
ago, theoretically allowing our machines with strict overcommit to
use all of RAM plus swap space. Of course, this is not actually
possible in practice, because the kernel needs some amount of memory
to operate itself; how much memory is unpredictable and possibly
varies over time.
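As a concrete sketch of the arithmetic (the RAM and swap sizes here are invented, not from our machines; on a real system they would come from MemTotal and SwapTotal in /proc/meminfo):

```shell
# With vm.overcommit_memory=2, the kernel's commit limit is
# swap + (vm.overcommit_ratio percent of RAM).  All sizes in kB.
ratio=100                          # our setting; the kernel default is 50
mem_kb=$(( 256 * 1024 * 1024 ))    # e.g. a 256 GB compute server
swap_kb=$(( 8 * 1024 * 1024 ))     # e.g. 8 GB of swap
limit_kb=$(( swap_kb + mem_kb * ratio / 100 ))
# This is what /proc/meminfo reports as CommitLimit (absent huge pages).
echo "CommitLimit: ${limit_kb} kB"
```

With a ratio of 100, the limit is simply all of RAM plus all of swap, which is exactly why it can exceed what the machine can really deliver.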
This gap between what we set and what's actually possible creates three states the system can wind up in. If you ask for as much memory as you can allocate (or in general enough memory), you run the system into the strict overcommit limit; either your request fails immediately or other processes start failing later when their memory allocation requests fail. If you don't ask for too much memory, everything is happy; what you asked for plus what the kernel needs fits into RAM and swap space. But if you ask for just the right large amount of memory, you push the system into a narrow middle ground; you're under the strict overcommit limit so your allocations succeed, but over what the kernel can actually provide, so when processes start trying to use enough memory, the kernel will trigger the OOM killer.
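The three states can be sketched with made-up numbers; the kernel's own need is precisely the figure we don't know, so the 4 GB below is an arbitrary stand-in:

```shell
# Classify a total memory request against the two limits.  All sizes
# in kB and invented for illustration.
commit_limit=$(( 264 * 1024 * 1024 ))   # 256 GB RAM + 8 GB swap, ratio=100
kernel_needs=$(( 4 * 1024 * 1024 ))     # unknowable in practice; assume 4 GB
actually_available=$(( commit_limit - kernel_needs ))

classify() {
    asked=$1
    if [ "$asked" -gt "$commit_limit" ]; then
        echo "allocations fail (strict overcommit limit)"
    elif [ "$asked" -gt "$actually_available" ]; then
        echo "allocations succeed but using them can trigger the OOM killer"
    else
        echo "everything fits; everyone is happy"
    fi
}

classify $(( 270 * 1024 * 1024 ))   # over the commit limit
classify $(( 262 * 1024 * 1024 ))   # the narrow middle ground
classify $(( 200 * 1024 * 1024 ))   # comfortably under
```

The middle case is the surprise: strict overcommit happily says yes, and the OOM killer only appears later, once the memory is actually touched.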
There is probably no good way to avoid this for us, so I suspect we'll just live with the little surprise of the OOM killer triggering every so often and likely terminating a RAM-heavy compute process. I don't think it happens very often, and these days we have a raft of single-user compute servers that avoid the problem.
Sidebar: The problems with attempting to turn down the memory limit
First, we don't have any idea how much memory we'd need to reserve for the kernel to avoid OOM. Being cautious here means that some of the RAM will go idle unless we add a bunch of swap space (and risk death through swap thrashing).
Further, not only would the
vm.overcommit_ratio setting be
machine specific and have to be derived on the fly from the amount
of memory, but it's probably too coarse-grained. 1% of RAM on a 256
GB machine is about 2.5 GB, although I suppose the kernel might
need that much reserved to avoid OOM.
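The coarseness is simple integer arithmetic (the 256 GB figure is just an example size):

```shell
# One ratio point is the smallest step vm.overcommit_ratio can take.
mem_mb=$(( 256 * 1024 ))      # a 256 GB machine, in MB
step_mb=$(( mem_mb / 100 ))   # 1% of RAM: roughly 2.5 GB per step
echo "one ratio point = ${step_mb} MB"
```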
We could switch to using the more recent
vm.overcommit_kbytes (cf), but since
its value is how much RAM to allow instead of how much RAM to reserve
for the kernel, we would definitely have to make it machine specific
and derived from how much RAM is visible when the machine boots.
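If we did go this way, the boot-time derivation might look something like the sketch below; the 2 GB kernel reserve is a pure guess, which is exactly the problem. (When vm.overcommit_kbytes is nonzero, the kernel adds swap on top of it to form the commit limit, and setting it back to 0 reverts to vm.overcommit_ratio.)

```shell
# Derive a machine-specific vm.overcommit_kbytes.  Unlike
# vm.overcommit_ratio, this value is "how much RAM to allow", so we
# must subtract our guessed kernel reserve from total RAM ourselves.
reserve_kb=$(( 2 * 1024 * 1024 ))   # guessed kernel reserve: 2 GB
mem_kb=$(( 256 * 1024 * 1024 ))     # stand-in for MemTotal from /proc/meminfo
allow_kb=$(( mem_kb - reserve_kb ))
echo "would run: sysctl vm.overcommit_kbytes=${allow_kb}"
```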
On the whole, living with the possibility of OOM is easier and less troublesome.