Linux can run out of memory without triggering the Out-Of-Memory killer
If you have a machine with strict overcommit turned on, your memory allocation requests will start to fail once enough virtual address space has been committed, because that's what you told the kernel to do. Hitting your strict overcommit limit doesn't trigger the Out-Of-Memory killer, because the two care about different things; strict memory overcommit cares about committed address space, while the global OOM killer cares about physical RAM. Hitting the commit limit may kill programs anyway, because many programs die if their allocations fail. Also, in the right circumstances, you can trigger the OOM killer on a machine set to strict overcommit.
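To make the distinction concrete, here is a minimal sketch of how you might check how close a machine is to its commit limit. It assumes the usual /proc/meminfo fields (CommitLimit, Committed_AS) and the vm.overcommit_memory sysctl (0 is heuristic, 1 is always overcommit, 2 is strict); the function names are my own invention for illustration, not anything standard.

```python
# Sketch: compare committed address space against the commit limit.
# parse_meminfo, commit_headroom_kb and read_commit_state are
# hypothetical helper names used only for this example.

OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "strict"}


def parse_meminfo(text):
    """Turn /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Most fields are in kB; a few (such as HugePages_Total) are counts.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """CommitLimit minus Committed_AS, in kB.

    The kernel only enforces CommitLimit in strict mode; in the other
    modes this number is informational.
    """
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_commit_state(proc="/proc"):
    """Read the overcommit mode and the meminfo fields from a live system."""
    with open(f"{proc}/sys/vm/overcommit_memory") as f:
        mode = int(f.read().strip())
    with open(f"{proc}/meminfo") as f:
        fields = parse_meminfo(f.read())
    return mode, fields
```

On a live machine you would call read_commit_state(), look the mode up in OVERCOMMIT_MODES, and pass the fields to commit_headroom_kb(). A small or negative headroom only predicts allocation failures when the mode is strict.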
Until recently, if you had asked me about how Linux behaved in the default 'heuristic overcommit' mode, I would have told you that ordinary memory allocations would never fail in it; instead, if you ran out of memory (really RAM), the OOM killer would trigger. We've recently found out that this is not the case, at least in the Ubuntu 18.04 LTS '4.15.0' kernel. Under (un)suitable loads, various of our systems can run out of memory without triggering the OOM killer and persist in this state for some time. When it happens, the symptoms are basically the same as what happens under strict overcommit; all sorts of things can't fork, can't map shared libraries, and so on. Sometimes the OOM killer is eventually invoked, other times the situation resolves itself, and every so often we have to reboot a machine to recover it.
I would like to be able to tell you why and how this happens, but I can't. Based on the kernel code involved, the memory allocations aren't being refused because of heuristic overcommit, which still has its very liberal limits on how much memory you can ask for (see __vm_enough_memory in mm/util.c). Instead something else is causing forks, mmap()s of shared libraries, and so on to fail with 'out of memory' errno values, and whatever that something is it doesn't trigger the OOM killer during the failure and doesn't cause the kernel to log any other messages, such as the ones you can see for page allocation failures.
(Well, the messages you see for certain page allocations. Page allocations can be flagged as __GFP_NOWARN, which suppresses these.)
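Since the kernel logs nothing in this state, the only visible evidence is what user programs get back. As a rough sketch of what that looks like from the program's side, the following classifies a failure as 'out of memory' when it carries errno ENOMEM. The helper names (is_out_of_memory, probe_spawn) are hypothetical, and the probe assumes a 'true' command exists on the system.

```python
import errno
import subprocess


def is_out_of_memory(exc):
    """True if an exception looks like an 'out of memory' failure."""
    if isinstance(exc, MemoryError):
        # Python raises MemoryError when its own allocations fail.
        return True
    # Failed system calls surface as OSError carrying the errno value.
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


def probe_spawn(argv=("true",)):
    """Try to start a trivial child process and report what happened.

    Returns "ok", "enomem", or "other-error". This is only a probe; it
    cannot tell you why the kernel refused the request.
    """
    try:
        subprocess.run(list(argv), check=False)
    except OSError as exc:
        return "enomem" if is_out_of_memory(exc) else "other-error"
    return "ok"
```

Running probe_spawn() periodically from an already-running monitoring process is one way to notice the condition, although on a machine in this state the monitoring process may itself have trouble allocating memory.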
PS: Unlike the first time we saw this, the recent cases have committed address space rising along with active anonymous pages, and the kernel's available memory dropping in sync and hitting zero at about the time we see failures start.
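If you want to watch for this pattern yourself, a sampler along these lines would do as a starting point. It is a sketch, assuming the /proc/meminfo field names Committed_AS, Active(anon), and MemAvailable; the function names and the 64 MiB floor are arbitrary choices of mine, not anything the kernel defines.

```python
import time

# The three /proc/meminfo fields that moved together in the recent cases.
WATCHED = ("Committed_AS", "Active(anon)", "MemAvailable")


def snapshot(text):
    """Extract the watched fields (in kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and name in WATCHED and parts:
            values[name] = int(parts[0])
    return values


def low_memory(values, floor_kb=65536):
    """True if MemAvailable is present and below floor_kb (arbitrary default)."""
    avail = values.get("MemAvailable")
    return avail is not None and avail < floor_kb


def watch(interval=10, path="/proc/meminfo"):
    """Print one timestamped sample every `interval` seconds until interrupted."""
    while True:
        with open(path) as f:
            values = snapshot(f.read())
        flag = " LOW" if low_memory(values) else ""
        print(int(time.time()), values, flag, sep="\t", flush=True)
        time.sleep(interval)
```

Calling watch() and sending the output to a file gives you a simple record to line up against the times that forks and mmap()s started failing.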