Roughly when the Linux Out-Of-Memory killer triggers (as of mid-2019)

August 11, 2019

For reasons beyond the scope of this entry, I've recently become interested in understanding more about when the Linux OOM killer does and doesn't trigger, and why. Detailed documentation on this is somewhat sparse and some of it is outdated. I can't add detailed documentation, because doing that requires fully understanding kernel memory management code, but I can at least write down some broad overviews for my own use.

(All of this is as of the current Linux kernel git tree, because that's what I have on hand. The specific details change over time, although the code seems broadly unchanged between git tip and the Ubuntu 18.04 LTS kernel, which claims to be some version of 4.15.)

These days there are two somewhat different OOM killers in the kernel; there is the global OOM killer and then there is cgroup-based OOM through the cgroup memory controller, either cgroup v1 or cgroup v2. I'm primarily interested in when the global OOM killer triggers, partly because the cgroup OOM killer is more predictable.

The simple answer is that the global OOM killer triggers when the kernel has problems allocating pages of physical RAM. When the kernel is attempting to allocate pages of RAM (for whatever use, either for kernel usage or for processes that need pages) and initially fails, it will try various ways to reclaim and compact memory. If these efforts work or at least make some progress, the kernel keeps retrying the allocation (as far as I can tell from the code); if they fail to free up pages or make progress, it triggers the OOM killer under many (but not all) circumstances.

(The OOM killer is not triggered if, for instance, the kernel is asking for a sufficiently large number of contiguous pages. At the moment, the OOM killer is still only invoked for contiguous allocations of 32 KB or less (order 3, assuming 4 KB pages), which is the same as it was back in 2012; in fact, 'git blame' says this dates from 2007.)
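To make the 'order 3' arithmetic concrete, here is a small sketch. It assumes 4 KB pages (as on x86-64; other architectures can differ), and the function names are made up for illustration. The threshold of 3 mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER constant.

```python
# Assumption: 4 KB pages. Other architectures may use larger pages.
PAGE_SIZE = 4096

# Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER; allocations above this
# order fail without invoking the OOM killer.
COSTLY_ORDER = 3


def order_to_bytes(order, page_size=PAGE_SIZE):
    """An order-N allocation is 2**N physically contiguous pages."""
    return (1 << order) * page_size


def oom_killer_eligible(order):
    """Illustrative helper: can a failed allocation of this order lead to an OOM kill?"""
    return order <= COSTLY_ORDER
```

With these assumptions, order_to_bytes(3) is 32768 bytes, and an order 4 request (64 KB) would simply fail rather than trigger the OOM killer.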

As far as I can tell, there's nothing that stops the OOM killer being triggered repeatedly for the same attempted page allocation. If the OOM killer says it made progress, the page allocation is retried, but there's probably no guarantee that you can get memory now (any freed memory might have been grabbed by another request, for example). Similarly, as far as I can tell the OOM killer can be invoked repeatedly in close succession; there doesn't seem to be any 'must be X time between OOM kills' limit in the current code. The trigger is simply that the kernel needs pages of RAM and it can't seem to get them any other way.

(Of course you hope that triggering the OOM killer once frees up a bunch of pages of RAM, since that's what it's there for.)
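The retry behaviour described above can be sketched as a toy model. This is emphatically not the kernel's code; it only mirrors my reading of the overall shape, and every name here (slowpath, try_alloc, reclaim, oom_kill, max_rounds) is invented for the illustration. The three callables stand in for the real allocation, reclaim/compaction, and OOM killer steps.

```python
def slowpath(try_alloc, reclaim, oom_kill, order=0, costly_order=3, max_rounds=100):
    """Toy model of the allocation retry loop; not the kernel's real logic.

    try_alloc() -> True if the allocation succeeded.
    reclaim()   -> True if reclaim or compaction made progress.
    oom_kill()  -> True if the OOM killer reports progress.
    max_rounds is an artificial cap so the model always terminates.
    """
    for _ in range(max_rounds):
        if try_alloc():
            return "allocated"
        if reclaim():
            # Progress was made, so retry the allocation.
            continue
        if order > costly_order:
            # Large contiguous requests fail without an OOM kill.
            return "failed"
        if not oom_kill():
            return "failed"
        # The OOM killer reported progress; go around again. Nothing here
        # prevents it from being invoked again for this same allocation.
    return "gave up"
```

Note that nothing in the loop rate-limits OOM kills; if the freed memory is grabbed by someone else before the retry, the model simply kills again.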

The global OOM killer is not particularly triggered when processes simply allocate (virtual) memory, because this doesn't necessarily allocate physical pages of RAM. Decisions about whether or not to grant such memory allocation requests are not necessarily independent of the state of the machine's physical RAM, but I'm pretty sure you can trigger the OOM killer without having reached strict overcommit limits and you can definitely have memory allocation requests fail without triggering the OOM killer.
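If you want to look at the virtual memory side of this separately from physical RAM, /proc/meminfo reports CommitLimit and Committed_AS (both in kB). The following is a minimal sketch for pulling them out; the function names are my own, and as far as I know CommitLimit is only actually enforced in strict overcommit mode (vm.overcommit_memory set to 2).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Most values are in kB; a few (such as HugePages_Total) are plain counts.
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields):
    """How far committed address space is below the commit limit, in kB."""
    return fields["CommitLimit"] - fields["Committed_AS"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file on a Linux system."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

A large positive headroom says nothing about whether physical pages are available right now, which is the point of the paragraph above.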

In the current Linux tree, you can see this sausage being made in mm/page_alloc.c's __alloc_pages_slowpath. mm/oom_kill.c is concerned with actually killing processes.

PS: I will avoid speculating about situations where this approach might fail to trigger the OOM killer when it really should, but depending on how reclaim is implemented, there seem to be some relatively obvious possibilities.

Sidebar: When I think cgroups OOM is triggered

If you're using the memory cgroup controller (v1 or v2) and you set a maximum memory limit, this is (normally) a limit on how much RAM the cgroup can use. As the cgroup's RAM usage grows towards this limit, the kernel memory system will attempt to evict the cgroup's pages from RAM in various ways (such as swapping them out). If it fails to evict enough pages fast enough and the cgroup runs into its hard limit on RAM usage, the kernel triggers the OOM killer against the cgroup.

This particular sausage seems to be made in mm/memcontrol.c. You want to look for the call to out_of_memory and work backward. I believe that all of this is triggered by any occasion when a page of RAM is charged to a cgroup, which includes more than just the RAM directly used by processes.

(In common configurations, I believe that a cgroup with such a hard memory limit can consume all of your swap space before it triggers the OOM killer.)
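To check what limits a particular cgroup actually has, you can read the memory controller's control files. Here is a rough sketch, assuming you already know the cgroup's directory (the path you pass in is up to you; I make no assumption about where your system mounts cgroups). The file names are the standard ones: memory.limit_in_bytes and memory.memsw.limit_in_bytes for v1, memory.max and memory.swap.max for v2. The function and dictionary names are invented for the example.

```python
import os

# cgroup v1: the memsw limit covers RAM plus swap together.
V1_FILES = {"ram": "memory.limit_in_bytes", "ram_plus_swap": "memory.memsw.limit_in_bytes"}
# cgroup v2: the swap limit covers swap alone.
V2_FILES = {"ram": "memory.max", "swap": "memory.swap.max"}


def _read_limit(path):
    """Return the limit as an int, the string 'max' (v2 unlimited), or None if unreadable."""
    try:
        with open(path) as f:
            raw = f.read().strip()
    except OSError:
        return None
    return raw if raw == "max" else int(raw)


def read_cgroup_limits(cgroup_dir):
    """Collect whichever memory limit files exist in the given cgroup directory."""
    is_v2 = os.path.exists(os.path.join(cgroup_dir, "memory.max"))
    files = V2_FILES if is_v2 else V1_FILES
    limits = {}
    for key, name in files.items():
        value = _read_limit(os.path.join(cgroup_dir, name))
        if value is not None:
            limits[key] = value
    return limits
```

If there is a RAM limit but no limit that covers swap, that cgroup is a candidate for filling up your swap before it gets OOM killed.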

If you want to know whether an OOM kill was global or from a cgroup limit, the kernel message tells you. For a cgroup OOM kill, the kernel message will look like this:

Memory cgroup out of memory: Kill process ... score <num> or sacrifice child

For a global out of memory, the kernel message will look like this:

Out of memory: Kill process ... score <num> or sacrifice child

I sort of wish the global version specifically mentioned that it was a non-cgroup OOM kill, but you can see how it wound up this way; when cgroup OOM kills were introduced, I suspect that no one wanted to change the existing global OOM kill message for various reasons.
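Telling the two apart in logs is easy to automate. This is a minimal sketch that matches only the two message prefixes quoted above; the function name is my own, and other kernel versions may word these messages differently, so treat the pattern as an assumption to check against your own logs.

```python
import re

# Matches only the two message forms quoted in this entry.
_OOM_RE = re.compile(r"(Memory cgroup out of memory|Out of memory): Kill process")


def classify_oom_line(line):
    """Return 'cgroup', 'global', or None if the line is not an OOM kill message."""
    m = _OOM_RE.search(line)
    if m is None:
        return None
    return "cgroup" if m.group(1).startswith("Memory cgroup") else "global"
```

You could feed this lines from the kernel log and count how many kills of each kind you've had.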

Comments on this page:

By Arnaud Gomes at 2019-08-12 02:06:40:

A cgroup with a hard memory limit can indeed fill the swap unless you set it up not to; you can also limit the total RAM+swap size (by tuning memory.memsw.limit_in_bytes, as opposed to memory.limit_in_bytes).

Written on 11 August 2019.

Last modified: Sun Aug 11 23:13:12 2019