I wish Linux exposed a 'OOM kills due to cgroup limits' kernel statistic

September 24, 2023

Under certain circumstances, Linux will trigger the Out-Of-Memory Killer and kill a process. For some time there have been two general ways for this to happen: a global OOM kill, because the kernel thinks it's totally out of memory, or a per-cgroup OOM kill, where a cgroup has a memory limit and has hit it. These days the latter is quite easy to set up through systemd memory limits, especially user memory limits.

The kernel exposes a vmstat statistic for total OOM kills from all causes, as 'oom_kill' in /proc/vmstat; this is probably being surfaced in your local metrics collection agent under some name. Unfortunately, as far as I know the kernel doesn't expose a simple statistic for how many of those OOM kills are global OOM kills instead of cgroup OOM kills. This difference is of quite some interest to people monitoring their systems, because a global OOM kill is probably important while a cgroup OOM kill may be entirely expected.
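As a small illustration of how little the existing statistic tells you, here is a minimal sketch that reads the single 'oom_kill' counter out of /proc/vmstat. The function names are hypothetical, and it assumes the usual one 'name value' pair per line format; it returns None on a kernel that doesn't have the field. Note that the number it gives you is the combined total, with no way to split out global versus cgroup OOM kills.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def read_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style text ('name value' per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def total_oom_kills(path: str = "/proc/vmstat") -> Optional[int]:
    """Return the all-causes 'oom_kill' counter, or None if it's absent."""
    return read_vmstat(Path(path).read_text()).get("oom_kill")


if __name__ == "__main__":
    # Parse a canned sample rather than the live file.
    sample = "nr_free_pages 12345\noom_kill 3\npgfault 987654\n"
    print(read_vmstat(sample)["oom_kill"])  # prints 3
```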

Each cgroup does have information about OOM kills in its hierarchy (or only in itself, if you used the memory_localevents cgroup v2 mount option; see cgroups(7)). This information is in the 'memory.events' file, but as covered in the cgroup v2 documentation, this file is only present in non-root cgroups, which means that you can't find a system-wide version of this information in one place. If you know that on a specific system only one top level cgroup can have OOM kills, you can perhaps monitor that, but otherwise you need something more sophisticated (and in theory you might miss transient top level cgroups, although in practice most are persistent).
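The 'something more sophisticated' might look roughly like the following minimal sketch, which walks the top level cgroups under an assumed cgroup v2 mount point of /sys/fs/cgroup and reads the 'oom_kill' line from each one's memory.events. The function names are hypothetical. It assumes the default hierarchical counting (with memory_localevents the numbers would only cover each top level cgroup itself), and it shares the weakness mentioned above: a transient top level cgroup that has already vanished is simply not seen. You should also check the cgroup v2 documentation for exactly what each memory.events field counts before relying on it for alerting.

```python
from __future__ import annotations

from pathlib import Path


def parse_memory_events(text: str) -> dict[str, int]:
    """Parse a cgroup v2 memory.events file ('key value' per line)."""
    events: dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            events[key] = int(value)
    return events


def top_level_oom_kills(root: str = "/sys/fs/cgroup") -> dict[str, int]:
    """Map each current top level cgroup name to its 'oom_kill' count.

    The root cgroup has no memory.events, so only its children are read.
    Cgroups that no longer exist are (unavoidably) missed.
    """
    counts: dict[str, int] = {}
    for child in sorted(Path(root).iterdir()):
        events_file = child / "memory.events"
        if child.is_dir() and events_file.is_file():
            events = parse_memory_events(events_file.read_text())
            counts[child.name] = events.get("oom_kill", 0)
    return counts


if __name__ == "__main__":
    # Parse a canned sample rather than walking the live hierarchy.
    sample = "low 0\nhigh 0\nmax 12\noom 2\noom_kill 2\n"
    print(parse_memory_events(sample)["oom_kill"])  # prints 2
```

On a typical systemd machine the top level names will be things like 'system.slice' and 'user.slice', so a per-slice breakdown falls out of this naturally.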

The kernel definitely knows this information; the kernel log messages for global OOM kills are distinctly different from the kernel log messages for cgroup OOM kills. So the kernel could expose this information, for example as a new /proc/vmstat field or two; it just doesn't (as of fall 2023).
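Since the log messages differ, one workaround is to count them. This is a minimal sketch with a hypothetical function name, assuming the message formats that recent kernels' OOM killer uses ('Out of memory: Killed process ...' for global kills versus 'Memory cgroup out of memory: Killed process ...' for cgroup kills); older kernels worded these messages differently, so check against your own logs. You would feed it lines from something like `journalctl -k` or `dmesg`, and it only sees whatever is still in the log.

```python
from __future__ import annotations

import re
from typing import Iterable

# Assumed message formats (recent kernels; verify against your own logs):
#   global: "Out of memory: Killed process 1234 (foo) ..."
#   cgroup: "Memory cgroup out of memory: Killed process 1234 (foo) ..."
_KILL_RE = re.compile(
    r"(Memory cgroup out of memory|Out of memory): Killed process \d+"
)


def classify_oom_kills(lines: Iterable[str]) -> dict[str, int]:
    """Count OOM kill log lines, split into global and cgroup kills."""
    counts = {"global": 0, "cgroup": 0}
    for line in lines:
        match = _KILL_RE.search(line)
        if match is None:
            continue  # not an OOM kill message
        if match.group(1) == "Memory cgroup out of memory":
            counts["cgroup"] += 1
        else:
            counts["global"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[100.1] Out of memory: Killed process 1234 (stress) total-vm:1kB",
        "[200.2] Memory cgroup out of memory: Killed process 5678 (make)",
        "[300.3] eth0: link up",
    ]
    print(classify_oom_kills(sample))  # prints {'global': 1, 'cgroup': 1}
```

Log scraping like this is fragile, which is exactly why a real kernel counter would be nicer.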

(Someday we may add a Prometheus cgroups metrics exporter to our host agents in our Prometheus environment and so collect this information, but so far I haven't found a cgroup exporter that I like and that provides the information I want to know.)

Written on 24 September 2023.

Last modified: Sun Sep 24 23:25:25 2023