I wish Linux exposed a 'OOM kills due to cgroup limits' kernel statistic
Under certain circumstances, Linux will trigger the Out-Of-Memory Killer and kill some process. For some time, there have been two general ways for this to happen: either a global OOM kill because the kernel thinks it's totally out of memory, or a per-cgroup OOM kill where a cgroup with a memory limit has run into it. These days the latter is quite easy to set up through systemd memory limits, especially user memory limits.
The kernel exposes a vmstat statistic for total OOM kills from all
causes, as 'oom_kill' in /proc/vmstat; this is probably being
surfaced in your local metrics collection agent under some name.
Unfortunately, as far as I know the kernel doesn't expose a simple
statistic for how many of those OOM kills are global OOM kills
instead of cgroup OOM kills. This difference is of quite some
interest to people monitoring their systems, because a global
OOM kill is probably important while a cgroup OOM kill may be
entirely routine.
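If your metrics agent doesn't surface the total, it's easy to read directly. Here is a minimal sketch in Python, assuming the usual 'name value' line format of /proc/vmstat and a kernel that has the 'oom_kill' field; the function names are mine, not anything standard.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly 'name <non-negative integer>'.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def total_oom_kills(path="/proc/vmstat"):
    """Return the total OOM kill count, or None if it is unavailable."""
    try:
        with open(path) as f:
            return parse_vmstat(f.read()).get("oom_kill")
    except OSError:
        # Not Linux, or /proc is not mounted.
        return None


if __name__ == "__main__":
    print("oom_kill:", total_oom_kills())
```

This only gives you the combined count since boot, which is exactly the limitation I'm complaining about.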
Each cgroup does have information about OOM kills in its hierarchy
(or sometimes itself only, if you used the memory_localevents
cgroups v2 mount option, per cgroups(7)). This
information is in the 'memory.events' file, but as covered in
the cgroups v2 documentation, this file is
only present in non-root cgroups, which means that you can't find
a system wide version of this information in one place. If you know
on a specific system that only one top level cgroup can have OOM
kills, you can perhaps monitor that, but otherwise you need something
more sophisticated (and in theory you might miss transient top level
cgroups, although in practice most are persistent).
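As a rough illustration of the 'something more sophisticated', here is a sketch that walks the top level cgroups and pulls the 'oom_kill' count out of each one's memory.events. It assumes a cgroups v2 hierarchy mounted at /sys/fs/cgroup (the usual location, but an assumption) and the 'key value' line format of memory.events; the function names are hypothetical. One caveat: the kernel documentation describes this 'oom_kill' field as counting processes in the cgroup killed by any kind of OOM killer, so treat these as per-cgroup counts rather than a clean 'due to cgroup limits' figure.

```python
import os


def parse_memory_events(text):
    """Parse memory.events-style 'key value' lines into a dict of ints."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def top_level_oom_kills(root="/sys/fs/cgroup"):
    """Map each top level cgroup name to its memory.events oom_kill count.

    Cgroups without a readable memory.events file (for example, ones
    without the memory controller enabled) are skipped.
    """
    counts = {}
    try:
        entries = sorted(os.scandir(root), key=lambda e: e.name)
    except OSError:
        return counts
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            continue
        try:
            with open(os.path.join(entry.path, "memory.events")) as f:
                events = parse_memory_events(f.read())
        except OSError:
            continue
        counts[entry.name] = events.get("oom_kill", 0)
    return counts


if __name__ == "__main__":
    for name, kills in top_level_oom_kills().items():
        print(f"{name}: {kills}")
```

Since this only sees cgroups that exist at the moment it runs, it has the same blind spot for transient top level cgroups.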
The kernel definitely knows this information; the kernel log messages for global OOM kills are distinctly different from the kernel log messages for cgroup OOM kills. So the kernel could expose this information, for example as a new /proc/vmstat field or two; it just doesn't (currently, as of fall 2023).
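In the absence of a real statistic, you can count the two kinds of kernel log messages yourself. The following is only a sketch: it assumes the wording used by reasonably recent kernels, where global kills are logged as 'Out of memory: Killed process ...' and cgroup kills as 'Memory cgroup out of memory: Killed process ...'. Other kernel versions may word these differently, and the function name is made up for illustration.

```python
import re

# Assumed message prefixes from recent kernels; adjust for your kernel version.
_KILL_RE = re.compile(
    r"(Memory cgroup out of memory|Out of memory): Killed process (\d+)"
)


def classify_oom_lines(lines):
    """Count global versus cgroup OOM kill messages in kernel log lines."""
    counts = {"global": 0, "cgroup": 0}
    for line in lines:
        match = _KILL_RE.search(line)
        if match is None:
            continue
        if match.group(1).startswith("Memory cgroup"):
            counts["cgroup"] += 1
        else:
            counts["global"] += 1
    return counts
```

You would feed this the lines of the kernel log, for example the output of 'journalctl -k' or 'dmesg' split into lines. Log scraping like this loses anything that has been rotated out of the log, which is one reason a real counter would be better.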
(Someday we may add a Prometheus cgroups metrics exporter to our host agents in our Prometheus environment and so collect this information, but so far I haven't found a cgroup exporter that I like and that provides the information I want to know.)