How /proc/slabinfo is not quite telling you what it looks like
The Linux kernel does a lot (although not all) of its interesting
internal memory allocations through a slab allocator. For quite a while
it's exposed per-type details of this process in /proc/slabinfo;
this is very handy to get an idea of just what in your kernel is
using up a bunch of memory. Today I was exploring this because I
wanted to look into ZFS on Linux's memory usage
and wound up finding out that on modern Linuxes it's a little bit
misleading.
(By 'found out' I mean that DeHackEd on the #zfsonlinux IRC channel explained it to me.)
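As a rough illustration of using slabinfo to see what is eating memory, here is a minimal sketch that parses slabinfo-format text and estimates the bytes held by each entry. It assumes the version 2.1 layout (name, object counts, object size, objects per slab, pages per slab, then the tunables and slabdata sections); the function name parse_slabinfo and its page_size parameter are my own inventions for the example, not anything the kernel provides.

```python
import os

def parse_slabinfo(text, page_size=None):
    """Return [(name, bytes_used), ...] from slabinfo-format text, largest first.

    bytes_used is estimated as num_slabs * pages_per_slab * page_size.
    Assumes the 'slabinfo - version: 2.1' field layout.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    usage = []
    for line in text.splitlines():
        # Skip blank lines, the version line, and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        if "slabdata" not in line:
            continue
        fields = line.split()
        name = fields[0]
        pages_per_slab = int(fields[5])
        # After 'slabdata' come: active_slabs, num_slabs, sharedavail.
        slabdata = line.rsplit("slabdata", 1)[1].split()
        num_slabs = int(slabdata[1])
        usage.append((name, num_slabs * pages_per_slab * page_size))
    return sorted(usage, key=lambda item: item[1], reverse=True)
```

On a live system you would feed it the contents of /proc/slabinfo, which normally requires root to read. Keep in mind that because of the merging described below, each name in the result may be standing in for several slab types.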
Specifically, on modern Linux the names shown in slabinfo are
basically a hint, because the current slab allocator in the kernel
merges multiple slab types together if they are sufficiently similar.
If five different subsystems all want to allocate (different) 128-byte
objects with no special properties, they don't each get separate slab
types with separate slabinfo entries; instead they are all merged into
one slab type and thus one slabinfo entry. That slabinfo entry normally
shows the name of one of them, probably the first to be set up, with no
direct hint that it also includes the usage of all the others.
(The others don't appear in slabinfo at all.)
Most of the time this is a perfectly good optimization that cuts
down on the number of slab types and enables better memory sharing
and reduced fragmentation. But it does mean that you can't tell the
memory used by, say, btree_node apart from ip_mrt_cache (on my machine,
both are among the many slab types that are actually all mapped to the
generic 128-byte object). It can also leave you wondering where your
slab types actually went, if you're inspecting code that creates a
certain slab type but you can't find it in slabinfo (which is what
happened to me).
The easiest way to see this mapping is to look at /sys/kernel/slab;
all those symlinks are slab types that may be the same thing. You
can decode what is what by hand, but if you're going to do this
regularly you should get a copy of tools/vm/slabinfo.c from the
kernel source and compile it; see the kernel SLUB documentation for
details. You want 'slabinfo -a' to report the mappings.
(Sadly slabinfo is underdocumented. I wish it had a manpage or
at least a README.)
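If you only want a quick look at the mapping without building the slabinfo tool, decoding the symlinks by hand can be sketched in a few lines of Python. This is a rough approximation of the idea behind 'slabinfo -a', not a reimplementation of it; the helper name slab_aliases is made up for this example, and it assumes merged slab types appear as symlinks in the directory that share a target.

```python
import os
from collections import defaultdict

def slab_aliases(slab_dir="/sys/kernel/slab"):
    """Map each symlink target to the sorted list of slab names pointing at it.

    Entries that are real directories (not symlinks) are left out, so the
    result only covers slab types that exist as aliases.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        if os.path.islink(path):
            # Symlinks that share a target are the same underlying slab.
            target = os.path.basename(os.readlink(path))
            groups[target].append(name)
    return dict(groups)
```

Calling slab_aliases() with no arguments walks the real directory; any target with more than one name in its list is a merged slab whose slabinfo entry is covering for all of those names.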
If you need to track the memory usage of specific slab types, perhaps
because you really want to know the memory usage of one subsystem,
the easiest way is apparently to boot with the slub_nomerge kernel
command line argument. Per the kernel parameter documentation, this
turns off all slab merging, which may result in you having a lot more
slabs than usual.
(On my workstation, slab merging condenses 110 different slabs into 14 actual slabs. On a random server, 170 slabs turn into 35 and a bunch of the pre-merger slabs are probably completely unused.)
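Before trusting per-type numbers, it can be worth confirming that the running kernel really was booted with merging turned off. A minimal sketch, assuming you pass in the text of /proc/cmdline; the function name merging_disabled is hypothetical, and it only looks for the slub_nomerge argument discussed above.

```python
def merging_disabled(cmdline):
    """Return True if the kernel command line text includes slub_nomerge."""
    # Split on whitespace so that only a standalone argument matches.
    return "slub_nomerge" in cmdline.split()
```

For example, merging_disabled(open("/proc/cmdline").read()) checks the currently running kernel.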
Sidebar: disabling this merging in kernel code
The SLUB allocator does not directly expose a way of disabling this
merging when you call kmem_cache_create(), in that there's no
'do not merge, really' flag to the call. However, it turns out that
supplying at least one of a number of SLUB debugging flags will
disable this merging, and on a kernel built without
CONFIG_DEBUG_KMEMLEAK, using SLAB_NOLEAKTRACE appears to have
absolutely no other effects from what I can tell. Both Fedora 20 and
Ubuntu 14.04 build their kernels without this option.
(I believe that most Linux distributions put a copy of the kernel
build config in /boot when they install kernels.)
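To check whether your own kernel was built with CONFIG_DEBUG_KMEMLEAK, you can scan that build config for the option. Here is a small sketch that works on the config text; the function name kconfig_enabled is made up for the example, and where the config file actually lives (commonly something like /boot/config-<kernel release>) is an assumption that varies by distribution.

```python
def kconfig_enabled(config_text, option):
    """Return True if 'option' is set to y or m in kernel build config text.

    Lines such as '# CONFIG_FOO is not set' count as disabled, as does
    the option not appearing at all.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    return False
```

If kconfig_enabled(text, "CONFIG_DEBUG_KMEMLEAK") comes back False for your kernel's config, you are in the situation described above where SLAB_NOLEAKTRACE appears to have no other effects.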
This may be handy if you have some additional kernel modules that you want to be able to track memory use for specifically even though a number of their slabs would normally get merged away, and you're compiling from source and willing to make some little modifications to it.
You can see the full set of flags that force never merging in the
#define for SLUB_NEVER_MERGE in mm/slub.c. On a quick look, none of
the others are both harmless and always defined as a non-zero value.
It's possible that SLAB_DEBUG_FREE also does nothing these days; if
used it will make your slabs only mergeable with other slabs that also
specify it (which no slabs in the main kernel source do). That would
cause slabs from your code to potentially be merged together but they
wouldn't merge with anyone else's slabs, so at least you could track
your subsystem's memory usage.
Disclaimer: these ideas have been at most compile-tested, not run live.