Wandering Thoughts archives

2014-10-09

How /proc/slabinfo is not quite telling you what it looks like

The Linux kernel does a lot (although not all) of its interesting internal memory allocations through a slab allocator. For quite a while it's exposed per-type details of this process in /proc/slabinfo; this is very handy to get an idea of just what in your kernel is using up a bunch of memory. Today I was exploring this because I wanted to look into ZFS on Linux's memory usage and wound up finding out that on modern Linuxes it's a little bit misleading.

(By 'found out' I mean that DeHackEd on the #zfsonlinux IRC channel explained it to me.)

Specifically, on modern Linux the names shown in slabinfo are basically a hint because the current slab allocator in the kernel merges multiple slab types together if they are sufficiently similar. If five different subsystems all want to allocate (different) 128-byte objects with no special properties, they don't each get separate slab types with separate slabinfo entries; instead they are all merged into one slab type and thus one slabinfo entry. That slabinfo entry normally shows the name of one of them, probably the first to be set up, with no direct hint that it also includes the usage of all the others.

(The others don't appear in slabinfo at all.)

Most of the time this is a perfectly good optimization that cuts down on the number of slab types and enables better memory sharing and reduced fragmentation. But it does mean that you can't tell the memory used by, say, btree_node apart from ip_mrt_cache (on my machine, both are one of a lot of slab types that are actually all mapped to the generic 128-byte object). It can also leave you wondering where your slab types actually went, if you're inspecting code that creates a certain slab type but you can't find it in slabinfo (which is what happened to me).

The easiest way to see this mapping is to look at /sys/kernel/slab; all those symlinks are slab types that may be the same thing. You can decode what is what by hand, but if you're going to do this regularly you should get a copy of tools/vm/slabinfo.c from the kernel source and compile it; see the kernel SLUB documentation for details. You want 'slabinfo -a' to report the mappings.

(Sadly slabinfo is underdocumented. I wish it had a manpage or at least a README.)

If you need to track the memory usage of specific slab types, perhaps because you really want to know the memory usage of one subsystem, the easiest way is apparently to boot with the slub_nomerge kernel command line argument. Per the the kernel parameter documentation this turns off all slab merging, which may result in you having a lot more slabs than usual.

(On my workstation, slab merging condenses 110 different slabs into 14 actual slabs. On a random server, 170 slabs turn into 35 and a bunch of the pre-merger slabs are probably completely unused.)

Sidebar: disabling this merging in kernel code

The SLUB allocator does not directly expose a way of disabling this merging when you call kmem_cache_create() in that there's no 'do not merge, really' flag to the call. However, it turns out that supplying at least one of a number of SLUB debugging flags will disable this merging and on a kernel built without CONFIG_DEBUG_KMEMLEAK using SLAB_NOLEAKTRACE appears to have absolutely no other effects from what I can tell. Both Fedora 20 and Ubuntu 14.04 build their kernels without this option.

(I believe that most Linux distributions put a copy of the kernel build config in /boot when they install kernels.)

This may be handy if you have some additional kernel modules that you want to be able to track memory use for specifically even though a number of their slabs would normally get merged away, and you're compiling from source and willing to make some little modifications to it.

You can see the full set of flags that force never merging in the #define for SLUB_NEVER_MERGE in mm/slub.c. On a quick look, none of the others are either harmless or always defined as a non-zero value. It's possible that SLAB_DEBUG_FREE also does nothing these days; if used it will make your slabs only mergeable with other slabs that also specify it (which no slabs in the main kernel source do). That would cause slabs from your code to potentially be merged together but they wouldn't merge with anyone else's slabs, so at least you could track your subsystem's memory usage.

Disclaimer: these ideas have been at most compile-tested, not run live.

linux/SlabinfoSlabMerging written at 00:19:11; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.