The ultimate (for now) answer for our ZFS ARC size problem
I've mentioned in passing before (here and here) that we have had a long-standing problem where our ZFS ARC sizes would basically collapse; the ARC would spontaneously decide to limit itself to 2 GBytes or so despite the machines having 8 GBytes and being basically unused apart from NFS fileservice. In the end, I believe I've figured out why this happened to us. The short answer is kernel memory fragmentation.
(At this point I will pause to mention that we are running more or less Solaris 10 update 8, because this is about to become important.)
Simplifying somewhat, Solaris allocates most kernel memory structures using an arena-based slab allocator; common sorts of objects have their own separate arenas. As with all slab allocators, the memory system can only return slab pages to the free pool if all objects on a particular page are free; even a single object still in use will cause the entire page to be retained.
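To make the page-retention rule concrete, here is a minimal Python sketch of it. This is a toy model, not the Solaris allocator: the page count, the eight objects per page, and the function names are all assumptions made up for illustration. It frees objects at random until only about 16% remain in use and then counts how many pages are completely empty and so could be given back.

```python
import random

def reclaimable_pages(pages):
    """Count pages with no live objects; only these could be returned to the free pool."""
    return sum(1 for page in pages if not any(page))

def simulate(num_pages=1000, objs_per_page=8, keep_fraction=0.16, seed=0):
    """Return (object utilization, fraction of pages that are fully free).

    All parameters are hypothetical; they are not taken from a real system.
    """
    rng = random.Random(seed)
    # Every slot starts out allocated (True means the object is in use).
    pages = [[True] * objs_per_page for _ in range(num_pages)]
    # Free objects at random, leaving roughly keep_fraction of them in use.
    for page in pages:
        for i in range(objs_per_page):
            if rng.random() >= keep_fraction:
                page[i] = False
    in_use = sum(sum(page) for page in pages)
    total = num_pages * objs_per_page
    return in_use / total, reclaimable_pages(pages) / num_pages

if __name__ == "__main__":
    utilization, free_fraction = simulate()
    print(f"objects in use: {utilization:.1%}, pages reclaimable: {free_fraction:.1%}")
```

Under this model a page is empty only if all eight of its objects happened to be freed, which has probability of about 0.84^8, or roughly 25%. So even with 84% of the objects free, around three quarters of the pages stay pinned, and the more objects fit on a page, the worse this gets.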
ZFS has an arena for dnode_t structures, which are the rough ZFS equivalent of inodes. On the Solaris fileservers with the ARC size collapse, Solaris kernel stats show that this arena has very low utilization; 16% of the allocated dnode_t structures being used is typical. Since Solaris is unable to reduce the size of this arena, I think it must be heavily fragmented.
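As a rough sketch of how such a utilization figure can be derived, the following Python computes it from two counters. It assumes the arena's kernel statistics expose `buf_inuse` and `buf_total` values in simple "name value" lines (check the actual field names on your system), and the numbers in `sample` are made up purely to match the 16% figure above; they are not real measurements.

```python
def parse_kstat(text):
    """Parse whitespace-separated 'name value' lines, keeping only integer values."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

def cache_utilization(stats):
    """Fraction of allocated buffers that are actually in use."""
    total = stats["buf_total"]
    if total == 0:
        return 0.0
    return stats["buf_inuse"] / total

# Hypothetical sample values, chosen only to illustrate the calculation.
sample = """
buf_inuse   160000
buf_total   1000000
"""

print(f"{cache_utilization(parse_kstat(sample)):.0%}")  # prints 16%
```

A low ratio that persists while the arena's total size does not shrink is the pattern described here: most buffers are free, but they are spread across pages that each still hold at least one live object.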
This leaves us with two puzzles: what's causing the arena to grow, and what's keeping a random scattering of dnode_t structures busy. I have a potential answer for the first puzzle; as it happens, we have a number of periodic jobs that walk all of the ZFS filesystems on a fileserver, and when they're running the dnode_t arena utilization climbs dramatically. I have no answer for the second puzzle right now (and haven't looked very hard for one).
There is code in OpenSolaris to support defragmenting arenas by moving allocated objects between slab pages (with the cooperation of the owner of the objects). However, this code is not in Solaris 10 update 8 (and I don't know if it's in S10U9 either, or even Solaris 11 Express).