== The ultimate (for now) answer for our ZFS ARC size problem

I've mentioned in passing before ([[here ZFSOverPrefetchingUpdateII]] and [[here ZFSOverPrefetchingUpdate]]) that we have had a long-standing problem where our ZFS ARC sizes would basically collapse; the ARC would spontaneously decide to limit itself to 2 GBytes or so despite the machines having 8 GBytes and being basically unused apart from NFS fileservice.

In the end, I believe I've figured out why this happened to us. The short answer is kernel memory fragmentation.

(At this point I will pause to mention that we are running more or less Solaris 10 update 8, because this is about to become important.)

Simplifying somewhat, Solaris allocates most kernel memory structures using an arena-based [[slab allocator http://en.wikipedia.org/wiki/Slab_allocation]]; common sorts of objects have their own separate arenas. As with all slab allocators, the memory system can only return slab pages to the free pool if *all* objects on a particular page are free; even a single object still used will cause an entire page to be retained.

ZFS has an arena for ((dnode_t)) structures, which are the rough ZFS equivalent of inodes. On the Solaris fileservers with the ARC size collapse, Solaris kernel stats show that this arena has very low utilization; 16% of the allocated ((dnode_t))'s being used is typical. Since Solaris is unable to reduce the size of this arena, I think it must be heavily fragmented.

This leaves us with two puzzles: what's causing the arena to grow, and what's keeping a random scattering of ((dnode_t)) structures busy. I have a potential answer for the first puzzle; as it happens, we have a number of periodic jobs that walk all of the ZFS filesystems on a fileserver, and when they're running the ((dnode_t)) arena utilization climbs dramatically. I have no answer for the second puzzle right now (and haven't looked very hard for one).

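
To make the slab retention rule concrete, here is a toy simulation in Python. It is only a sketch under made-up assumptions: the 16 objects per slab, the ((simulate)) function itself, and the purely random choice of which objects stay in use are all hypothetical, and none of it reflects how the Solaris kernel actually lays out or frees ((dnode_t)) structures. It just shows how a small scattering of busy objects can pin most of an arena.

```python
import random

# Hypothetical slab capacity; real dnode_t slabs hold a different number.
OBJECTS_PER_SLAB = 16


def simulate(num_objects, num_survivors, seed=0):
    """Fill slabs with objects, free all but num_survivors at random.

    Returns (slabs_before, slabs_after, slabs_if_compacted, utilization),
    where utilization is the in-use fraction of the retained slabs.
    """
    rng = random.Random(seed)
    # Objects are packed into slabs in allocation order.
    slabs_before = -(-num_objects // OBJECTS_PER_SLAB)  # ceiling division

    # Pick which objects stay busy after the mass free.
    survivors = rng.sample(range(num_objects), num_survivors)

    # A slab can only be returned if none of its objects is still in use.
    pinned_slabs = {obj // OBJECTS_PER_SLAB for obj in survivors}
    slabs_after = len(pinned_slabs)

    # What we would need if survivors could be moved next to each other.
    slabs_if_compacted = -(-num_survivors // OBJECTS_PER_SLAB)

    if slabs_after == 0:
        utilization = 0.0
    else:
        utilization = num_survivors / (slabs_after * OBJECTS_PER_SLAB)
    return slabs_before, slabs_after, slabs_if_compacted, utilization


if __name__ == "__main__":
    before, after, compacted, util = simulate(1600, 256)
    print(f"slabs before free:   {before}")
    print(f"slabs still pinned:  {after}")
    print(f"slabs if compacted:  {compacted}")
    print(f"utilization:         {util:.0%}")
```

Calling ((simulate)) with 1600 objects and 256 survivors models 100 slabs where 16% of the objects stay busy. With the survivors scattered at random, most of the slabs typically stay pinned, while packing the same survivors together would need only 16 slabs.
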
There is code in OpenSolaris to support defragmenting arenas by moving allocated objects between slab pages (with the cooperation of the owner of the objects). However, this code is not in Solaris 10 update 8 (and I don't know if it's in S10U9 either, or even Solaris 11 Express).