== Where your memory can be going with ZFS on Linux
If you're running ZFS on Linux, its memory use is probably at least [[a concern ZFSonLinuxWeakAreas]]. At a high level, there are at least three different places where ZoL may be using your RAM or holding it down.

First, it may be in ZFS's ARC, which is the ZFS equivalent of the buffer cache. A full discussion of what is included in the ARC and how you measure it is well beyond the scope of this entry, but the short summary is that the ARC includes data from disk, metadata from disk, and several sorts of bookkeeping data. ZoL reports information about it in _/proc/spl/kstat/zfs/arcstats_, which is exactly the standard ZFS ARC kstats. What ZFS considers to be the total current (RAM) size of the ARC is _size_. ZFS on Linux normally limits the maximum ARC size to roughly half of memory (this is ((c_max))). (Some sources will tell you that the ARC size in kstats is _c_. This is wrong. _c_ is the target size; it's often but not always the same as the actual size. A small sketch of pulling these numbers out is below.)

Next, RAM can be in [[slab allocated http://en.wikipedia.org/wiki/Slab_allocation]] ZFS objects and data structures that are not counted as part of the ARC for one reason or another. It used to be that ZoL handled all slab allocation itself, so all ZFS slabs were listed in _/proc/spl/kmem/slab_; however, the current ZoL development version lets the native kernel slab allocator handle most slabs for objects no bigger than ((spl_kmem_cache_slab_limit)) bytes, which defaults to 16K. Such native kernel slabs are theoretically listed in _/proc/slabinfo_, but they are unfortunately normally subject to [[SLUB slab merging SlabinfoSlabMerging]]; this often means that they get merged with other slabs and you can't actually see how much memory they're using.

As far as slab objects that aren't counted in the ARC go, I believe that ((zfs_znode_cache)) slab objects (which are ((znode_t))s) are not reflected in the ARC size. On some machines, active ((znode_t)) objects may be a not insignificant amount of memory. I don't know this for sure, though, and I'm partly reasoning from behavior we saw on Solaris.

Third, RAM can be trapped in unused objects and unused space in slabs. One way that unused objects eat space (sometimes a lot of it) is that slabs are allocated and freed in relatively large chunks (at least one 4KB page of memory and often bigger in ZoL), so if even a few objects in a chunk are still in use, the entire chunk stays alive and can't be freed. [[We've seen serious issues with slab fragmentation on Solaris ../solaris/ZFSARCSizeProblem]] and I'm sure ZoL can have this too. It's possible to see the level of wastage and fragmentation for any slab that you can get accurate numbers for (ie, not any that have vanished into [[SLUB slab merging SlabinfoSlabMerging]]).

(ZFS on Linux may also allocate some memory outside of its slab allocations, although I can't spot anything large and obvious in the kernel code.)
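As a concrete illustration of the ARC kstats above, here is a minimal Python sketch that reports the actual, target, and maximum ARC sizes. It assumes the standard kstat text format, where the first two lines are headers and each stat is a 'name type value' line (these particular stats are byte counts):

    # Report the actual ARC size versus its target and maximum,
    # from the standard ZFS ARC kstats that ZoL exposes in /proc.
    def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
        stats = {}
        with open(path) as f:
            # Skip the two kstat header lines, then parse 'name type value'.
            for line in f.readlines()[2:]:
                fields = line.split()
                if len(fields) == 3:
                    stats[fields[0]] = int(fields[2])
        return stats

    st = read_arcstats()
    gb = 1024.0 ** 3
    print("ARC size:   %5.2f GB" % (st["size"] / gb))
    print("ARC target: %5.2f GB (c)" % (st["c"] / gb))
    print("ARC max:    %5.2f GB (c_max)" % (st["c_max"] / gb))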
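And if you want to see what SLUB slab merging has actually done on your machine, merged caches show up under _/sys/kernel/slab_ as symlinks that point at a shared canonical cache directory. This sketch assumes your kernel uses SLUB and exposes that sysfs directory:

    import os
    # Merged SLUB caches are aliases: symlinks pointing at a shared
    # canonical cache (one with a ':t-...' style name).
    slabdir = "/sys/kernel/slab"
    for name in sorted(os.listdir(slabdir)):
        path = os.path.join(slabdir, name)
        if os.path.islink(path):
            print("%-32s -> %s" % (name, os.readlink(path)))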
All of this sounds really abstract, so let me give you an example. On one of my machines with 16 GB of RAM and actively used ZFS pools, things are currently reporting the following numbers:

* The ARC is 5.1 GB, which is decent. Most of that is not actual file data, though; file data is reported as 0.27 GB, then there's 1.87 GB of ZFS metadata from disk and a bunch of other stuff.

* 7.55 GB of RAM is used in active slab objects. 2.37 GB of that is reported in _/proc/spl/kmem/slab_; the remainder is in native Linux slabs in _/proc/slabinfo_. The ((znode_t)) slab is most of the SPL _slab_ report, at 2 GB used. (This machine is using a hack to avoid SLUB slab merging for native kernel ZoL slabs, because I wanted to look at memory usage in detail.)

* 7.81 GB of RAM has been allocated to ZoL slabs in total, which means that a few hundred MB of slab space is currently sitting wasted.

If ((znode_t)) objects are not counted in the ARC, then the ARC and active ((znode_t)) objects between them account for almost all of the active slab space: 7.1 GB out of 7.55 GB. I have seen total ZoL slab allocated space be as high as 10 GB (on this 16 GB machine) despite the ARC only reporting a 5 GB size, so this stuff can fluctuate back and forth quite a bit during normal usage.

=== Sidebar: Accurately tracking ZoL slab memory usage
To accurately track ZoL memory usage, you must defeat SLUB slab merging somehow. You can turn it off entirely with the ((slub_nomerge)) kernel parameter, or you can hack the _spl_ ZoL kernel module to defeat it (see the sidebar [[here SlabinfoSlabMerging]]).

Because ((spl_kmem_cache_slab_limit)) is a module parameter of the _spl_ ZoL kernel module, I believe that you can set it to zero to avoid having any ZoL slabs be native kernel slabs. This sidesteps SLUB slab merging entirely and also makes all ZoL slabs appear in _/proc/spl/kmem/slab_, although it may be somewhat less efficient.
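If you want to try the ((spl_kmem_cache_slab_limit)) approach, one way to set it is with a standard modprobe options line so that it takes effect when the _spl_ module loads. A sketch (the file name here is just my choice, not anything official):

    # In eg /etc/modprobe.d/spl.conf:
    options spl spl_kmem_cache_slab_limit=0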
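Once ZoL's native kernel slabs are actually visible in _/proc/slabinfo_ (by whatever means), you can total up their active versus allocated space. Here's a sketch that assumes the standard 'slabinfo - version: 2.1' line format and guesses at a list of ZoL slab name prefixes (the prefix list is my guess, not anything official, so adjust it to taste):

    # Total active vs allocated memory for ZoL-looking slabs in
    # /proc/slabinfo. Assumes the slabinfo 2.1 line format:
    #   name active_objs num_objs objsize objperslab pagesperslab
    #   : tunables ... : slabdata active_slabs num_slabs sharedavail
    PAGESIZE = 4096   # assumed; use os.sysconf("SC_PAGE_SIZE") to be sure
    ZOL_PREFIXES = ("zio_", "zfs_", "arc_", "dnode_", "dmu_", "sa_", "zil_")

    used = alloced = 0
    with open("/proc/slabinfo") as f:
        for line in f:
            fields = line.split()
            if not fields or not fields[0].startswith(ZOL_PREFIXES):
                continue
            active_objs, objsize = int(fields[1]), int(fields[3])
            pagesperslab, num_slabs = int(fields[5]), int(fields[-2])
            used += active_objs * objsize
            alloced += num_slabs * pagesperslab * PAGESIZE

    gb = 1024.0 ** 3
    print("ZoL native slabs: %.2f GB active, %.2f GB allocated" % (used / gb, alloced / gb))

To get the full picture you'd add the SPL-side numbers from _/proc/spl/kmem/slab_ on top of this; I'm not going to assume its exact column layout here, since it has changed between SPL versions.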