
2014-10-25

The difference in available pool space between zfs list and zpool list

For a while I've noticed that 'zpool list' reports that our pools have more available space than 'zfs list' does, and I've vaguely wondered why. We recently had a very serious issue due to a pool filling up, so I suddenly became very interested in the whole question and did some digging. It turns out that there are two sources of the difference, depending on how your vdevs are set up.

For raidz vdevs, the simple version is that 'zpool list' reports more or less the raw disk space before raidz overhead, while 'zfs list' applies the standard estimate you'd expect (ie that N disks worth of space will vanish for raidz level N). Given that raidz overhead is actually variable in ZFS, it's easy to see why the two commands behave this way.
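
As a concrete illustration, here's a quick back-of-the-envelope sketch in Python with made-up numbers. It only captures the simple 'N disks of parity disappear' estimate, not ZFS's real (and variable) raidz accounting.

	# A rough sketch of the difference, not ZFS's actual space accounting:
	# 'zpool list' style raw space versus the standard 'zfs list' style
	# estimate for a single raidz vdev. Metadata overhead, slop space, and
	# ZFS's variable per-block raidz parity cost are all ignored here.

	def raidz_space_estimate(disks, disk_size_gib, parity):
	    raw = disks * disk_size_gib                 # roughly what 'zpool list' reports
	    usable = (disks - parity) * disk_size_gib   # the standard estimate 'zfs list' uses
	    return raw, usable

	# Hypothetical 6-disk raidz2 vdev of 2 TiB disks:
	raw, usable = raidz_space_estimate(disks=6, disk_size_gib=2048, parity=2)
	print(f"raw: {raw} GiB, estimated usable: {usable} GiB")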

In addition, ZFS in general reserves a certain amount of pool space for various reasons, for example so that you can remove files even when the pool is 'full' (since ZFS is a copy-on-write system, removing files requires some new space to record the changes). This space is sometimes called 'slop space'. According to the code, this reservation is 1/32nd of the pool's size. In my actual experimentation on our OmniOS fileservers, the effective reservation appears to be roughly 1/64th of the pool and definitely not 1/32nd of it, and I don't know why we're seeing this difference.
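
For illustration, here is a minimal Python sketch of the reservation as I understand it from the code: pool space shifted right by spa_slop_shift (which is 5, ie 1/32nd). The 128 MiB floor is my reading of SPA_MINSLOP; treat both constants as assumptions about the then-current Illumos source rather than a definitive statement.

	# A sketch of the slop space reservation as described in spa_misc.c.
	# spa_slop_shift = 5 gives 1/32nd of the pool; the 128 MiB minimum is
	# my reading of SPA_MINSLOP and should be treated as an assumption.

	SPA_SLOP_SHIFT = 5                 # reserve 1/32nd of pool space
	MIN_SLOP = 128 * 1024 * 1024       # assumed floor on the reservation, in bytes

	def slop_space(pool_bytes):
	    return max(pool_bytes >> SPA_SLOP_SHIFT, MIN_SLOP)

	pool_bytes = 30 * 2**40            # a hypothetical 30 TiB pool
	print(f"{slop_space(pool_bytes) / 2**40:.3f} TiB reserved as slop")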

(I found out all of this from a Ben Rockwood blog entry and then found the code in the current Illumos codebase to see what the current state was (or is).)

The actual situation with which operations can (or should) use what space is complicated. Roughly speaking, user-level writes and ZFS operations that make things, like 'zfs create' and 'zfs snapshot', must leave the full 1/32nd of reserved space untouched; file removes and 'neutral' ZFS operations are allowed to use half of the slop space (running the pool down to 1/64th of its size left free); and some operations (like 'zfs destroy') have no limit whatsoever and can theoretically run your pool permanently and unrecoverably out of space.
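
Sketched in Python, my reading of the tiers looks something like the following. The tier names correspond to the zfs_space_check_t values in dsl_synctask.h (ZFS_SPACE_CHECK_NORMAL, ZFS_SPACE_CHECK_RESERVED, ZFS_SPACE_CHECK_NONE), but the arithmetic is just a paraphrase of the comments, not the kernel's actual allocation path.

	# A paraphrase of the zfs_space_check_t tiers from dsl_synctask.h,
	# not the actual kernel logic: NORMAL operations must leave the full
	# slop space free, RESERVED operations may consume half of it, and
	# NONE operations are not checked at all.

	from enum import Enum

	class SpaceCheck(Enum):
	    NORMAL = 1     # user writes, 'zfs create', 'zfs snapshot'
	    RESERVED = 2   # file removes and 'neutral' operations
	    NONE = 3       # eg 'zfs destroy': no limit at all

	def allocation_allowed(check, used_bytes, pool_bytes, slop_bytes):
	    if check is SpaceCheck.NONE:
	        return True
	    reserve = slop_bytes if check is SpaceCheck.NORMAL else slop_bytes // 2
	    return used_bytes < pool_bytes - reserve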

The final authority is the Illumos kernel code and its comments. These days it's on Github so I can just link to the two most relevant bits: spa_misc.c's discussion of spa_slop_shift and dsl_synctask.h's discussion of zfs_space_check_t.

(What I'm seeing with our pools would make sense if everything was actually being classified as an 'allowed to use half of the slop space' operation. I haven't traced the Illumos kernel code at this level, so I have no idea how this could be happening; the comments certainly suggest that it isn't supposed to be.)

(This is the kind of thing that I write down so I can find it later, even though it's theoretically out there on the Internet already. Re-finding things on the Internet can be a hard problem.)

solaris/ZFSSpaceReportDifference written at 02:05:09

