How we guarantee there's always some free space in our ZFS pools

May 10, 2020

One of the things that we discovered fairly early on in our experience with ZFS (I think within the lifetime of the first generation Solaris fileservers) is that ZFS gets very unhappy if you let a pool get completely full. The situation has improved since then, but back in the days we couldn't even change ZFS properties, much less remove files as root. Being unable to change properties is a serious issue for us because NFS exports are controlled by ZFS properties, so if we had a full pool we couldn't modify filesystem exports to cut off access from client machines that were constantly filling up the filesystem.

(At one point we resorted to cutting off a machine at the firewall, which is a pretty drastic step. Going this far isn't necessary for machines that we run, but we also NFS export filesystems to machines that other trusted sysadmins run.)

To stop this from happening, we use pool-wide quotas. No matter how much space people have purchased in a pool or even if this is a system pool that we operate, we insist that it always have a minimum safety margin, enforced through a 'quota=' setting on the root of the pool. When people haven't purchased enough to use all of the pool's current allocated capacity, this safety margin is implicitly the space they haven't bought. Otherwise, we have two minimum margins. The explicit minimum margin is that our scripts that manage pool quotas always insist on a 10 MByte safety margin. The implicit minimum margin is that we normally only set pool quotas in full GB, so a pool can be left with several hundred MB of space between its real maximum capacity and the nearest full GB.

All of this pushes the problem back one level, which is determining what the pool's actual capacity is so we can know where this safety margin is. This is relatively straightforward for us because all of our pools use mirrored vdevs, which means that the size reported by 'zpool list' is a true value for the total usable space (people with raidz vdevs are on their own here). However, we must reduce this raw capacity a bit, because ZFS reserves 1/32nd of the pool for its own internal use. We must reserve at least 10 MB over and above this 1/32nd of the pool in order to actually have a safety margin.

(All of this knowledge and math is embodied into a local script, so that we never have to do these calculations by hand or even remember the details.)

PS: These days in theory you can change ZFS properties and even remove files when your pool is what ZFS will report as 100% full. But you need to be sure that you really are freeing up space when you do this, not using more because of things like snapshots. Very bad things happen to your pool if it gets genuinely full right up to ZFS's internal redline (which is past what ZFS will normally let you unless you trick it); you will probably have to back it up, destroy it, and recreate it to fully recover.

(This entry was sparked by a question from a commentator on yesterday's entry on how big our fileserver environment is.)

Written on 10 May 2020.
« How big our fileserver environment is (as of May 2020)
Why we have several hundred NFS filesystems in our environment »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sun May 10 22:51:20 2020
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.