2020-06-30
The unfortunate limitation in ZFS filesystem quotas and refquota
When ZFS was new, the only option it had for filesystem quotas was
the quota property, which I had an issue with
and which caused us practical problems in our first generation
of ZFS fileservers because it covered the space
used by snapshots as well as the regular user accessible filesystem.
Later ZFS introduced the refquota property, which did not have
that problem but in exchange doesn't apply to any descendant datasets
(regardless of whether they're snapshots or regular filesystems).
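To make the difference concrete, here is a toy model in Python of what each property counts. This is only a sketch under simplifying assumptions: the Dataset class, its field names, and the sizes are all made up for illustration, and real ZFS space accounting has more terms (reservations, blocks shared between snapshots, and so on).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Toy stand-in for a ZFS filesystem; all sizes are in GiB (hypothetical)."""
    name: str
    live_data: int                  # space used by the live, user-visible filesystem
    snapshot_space: int = 0         # space held only by this dataset's snapshots
    children: List["Dataset"] = field(default_factory=list)


def charged_to_refquota(ds: Dataset) -> int:
    # refquota looks only at the dataset's own live data.
    return ds.live_data


def charged_to_quota(ds: Dataset) -> int:
    # quota also counts the dataset's snapshots and every descendant, recursively.
    return (ds.live_data + ds.snapshot_space
            + sum(charged_to_quota(child) for child in ds.children))


# A made-up filesystem with one child filesystem and some snapshots of each.
homes = Dataset("tank/homes", live_data=40, snapshot_space=15,
                children=[Dataset("tank/homes/scratch", live_data=20, snapshot_space=5)])

print(charged_to_refquota(homes))  # 40
print(charged_to_quota(homes))     # 80
```

In this model the same filesystem is charged 40 GiB against a refquota but 80 GiB against a quota, with nothing in between.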
At one level this issue with refquota is fine, because we put
quotas on filesystems to limit their maximum size to what our backup
system can comfortably handle. At another level, this issue impacts
how we operate.
All of this stems from a fundamental lack in ZFS quotas, which is that
ZFS's general quota system doesn't let you limit space used only
by unprivileged operations. Writing into a filesystem is a normal
everyday thing that doesn't require any special administrative
privileges, while making ZFS snapshots (and clones) requires special
administrative privileges (either from being root or from having
had them specifically delegated to you). But you can't tell them
apart in a hierarchy, because ZFS only offers you the binary choice
of ignoring all space used by descendants (regardless of how it
occurs) or ignoring none of it, sweeping up specially privileged
operations like creating snapshots with ordinary activities like
writing files.
This limitation affects our pool space limits, because we use them
for two different purposes: restricting people to only the space
that they've purchased and ensuring that pools always have a safety
margin of space. Since pools contain many filesystems, we must limit
their total space usage using the quota property.
But that means that any snapshots we make for administrative purposes
consume space that's been purchased, and if we make too many of
them we'll run the pool out of space for completely artificial
reasons. It would be better to be able to have two quotas, one for
the space that the group has purchased (which would limit only
regular filesystem activity) and one for our pool safety margin
(which would limit snapshots too).
(This wouldn't completely solve the problem, though, since snapshots still consume space and if we made too many of them we'd run a pool that should have free space out of even its safety margin. But it would sometimes make things easier.)
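As a sketch of the difference, here is a toy Python model that compares the single quota we have today with the two-quota scheme I'd like. To be clear, the second scheme is purely hypothetical and not something ZFS offers, and every name and number here (PoolUsage, can_write, the sizes) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PoolUsage:
    """Toy pool accounting in GiB; all names and sizes are hypothetical."""
    size: int            # total pool capacity
    purchased: int       # space the group has paid for
    safety_margin: int   # space we always want left free
    file_data: int = 0   # space used by ordinary writes
    snapshots: int = 0   # space held by administrative snapshots


def can_write_today(pool: PoolUsage, amount: int) -> bool:
    # One quota set at the purchased space: snapshots are charged against it too.
    return pool.file_data + pool.snapshots + amount <= pool.purchased


def can_write(pool: PoolUsage, amount: int) -> bool:
    # Hypothetical scheme: ordinary writes must fit within the purchased
    # space, and everything together must stay under the safety limit.
    within_purchase = pool.file_data + amount <= pool.purchased
    within_safety = (pool.file_data + pool.snapshots + amount
                     <= pool.size - pool.safety_margin)
    return within_purchase and within_safety


def can_snapshot_grow(pool: PoolUsage, amount: int) -> bool:
    # Hypothetical scheme: snapshot space counts only against the safety limit.
    return (pool.file_data + pool.snapshots + amount
            <= pool.size - pool.safety_margin)


pool = PoolUsage(size=1000, purchased=800, safety_margin=100,
                 file_data=700, snapshots=150)

print(can_write_today(pool, 50))  # False
print(can_write(pool, 50))        # True
print(can_write(pool, 60))        # False
```

With these made-up numbers, the single quota refuses a 50 GiB write even though the group has only used 700 of its 800 GiB, because our snapshots ate the rest. The two-quota version allows it, but a 60 GiB write still fails because snapshots have pushed total usage up against the safety margin.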
PS: I thought this had more of an impact on our operations and the features we can reasonably offer to people, but the more I think about it the more it doesn't. Partly this is because we don't make much use of snapshots, though, for various reasons that sort of boil down to 'the natural state of disks is usually full'. But that's for another entry.