2019-01-19
A surprise potential gotcha with sharenfs in ZFS on Linux
In Solaris and Illumos, the standard and well supported way to set
and update NFS sharing options for ZFS filesystems is through the
sharenfs ZFS filesystem property. ZFS on Linux sort of supports
sharenfs, but it attempts to be compatible with Solaris and in
practice that doesn't work well, partly because there are Solaris
options that cannot be easily translated to Linux. When we faced
this issue for our Linux ZFS fileservers, we decided that we would
build an entirely separate system to handle NFS exports that
directly invokes exportfs, which has worked well. This turns out to
have been lucky, because there is an additional and somewhat subtle
problem with how sharenfs is currently implemented in ZFS on Linux.
On both Illumos and Linux, ZFS actually implements sharenfs by
calling the existing normal command to manipulate NFS exports; on
Illumos this uses share_nfs and on Linux, exportfs. By itself this
is not a problem and actually makes a lot of sense (especially since
there's no official public API for this on either Linux or Illumos).
On Linux, the specific functions involved are found in
lib/libshare/nfs.c.
When you initially share an NFS filesystem, ZFS will wind up running
the following command for each client:
exportfs -i -o <options> <client>:<path>
When you entirely unshare an NFS filesystem, ZFS will wind up running:
exportfs -u <client>:<path>
The potential problem comes in when you change an existing sharenfs
setting, either to modify what clients the filesystem is exported
to or to alter what options you're exporting it with. ZFS on Linux
implements this by entirely unexporting the filesystem to all
clients, then re-exporting it with whatever options and to whatever
clients your new sharenfs settings call for.
(The code for this is in nfs_update_shareopts() in
lib/libshare/nfs.c.)
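
To make the sequence concrete, here is a minimal Python sketch of the
commands described above. It is not the ZFS on Linux code (which is C);
the function names, the dictionary of client to option string, and the
example path and client names are all hypothetical. It only builds the
exportfs command lines and never runs them.

```python
from typing import Dict, Iterable, List


def share_commands(path: str, exports: Dict[str, str]) -> List[List[str]]:
    """One 'exportfs -i -o <options> <client>:<path>' per client."""
    return [["exportfs", "-i", "-o", opts, f"{client}:{path}"]
            for client, opts in exports.items()]


def unshare_commands(path: str, clients: Iterable[str]) -> List[List[str]]:
    """One 'exportfs -u <client>:<path>' per client."""
    return [["exportfs", "-u", f"{client}:{path}"] for client in clients]


def brute_force_update(path: str, old: Dict[str, str],
                       new: Dict[str, str]) -> List[List[str]]:
    """Unexport everything, then re-export per the new settings.

    Between the first '-u' command and the matching '-i -o' command,
    a client that is in both 'old' and 'new' is not exported at all.
    """
    return unshare_commands(path, old) + share_commands(path, new)


if __name__ == "__main__":
    # Hypothetical example: only client2's options change.
    old = {"client1": "rw", "client2": "rw"}
    new = {"client1": "rw", "client2": "ro"}
    for cmd in brute_force_update("/tank/fs", old, new):
        print(" ".join(cmd))
```

In this example client1 is unexported and then re-exported with the
same options, even though nothing about it changed.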
On the one hand this is a sensible if brute force implementation,
and computing the difference in sharing (for both clients and
options) and how to transform one to the other is not an easy
problem. On the other hand, this means that clients that are actually
doing NFS traffic during the time when you change sharenfs may
be unlucky enough to try an NFS operation in the window of time
between when the filesystem was unshared (to them) and when it was
reshared (to them). If they hit this window, they'll get various
forms of NFS permission denied messages, and with some clients this
may produce highly undesirable consequences, such as libvirt
guests having their root filesystems go read-only.
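
For contrast, here is a rough Python sketch of what a difference-based
update could look like in the simplest case. This is not what ZFS on
Linux does. It assumes that running exportfs -i -o for a client that is
already exported replaces that client's options in place, with no
unexport in between; I believe exportfs behaves this way, but treat it
as an assumption to check. The function name and the client to option
string dictionaries are hypothetical, and comparing option strings as
plain text is deliberately naive (it would see 'rw,sync' and 'sync,rw'
as different).

```python
from typing import Dict, List


def incremental_update(path: str, old: Dict[str, str],
                       new: Dict[str, str]) -> List[List[str]]:
    """Build exportfs commands that touch only clients that changed."""
    cmds: List[List[str]] = []

    # Unexport only the clients that are gone from the new settings.
    for client in old:
        if client not in new:
            cmds.append(["exportfs", "-u", f"{client}:{path}"])

    # Export clients that are new or whose option string differs.
    # Clients with identical options are left completely alone.
    for client, opts in new.items():
        if old.get(client) != opts:
            cmds.append(["exportfs", "-i", "-o", opts, f"{client}:{path}"])

    return cmds


if __name__ == "__main__":
    # Hypothetical example: client2 changes options, client3 is removed.
    old = {"client1": "rw", "client2": "rw", "client3": "rw"}
    new = {"client1": "rw", "client2": "ro"}
    for cmd in incremental_update("/tank/fs", old, new):
        print(" ".join(cmd))
```

Even this toy version shows part of why the real problem is harder: it
ignores option normalization, wildcard and netgroup clients, and what
should happen if one of the commands fails partway through.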
(The zfs-discuss re-query from Todd Pfaff today is what got several people to go digging and figure out this issue. I was one of them, but only because I rushed into exploring the code before reading the entire email thread.)
I would like to say that our system for ZFS NFS export permissions
avoids this issue, but it has exactly the same problem. Rather than
try to reconcile the current NFS export settings and the desired new
ones, it just does a brute force 'exportfs -u' for all current
clients and then reshares things. Fortunately we only very rarely
change the NFS exports for a filesystem because we export to
netgroups instead of individual clients, so adding and removing
individual clients is almost entirely done by changing netgroup
membership. The actual exportfs setting only has to change if we add
or remove entire netgroups.
(Exportfs has a tempting '-r' option to just resynchronize everything,
but our current system doesn't use it and I don't know why. I know that
I poked around with exportfs when I was developing it but I don't
seem to have written down notes about my exploration, so I don't know
if I ran into problems with -r, didn't notice it, or had some other
reason I rejected it. If I didn't overlook it, this is definitely a case
where I should have documented why I wasn't doing an attractive thing.)