ZFS on Linux's sharenfs problem (of what you can and can't put in it)

July 13, 2018

ZFS has an idea of 'properties' for both pools and filesystems. To quote from the Illumos zfs manpage:

Properties are divided into two types, native properties and user-defined (or "user") properties. Native properties either export internal statistics or control ZFS behavior. [...]

Filesystem properties are used to control things like whether compression is on, where a ZFS filesystem is mounted, if it is read-only or not, and so on. One of those properties is called sharenfs; it controls whether or not the filesystem is NFS exported, and what options it's exported with. One of the advantages of having ZFS manage this for you through the sharenfs property is that ZFS will automatically share and unshare things as the ZFS pool and filesystem are available or not available; you don't have to try to coordinate the state of your NFS shares and your ZFS filesystem mounts.

As I write this, the current ZFS on Linux zfs manpage says this about sharenfs:

Controls whether the file system is shared via NFS, and what options are to be used. [...] If the property is set to on, the dataset is shared using the default options:

sec=sys,rw,crossmnt,no_subtree_check,no_root_squash

See exports(5) for the meaning of the default options. Otherwise, the exportfs(8) command is invoked with options equivalent to the contents of this property.

That's very interesting wording. It's also kind of a lie, because ZFS on Linux caught itself in a compatibility bear trap (or so I assume).

This wording is essentially the same as the wording in Illumos (and in the original Solaris manpages). On Solaris, the sharenfs property is passed more or less straight to share_nfs as the NFS share options in its -o argument, and as a result what you put in sharenfs is just those options. This makes sense; the original Solaris version of ZFS was not created to be portable to other Unixes, so it made no attempt to have its sharenfs (or sharesmb) be Unix-independent. It was part of Solaris, so what went into sharenfs was Solaris NFS share options, including obscure ones.

It would have been natural for ZFS on Linux to take the same attitude towards what went into sharenfs on Linux, and indeed the current wording of the manpage sort of implies that this is what's happening and that you can simply use what you'd put in exports(5). Unfortunately, this is not the case. Instead, ZFS on Linux attempts to interpret your sharenfs setting as Illumos NFS share options and tries to convert them to equivalent Linux options.

(I assume that this was done to make it theoretically easier to move pools and filesystems between ZoL and Illumos/Solaris ZFS, because the sharenfs property would mean the same thing and be interpreted the same way on both systems. Moving filesystems back and forth is not as crazy as it sounds, given zfs send and zfs receive.)

There are two problems with this. The first is that the conversion process doesn't handle all of the Illumos NFS share options. Some it will completely reject or fail on (they're just totally unsupported), while others it will accept but produce incorrect conversions that don't work. The set of accepted and properly handled conversions is not documented and is unlikely to ever be. The second problem is that Linux can do things with NFS share options that Illumos doesn't support (the reverse is true too, but less directly relevant). Since ZFS on Linux gives you no way to directly set Linux share options, you can't use these Linux-specific NFS share options at all through sharenfs.
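To make the first problem concrete, here is a minimal Python sketch of what any such translation layer has to do. This is not ZoL's actual conversion code; the function name translate_share_options, the UnsupportedOption exception, and the tiny option table are all hypothetical, picked only to show that some options map across cleanly while anything outside the table has to be rejected.

```python
# Illustrative only: a toy Illumos-to-Linux share option translator.
# The handful of options handled here is an assumption made for the
# example; it is NOT the mapping that ZFS on Linux really uses.


class UnsupportedOption(ValueError):
    """Raised for a share option this toy translator does not know."""


def translate_share_options(sharenfs):
    """Turn 'sec=sys,rw=a:b,root=a' into {client: [Linux export options]}."""
    common = []       # options that apply to every client
    per_client = {}   # client -> options that apply to just that client

    for item in sharenfs.split(","):
        name, _, value = item.partition("=")
        if name in ("rw", "ro") and not value:
            common.append(name)
        elif name in ("rw", "ro"):
            # Illumos access lists are colon-separated.
            for client in value.split(":"):
                per_client.setdefault(client, []).append(name)
        elif name == "root" and value:
            # Closest Linux idea: don't squash root for these clients.
            for client in value.split(":"):
                per_client.setdefault(client, []).append("no_root_squash")
        elif name == "sec" and value:
            common.append("sec=" + value)
        else:
            # Everything else is outside the toy table.
            raise UnsupportedOption(item)

    if not per_client:
        return {"*": common}
    return {client: common + opts for client, opts in per_client.items()}
```

Even in this toy version, an Illumos option such as nosuid has nowhere to go on the Linux side and can only be refused, and a Linux-only export option has no way in at all because the input is always read as Illumos syntax.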

Effectively what the current ZFS on Linux approach does is restrict you to an undocumented subset of the Illumos NFS share options: the ones that are both supported by Linux and correctly converted by ZoL. If you're doing anything at all sophisticated with your NFS sharing options (as we are), this means that using sharenfs on Linux is simply not an option. We're going to have to roll our own NFS share option handling and management system, which is a bit irritating.

(We're also going to have to make sure that we block or exclude sharenfs properties from being transferred from our OmniOS fileservers to our ZoL fileservers during 'zfs send | zfs receive' copies, which is a problem that hadn't occurred to me until I wrote this entry.)
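One possible way to do the blocking, assuming the receiving side's 'zfs receive' supports the -x option for excluding a property (current OpenZFS documents it; check your version's manpage), is to always build the receive command with it. A minimal Python sketch follows; the helper name send_receive_argv and the dataset names in the usage note are made up for illustration, and the default of 'zfs send -p' is my assumption about how the copies are made (properties only travel in the stream with -p or -R).

```python
def send_receive_argv(snapshot, target, exclude=("sharenfs",), send_flags=("-p",)):
    """Build argv lists for the two halves of 'zfs send | zfs receive'.

    The receive half gets one '-x PROPERTY' per excluded property, so
    a sharenfs value in the stream is not applied on the receiving side.
    """
    send = ["zfs", "send", *send_flags, snapshot]
    receive = ["zfs", "receive"]
    for prop in exclude:
        receive += ["-x", prop]
    receive.append(target)
    return send, receive
```

Calling send_receive_argv("tank/home@copy", "newpool/home") gives the two argument lists, which could then be connected with subprocess.Popen (or run over ssh) however the real copy is done.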

PS: There is an open ZFS on Linux issue to fix the documentation; it also mentions some bugs in how sharenfs is parsed. I may even have the time and energy to contribute a patch at some point.

PPS: Probably what we should do is embed our Linux NFS share options as a ZFS filesystem user property. This would at least allow our future management system to scan the current ZFS filesystems to see what the active NFS shares and share options should be, as opposed to having to also consult and trust some additional source of information for that.
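As a rough sketch of what the scanning side might look like in Python: the property name com.example:nfsexport is hypothetical (ZFS only requires that a user property name contain a colon), and the function names are mine. It relies on 'zfs get -H -o name,value' printing tab-separated lines with no header, and on an unset user property showing up with a value of '-'.

```python
import subprocess

# Hypothetical user property name, used only for this example.
SHARE_PROP = "com.example:nfsexport"


def parse_zfs_get(output):
    """Parse 'zfs get -H -o name,value' output into {filesystem: options}."""
    shares = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        if value != "-":   # '-' means the property is not set
            shares[name] = value
    return shares


def wanted_shares():
    """Ask ZFS which filesystems carry our share property."""
    cmd = ["zfs", "get", "-H", "-o", "name,value",
           "-t", "filesystem", SHARE_PROP]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_zfs_get(result.stdout)
```

The property itself would be set with something like 'zfs set com.example:nfsexport=rw,no_root_squash tank/home'. Since user properties are inherited by child filesystems, a real tool would probably also want to look at the source column to tell a locally set value from an inherited one.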


Comments on this page:

I spent some time noodling in my head on how this should look.

The obvious first idea (to me) is to just add another property linsharenfs (or something) that is used on Linux – basically doing what you decided to do with a user property, just with the benefit of ZFS handling the NFS sharing for you, same as it does for the sharenfs property, instead of you having to implement that yourself.

It’s a bit of a clumsy solution though. It at least means all ZFS implementations on all platforms have to know about each others’ NFS property names in order to ignore them, and people who share filesystems across platforms have to juggle what gets set where, filtering `zfs send` etc. As well, separating the options by platform is a weird level of granularity that seems plausibly problematic for the future. ZFS getting ported at all already created an unanticipated uncertainty about the meaning of what a platform is. If we get, say, a new BSD with a new kernel in the future, should ZFS there use the foosharenfs property of the BSD it was forked from or create its own? Both choices come with drawbacks.

So eventually it occurred to me that it should just be configurable what ZFS calls to set up its NFS shares. Instead of being hardwired to share_nfs on Solaris or attempting to call the equivalent with translated options on some other platform, it should have an option that only defaults to the currently called utility but can be changed to something else. Then the Linux implementation would default to a custom tool it ships, and the option translation code would live in that tool instead of in ZFS core. That way if a user only runs Linux and needs to do something that the option translator doesn’t understand or support, the user can just skip the translator. I also presume that people who use ZFS and NFS together would have a much easier time supplying patches for this tool than for the ZFS core, in which case it would likely get much better option coverage over time, which would also help. This new option could also be used by deployments with special needs to implement completely arbitrary NFS attribute handling schemes, even back on Solaris. (Pass an identifier that’s looked up in some configuration management system? No problem. Look it up in LDAP? Sure… why not.) Users who truly share filesystems across platforms would have the ability to implement whatever scheme best fits their particular needs, and it avoids the granularity and management problems with multiple platform-specific properties.

Written on 13 July 2018.
Last modified: Fri Jul 13 01:03:29 2018