2023-03-21
ZFS on Linux and NFS(v3) server filesystem IDs
One part of an NFS filehandle is an identification of the filesystem (or more accurately the mount point) on the server. As I've seen recently, there are various forms of (NFS) filesystem IDs, and they can even vary from client to client (although you shouldn't normally set things up that way). However, all of this still leaves an open question specifically for ZFS on Linux filesystems, which is where the filesystem ID comes from, how you can work it out, and how you can check whether two filesystem IDs will stay the same so that you can substitute servers without invalidating client NFS mounts. As it happens, I've just worked out the answer to that question, so here it is.
All ZFS filesystems (datasets) have two identifiers, a 'guid' that is visible as a dataset property, and a special 'fsid_guid' (as zdb calls it), which is the 'fsid'. There are two ways to find out the fsid of a ZFS dataset. First, ZFS returns it to user level in the 'f_fsid' field that's part of what's returned by statfs(2). Second, you can use 'zdb' to dump the objset object of a dataset, which you may need to do if the filesystem isn't mounted. You find which object you need to dump by getting the 'objsetid' property of a ZFS filesystem (well, dataset):
  # zfs get objsetid fs6-mail-01/cs/mail
  [...]
  fs6-mail-01/cs/mail  objsetid  148  -
  # zdb -dddd fs6-mail-01 148 | grep fsid_guid
          fsid_guid = 40860249729586731
For the statfs() version we can use Python, and conveniently report the result in hex for reasons we're about to see:
  >>> import os
  >>> r = os.statvfs("/w/435").f_fsid
  >>> print("%x" % r)
  34ae17341d08c
(These two approaches give the same answer for a given filesystem.)
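One small wrinkle in comparing the two is that zdb prints the fsid_guid in decimal, while the statvfs() result is most conveniently looked at in hex. As an illustration, here's the fsid_guid from the zdb example above converted to hex (it's a different filesystem than /w/435, so it doesn't match the earlier hex value):

  >>> print("%x" % 40860249729586731)
  912a2e3e90d62b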
In my earlier exploration of NFS server filesystem IDs, the NFS export 'uuid' of this /w/435 test filesystem was '7341d08c:00034ae1:00000000:00000000', which should look awfully familiar. It's the low 32-bit word and the high 32-bit word of the 'f_fsid', in that order, zero-padded.
The reason for this reversal is somewhat obscure and beyond the
scope of this entry (but it's probably this setting of the peculiar
f_fsid field
in zfs_statvfs()).
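As a small sketch of this reordering, here's how you can reconstruct the export uuid by hand from the f_fsid value in the Python example above:

  >>> fsid = 0x34ae17341d08c            # f_fsid from statvfs()
  >>> low, high = fsid & 0xffffffff, fsid >> 32
  >>> "%08x:%08x:%08x:%08x" % (low, high, 0, 0)
  '7341d08c:00034ae1:00000000:00000000'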
(This is the uuid visible in /proc/fs/nfsd/exports. As I discovered earlier, the version in /proc/net/rpc/nfsd.fh/content will be different.)
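If you want to see what your NFS server has at the moment, you can look at those files directly (as root); a minimal version is something like:

  # grep uuid /proc/fs/nfsd/exports
  # cat /proc/net/rpc/nfsd.fh/content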
One important thing here is that a filesystem's fsid is not copied
through ZFS send and receive, presumably because it's an invisible
attribute that exists at the wrong level. This means that if you do
ZFS fileserver upgrades by (filesystem) migration, your new fileserver will normally
have ZFS filesystems with different ZFS fsids and thus different
NFS filesystem IDs than your old one, and your NFS clients will get
stale NFS handle errors. But at least you can now check this in
advance if you want to verify that this is so. You can't work around
this at the ZFS level, but you might be able to fix it at the NFS
export level by setting an explicit 'uuid=' (of the old value)
for all of the exports of the moved filesystem. Locally, we're just
going to unmount and remount.
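As a minimal sketch of checking in advance, you could run something like the following on both the old and the new fileserver (with the filesystems mounted on each) and compare the output; the filesystem paths here are just placeholders:

  import os

  # Print each filesystem's fsid in hex so the output from the two
  # servers can be compared (for example with diff).
  for fs in ("/w/435", "/cs/mail"):
      print("%-20s %x" % (fs, os.statvfs(fs).f_fsid))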
(I suspect that if you used 'zpool split' to split a pool, the two copies of the pool would have filesystems with
the same fsids and thus you could then do a migration from one to
the other. But I've never even come near doing a ZFS pool split,
so this is completely theoretical. For a server upgrade, presumably
you'd use some sort of remote disk system like iSCSI or maybe DRBD to
temporarily attach the new server's disks as additional mirrors,
then split them off.)