2023-03-21

ZFS on Linux and NFS(v3) server filesystem IDs

One part of a NFS filehandle is an identification of the filesystem (or more accurately the mount point) on the server. As I've seen recently, there are various forms of (NFS) filesystem IDs, and they can even vary from client to client (although you shouldn't normally set things up that way). However, all of this still leaves an open question for ZFS on Linux filesystems specifically: where does the filesystem ID come from, how can you work it out, and how can you tell whether two filesystem IDs will stay the same so you can substitute servers without invalidating client NFS mounts? As it happens, I've just worked out the answer to that question, so here it is.

All ZFS filesystems (datasets) have two identifiers: a 'guid' that is visible as a dataset property, and a special 'fsid_guid' (as zdb calls it) that is the 'fsid'. There are two ways to find out the fsid of a ZFS dataset. First, ZFS returns it to user level in the 'f_fsid' field that's part of what's returned by statfs(2). Second, you can use 'zdb' to dump the objset object of a dataset, which you may need to do if the filesystem isn't mounted. You find which object you need to dump by getting the 'objsetid' property of a ZFS filesystem (well, dataset):

# zfs get objsetid fs6-mail-01/cs/mail
[...]
fs6-mail-01/cs/mail  objsetid  148       -
# zdb -dddd fs6-mail-01 148 | grep fsid_guid
      fsid_guid = 40860249729586731
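
Since zdb prints the fsid_guid in decimal and we'll be dealing with hex values shortly, one convenient way to convert it is with shell printf:

# printf '%x\n' 40860249729586731
912a2e3e90d62b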

For the statfs() version we can use Python, and conveniently report the result in hex for reasons we're about to see:

>>> import os
>>> r = os.statvfs("/w/435").f_fsid
>>> print("%x" % r)
34ae17341d08c

(These two approaches give the same answer for a given filesystem.)

In my earlier exploration of NFS server filesystem IDs, the NFS export 'uuid' of this /w/435 test filesystem was '7341d08c:00034ae1:00000000:00000000', which should look awfully familiar. It's the low 32-bit word and then the high 32-bit word of the 'f_fsid', in that order, zero-padded. The reason for this reversal is somewhat obscure and beyond the scope of this entry (but it's probably due to how zfs_statvfs() sets the peculiar f_fsid field).
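
As a cross-check, we can continue the Python session above and reassemble the export uuid from the f_fsid ourselves. I'm assuming here that the two trailing words are always zero; they were in everything I looked at:

>>> print("%08x:%08x:00000000:00000000" % (r & 0xffffffff, r >> 32))
7341d08c:00034ae1:00000000:00000000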

(This is the uuid visible in /proc/fs/nfsd/exports. As I discovered earlier, the version in /proc/net/rpc/nfsd.fh/content will be different.)

One important thing here is that a filesystem's fsid is not copied through ZFS send and receive, presumably because it's an invisible attribute that exists at the wrong level. This means that if you do ZFS fileserver upgrades by (filesystem) migration, your new fileserver will normally have ZFS filesystems with different ZFS fsids, and thus different NFS filesystem IDs, than your old one, and your NFS clients will get stale NFS handle errors. But at least you can now check in advance whether this will be so. You can't work around this at the ZFS level, but you might be able to fix it at the NFS export level by setting an explicit 'uuid=' (with the old value) for all of the exports of the moved filesystem. Locally, we're just going to unmount and remount.
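
For illustration, a hypothetical /etc/exports line for this might look like the following. The client name is invented, and I'm going from exports(5)'s description of 'fsid=' accepting a UUID (which exportfs should hand to nfsd as the 'uuid=' that shows up in /proc/fs/nfsd/exports); I haven't verified exactly what UUID punctuation nfs-utils accepts, so treat this as a sketch:

/w/435  nfsclient.example.org(rw,no_subtree_check,fsid=7341d08c:00034ae1:00000000:00000000)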

(I suspect that if you used 'zpool split' to split a pool the two copies of the pool would have filesystems with the same fsids and thus you could then do a migration from one to the other. But I've never even come near doing a ZFS pool split, so this is completely theoretical. For a server upgrade, presumably you'd use some sort of remote disk system like iSCSI or maybe DRBD to temporarily attach the new server's disks as additional mirrors, then split them off.)

linux/ZFSAndNFSFilesystemIDs written at 22:20:02

