2023-03-09
ZFS on Linux and when you get stale NFSv3 mounts
Suppose that you have ZFS based NFS servers that you're changing from Ubuntu 18.04 to 22.04. These servers have a lot of NFS exported filesystems that are mounted and used by a lot of clients, so it would be very convenient if you could upgrade the ZFS fileservers without having to unmount and remount the filesystems on all of your clients. Conversely, if a particular way of moving from 18.04 to 22.04 is going to require you to unmount all of its filesystems, you'd like to know that in advance so you can prepare for it, rather than find out after the fact when clients start getting 'stale NFS handle' errors. Since we've just been through some experiences with this, I'm going to write down what we've observed.
There are at least three ways to move a ZFS fileserver from Ubuntu 18.04 to Ubuntu 22.04. I'll skip upgrading it in place because we don't have any experience with that; we upgrade machines by reinstalling them from scratch. That leaves two approaches for a ZFS server, which I will call a forklift upgrade and a migration. In a forklift upgrade, you build new system disks, then swap them in by exporting the ZFS pools, changing system disks, booting your new 22.04 system, and importing the pools back.
(As a version of the forklift upgrade you can reuse your current system disks, although this means you can't readily revert.)
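To make the forklift sequence concrete, here's a minimal sketch of the ZFS side of it; the pool name 'tank' is hypothetical, and your NFS export configuration has to come along on the new system disks as well.

    # on the old 18.04 system, before shutting it down:
    zpool export tank

    # (swap in the new system disks and boot the 22.04 install)

    # on the new 22.04 system, under the same hostname and IP:
    zpool import tank
    # ZFS may insist on a forced import if it decides the pool
    # was last used by another system:
    #zpool import -f tank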
Our experience with these in-place 'export pools, swap system disks, import pools' forklift upgrades is that client NFSv3 mounts survive them. Your NFS clients will stall while your ZFS NFS server is away, but once it's back (under the right host name and IP address), they resume their activities and things pick right back up where they were. We've also had no problems with ZFS pools when we reboot our servers with changed hostnames; ZFS on Linux still brings the pools up on boot even after the server's hostname has changed.
However, forklift upgrades can only be done on ZFS fileservers where you have separate system disks and ZFS pool disks. We have one fileserver where this isn't possible; it has only four disks and shares all of them between system filesystems and its ZFS pool. For this machine we did a migration, where we built a new version of the system using new disks on new hardware, then moved the ZFS data over with ZFS snapshots (as I thought we might have to). Once the data was migrated, we shut down the old server and made the new hardware take over the name, IP address, and so on.
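For what it's worth, the snapshot replication involved is roughly the usual 'zfs send -R | zfs recv' pattern; the pool, filesystem, and host names here are made up for illustration.

    # initial full replication from the old server to the new one:
    zfs snapshot -r tank/fs@migrate1
    zfs send -R tank/fs@migrate1 | ssh newserver zfs recv -F tank/fs

    # once activity on the filesystem has stopped, a final incremental pass:
    zfs snapshot -r tank/fs@migrate2
    zfs send -R -i @migrate1 tank/fs@migrate2 | ssh newserver zfs recv -F tank/fs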
Unfortunately for us, when we did this migration, NFS clients got stale NFS mounts. The new version of this fileserver had the same filesystem with the exact same contents (ZFS snapshots and snapshot replication ensure that), the same exports, and so on, but the NFS filehandles came out different.
It's possible that we could have worked around this if we had set an explicit 'fsid=' value in our NFS export for the filesystem (as per exports(5)), but it's also possible that there were other differences in the NFS filehandle.
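If we'd thought of it in advance, the exports(5) approach would have looked something like the following on both the old and the new server; the filesystem path, client network, and fsid value here are all hypothetical, and per exports(5) the fsid can be a small integer or a UUID.

    # /etc/exports (hypothetical path, client network, and fsid value):
    /tank/fs  172.16.0.0/16(rw,no_subtree_check,fsid=4001)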
(ZFS has a notion of a 'fsid' and a 'guid' for ZFS filesystems (okay, datasets), and zdb can in theory dump this information, but right now I can't work out how to go from a filesystem name in a pool to reading out its ZFS fsid, so I can't see if it's preserved over ZFS snapshot replication or if the receiver generates a new one.)
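The guid at least is exposed as an ordinary dataset property, so it's easy to compare across a send and receive, although whether it has anything to do with the NFS filehandle is another question (again, the names here are hypothetical).

    # on the old server:
    zfs get -H -o value guid tank/fs
    # on the new server, after replication:
    ssh newserver zfs get -H -o value guid tank/fs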