ZFS can apparently start NFS fileservice before boot finishes

February 11, 2015

Here's something that I was surprised to discover the other day: ZFS can start serving things over NFS before the system is fully up. Unfortunately this can have a bad effect because it's possible for this NFS traffic to cause further ZFS traffic in some circumstances.

Since this sounds unbelievable, let me report what I saw first. As our problem NFS fileserver rebooted, it stalled reporting 'Reading ZFS config:'. At the same time, our iSCSI backends reported a high ongoing write volume to one pool's set of disks, and snoop on the fileserver showed active NFS traffic. ptree reported that what was running at the time was the 'zfs mount -a' that is part of the /system/filesystem/local service.

(I recovered the fileserver from this problem by the simple method of disconnecting its network interface. This caused nlockmgr to fail to start, but at least the system was up. ZFS commands like 'zfs list' stalled during this stage; I didn't think to do a df to capture the actual mounts.)

Although I can't prove it from the source code, I have to assume that 'zfs mount -a' is enabling NFS access to filesystems as it mounts them. An alternate explanation is that /etc/dfs/sharetab had listings for all of the filesystems (ZFS adds them as part of sharing them over NFS) and this activated NFS service for filesystems as they appeared. The net effect is about the same.
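
One way to tell these two explanations apart the next time this happens might be to compare what /etc/dfs/sharetab lists against what is actually mounted at that moment. Here is a minimal Python sketch of that comparison. It assumes the whitespace-separated field order from sharetab(4) (pathname, resource, fstype, options, description); the function names and the sample paths are made up for illustration, and paths containing spaces would need more careful parsing.

```python
def parse_sharetab(text):
    """Return {pathname: fstype} for each entry in sharetab-format text.

    Assumes fields are whitespace-separated in the order
    pathname, resource, fstype, options, description.
    """
    shares = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line; skip it
        shares[fields[0]] = fields[2]
    return shares


def shared_but_unmounted(sharetab_text, mounted_paths):
    """List NFS-shared paths that do not appear in mounted_paths."""
    mounted = set(mounted_paths)
    shares = parse_sharetab(sharetab_text)
    return sorted(
        path for path, fstype in shares.items()
        if fstype == "nfs" and path not in mounted
    )


# Hypothetical sample data, not captured from a real fileserver.
SAMPLE_SHARETAB = "/h/281\t-\tnfs\trw\t\n/h/282\t-\tnfs\trw\t\n"
SAMPLE_MOUNTED = ["/h/281"]

print(shared_but_unmounted(SAMPLE_SHARETAB, SAMPLE_MOUNTED))
```

If this sort of check reported shared-but-unmounted paths in the middle of a stalled boot, that would at least be consistent with the sharetab explanation. It would not prove it.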

This is obviously a real issue if you want your system to be fully up and running properly before any NFS fileservice starts. Since some sorts of NFS traffic can apparently stall further ZFS activity under some circumstances, this is something you may care about; we certainly do now.

In theory the SMF dependencies say that /network/nfs/server depends on /system/filesystem/local, as well as on nlockmgr (which didn't start). In practice, how the system actually behaves is the ultimate proof, and all I can do is report what I saw. Yes, this is frustrating. That ZFS and SMF together hide so much behind black magic is a serious problem that has frustrated me before. Among other things, it means that when something goes odd or wrong you need to be a deep expert to understand what's going on.
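
For what it's worth, the declared dependencies and their current states can be listed with 'svcs -d' on the service in question. The following Python sketch picks out the dependencies that are not online from that kind of output. It assumes the usual three STATE, STIME, FMRI columns; the function name and the sample text are invented for illustration (the sample is not a capture from our fileserver), and treating 'not online' as 'not satisfied' is a simplification of how SMF actually evaluates dependencies.

```python
def not_online(svcs_output):
    """Return (fmri, state) pairs for listed services that are not online.

    Expects text shaped like 'svcs -d' output: a header line followed
    by lines of three whitespace-separated columns, STATE STIME FMRI.
    """
    results = []
    for line in svcs_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "STATE":
            continue  # skip the header and anything unexpected
        state, _stime, fmri = fields
        if state != "online":
            results.append((fmri, state))
    return results


# Hypothetical sample output, trimmed to the two dependencies discussed here.
SAMPLE_SVCS = """\
STATE          STIME    FMRI
online         Feb_10   svc:/system/filesystem/local:default
offline        Feb_10   svc:/network/nfs/nlockmgr:default
"""

for fmri, state in not_online(SAMPLE_SVCS):
    print(fmri, state)
```

A listing like this only shows what SMF believes about the dependencies. As this incident suggests, it says nothing about whether NFS traffic is already being served.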

Written on 11 February 2015.

Last modified: Wed Feb 11 01:41:42 2015