Not very much about Solaris NFS filehandles

January 27, 2011

A lot of Unix systems have NFS filehandles that are easy to recognize and decipher, at least to the extent of easily mapping filehandles to the server's filesystems (such information is very useful for things like troubleshooting just which server filesystem is seeing lots of NFS traffic from one particular client). Solaris does not.

The basic structure of a Solaris NFS v3 filehandle is as follows, at least as of Solaris 10 U8 or so:

(bytes) (what)
4 overall filehandle length in XDR byte order
4 fh3_fsid[0]
4 fh3_fsid[1]
2 fh3_len, which is at a minimum NFS_FHMAXDATA long
<variable> fh3_len bytes of fh3_data, padded with 0 bytes if necessary.
2 fh3_xlen, once again at least NFS_FHMAXDATA
<variable> fh3_xlen bytes of fh3_xdata, padded with 0 bytes if necessary.

(The overall filehandle is rounded up to be a multiple of 4 bytes long.)

All fields except the overall filehandle length are in host byte order, not network byte order. If you do not know the byte order of the Solaris fileserver you're working with, you already have problems.

(By the way, this implies that you cannot possibly do transparent failover between Solaris fileservers of different CPU architectures.)
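As a rough illustration of the layout above, here is a minimal Python sketch that pulls the overall length, the two fsid words, and fh3_len out of a raw filehandle. The function name and the `server_byteorder` parameter are my own inventions for this example, and it deliberately stops after fh3_len, because the exact padding of fh3_data and fh3_xdata is not pinned down precisely enough here to decode them safely.

```python
import struct

def parse_fh3_header(raw, server_byteorder):
    """Decode the fixed leading fields of a Solaris NFS v3 filehandle.

    raw              -- the filehandle bytes, starting at the 4-byte length
    server_byteorder -- "little" or "big"; you must already know this
                        for the fileserver in question (hypothetical
                        parameter, not something the filehandle tells you)
    """
    if server_byteorder not in ("little", "big"):
        raise ValueError("server_byteorder must be 'little' or 'big'")
    if len(raw) < 14:
        raise ValueError("filehandle too short for length + fsid + fh3_len")

    # The overall length is the only field in XDR (big-endian) order.
    (total_len,) = struct.unpack_from(">I", raw, 0)

    # Everything after it is in the server's host byte order.
    prefix = "<" if server_byteorder == "little" else ">"
    fsid0, fsid1, fh3_len = struct.unpack_from(prefix + "IIH", raw, 4)

    return {"total_len": total_len, "fsid": (fsid0, fsid1), "fh3_len": fh3_len}
```

The fsid pair this returns is what you would then compare against the fsids of the server's exported filesystems.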

The most interesting field is fh3_fsid, which identifies the server filesystem. I could talk about the inner structure of this, but at least for ZFS the practical answer is that you don't care; you need to extract the fsid directly from the running kernel's NFS export information with mdb and some parsing.

The raw fsid is printed by ::nfs_exptable as part of dumping the entire NFS export table and information:

echo ::nfs_exptable | mdb -k

With suitable parsing this will directly give you the fh3_fsid for every NFS-exported filesystem. On S10U8, the fsid is reported as follows:

fsid: (0x9d7c697f 0xbeb5c208)

(This is fh3_fsid[0] then fh3_fsid[1].)

Other versions of Solaris have reported this in somewhat different formats. Expect to spend a certain amount of effort to maintain your parser as new versions of Solaris have new mdb output.
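A minimal sketch of the parsing side, in Python, assuming the S10U8 line format shown above. It only picks out the fsid lines; the rest of the ::nfs_exptable output is not shown here, so associating each fsid with its export path is left out rather than guessed at. The function and pattern names are hypothetical.

```python
import re

# Matches lines such as: fsid: (0x9d7c697f 0xbeb5c208)
# This is the S10U8 format only; other Solaris versions will need
# a different pattern.
FSID_RE = re.compile(r"fsid:\s*\(0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\)")

def extract_fsids(mdb_output):
    """Return a list of (fh3_fsid[0], fh3_fsid[1]) integer pairs
    found in the text output of '::nfs_exptable'."""
    return [(int(a, 16), int(b, 16)) for a, b in FSID_RE.findall(mdb_output)]
```

You would feed this the captured output of the mdb command above, for example by saving it to a file and reading the file in.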

The inner life of the fsid

The low byte of fh3_fsid[1] contains an identifier of the filesystem type; the remaining bits are filesystem type specific. For ZFS they are a 56-bit objset unique ID, with the low 32 bits in fh3_fsid[0] and the remaining bits in the high three bytes of fh3_fsid[1].
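To make the bit layout concrete, here is a small Python sketch that splits an fsid pair the way this paragraph describes for ZFS. The function name is made up for illustration, and it assumes the high three bytes of fh3_fsid[1] are simply the upper 24 bits of the objset ID.

```python
def split_zfs_fsid(fsid0, fsid1):
    """Split a ZFS fsid pair into (fstype_id, objset_id).

    fstype_id -- the low byte of fh3_fsid[1]
    objset_id -- 56 bits: low 32 bits from fh3_fsid[0], upper 24 bits
                 from the high three bytes of fh3_fsid[1]
    """
    fstype_id = fsid1 & 0xFF
    objset_id = ((fsid1 >> 8) << 32) | (fsid0 & 0xFFFFFFFF)
    return fstype_id, objset_id

# Using the example fsid reported by mdb earlier:
fstype_id, objset_id = split_zfs_fsid(0x9d7c697f, 0xbeb5c208)
print(hex(fstype_id), hex(objset_id))  # 0x8 0xbeb5c29d7c697f
```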

My skill at understanding Solaris kernel source is insufficient to establish if the filesystem type identifier is constant, and if so how constant it is. (Is it fixed permanently for ZFS, fixed per OS release, fixed per machine but variable from one machine to another, or variable if the system configuration changes? I can't understand the source environment enough to tell.)

As it happens it doesn't matter, because for ZFS there is basically no simple way of finding out the 56-bit objset unique ID; as far as I can tell there is simply no interface or even readily accessible data structure where it is visible.


Comments on this page:

From 149.77.31.154 at 2011-05-16 15:30:51:

So, I'm trying to use zfs send/receive to send a filesystem from one server to another and, through some tricks, move the IP address and have the 'recv' side assume control of the filesystem. For reasons of fsid, this appears to not be working...

I've found several places on the net that explain Solaris NFS structure, but none appear to be complete. Yours is the closest for how the fsid comes into play, after I realized I had to adjust for endianness. The weird thing is that when you run snoop -v as in http://blogs.oracle.com/peteh/entry/understanding_snoop_1m_nfsv3_file, I have managed to get the file system id to be the same on both (as transmitted in the NFS reply portion of the packet):

e.g. NFS: File system id = 777389146169, File id = 3

However, the fsid[0] and fsid[1] are not the same, and I'm having a hard time understanding what the difference is or how the fsid is generated. All of the inodes are identical since they are essentially the same filesystem. If I could figure this last part out, it would make filesystem migration so much simpler.

I wonder if you have any advice in this regard? Doug (hughesd at deshawresearch dot com)

By cks at 2011-05-16 17:28:08:

I would not assume that the inodes are the same unless I had checked that, because I am not sure that zfs send/receive preserves them exactly (on the other hand, it might well since it works at a very low level). But if you've already checked and they are the same, that's great news.

To see how the fsid is generated for ZFS filesystems, you will need to go through the kernel source. I did it once but forgot the details once I had determined that the objset unique ID was not something you could get at from user level (short of using mdb).

(I'm not sure that I'd believe snoop output, especially if mdb is telling you that the fsids are different on the two systems.)

Written on 27 January 2011.

Last modified: Thu Jan 27 00:32:26 2011