2011-01-27
Not very much about Solaris NFS filehandles
A lot of Unix systems have NFS filehandles that are easy to recognize and decipher, at least to the extent of easily mapping filehandles to the server's filesystems (such information is very useful for things like troubleshooting just which server filesystem is seeing lots of NFS traffic from one particular client). Solaris does not.
The basic structure of a Solaris NFS v3 filehandle is as follows, at least as of Solaris 10 U8 or so:
| (bytes) | (what) |
| 4 | overall filehandle length in XDR byte order |
| 4 | fh3_fsid[0] |
| 4 | fh3_fsid[1] |
| 2 | fh3_len, which is at a minimum NFS_FHMAXDATA long |
| <variable> | fh3_len bytes of fh3_data, padded with 0 bytes if necessary. |
| 2 | fh3_xlen, once again at least NFS_FHMAXDATA |
| <variable> | fh3_xlen bytes of fh3_xdata, padded with 0 bytes if necessary. |
(The overall filehandle is rounded up to be a multiple of 4 bytes long.)
All fields except the overall filehandle length are in host byte order, not network byte order. If you do not know the byte order of the Solaris fileserver you're working with, you already have problems.
(By the way, this implies that you cannot possibly do transparent failover between Solaris fileservers of different CPU architectures.)
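As a rough illustration of the layout above, here is a minimal Python sketch that decodes only the fixed-size start of a raw filehandle (the overall length, the two fsid words, and fh3_len). The function name `parse_fh3_prefix` and the `host_order` parameter are made up for this example, and it assumes you have captured the filehandle bytes including the leading XDR length word. It deliberately stops before the variable-length fields.

```python
import struct

def parse_fh3_prefix(buf, host_order="<"):
    """Decode the fixed-size prefix of a raw Solaris NFS v3 filehandle.

    host_order is a struct byte-order character for the *server*:
    "<" for little-endian (x86), ">" for big-endian (SPARC).
    This is an assumption you must supply; it cannot be detected here.
    """
    # The overall length is the only field in XDR (big-endian) order.
    (total_len,) = struct.unpack_from(">I", buf, 0)
    # fh3_fsid[0], fh3_fsid[1] and fh3_len follow in the server's byte order.
    fsid0, fsid1, fh3_len = struct.unpack_from(host_order + "IIH", buf, 4)
    return {"length": total_len, "fsid": (fsid0, fsid1), "fh3_len": fh3_len}
```

Passing the wrong `host_order` will not raise an error; it will just give you byte-swapped fsid values that match nothing on the server.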
The most interesting field is fh3_fsid, which identifies the server
filesystem. I could talk about the inner structure of this, but at least
for ZFS the practical answer is that you don't care; you need to extract
the fsid directly from the kernel's NFS export data with mdb and some parsing.
The raw fsid is printed by ::nfs_exptable as part of its dump of the
entire NFS export table and related information:
echo ::nfs_exptable | mdb -k
With suitable parsing this will directly give you the fh3_fsid
for every NFS-exported filesystem. On S10U8, the fsid is reported
as follows:
fsid: (0x9d7c697f 0xbeb5c208)
(This is fh3_fsid[0] then fh3_fsid[1].)
Other versions of Solaris have reported this in somewhat different formats. Expect to spend a certain amount of effort to maintain your parser as new versions of Solaris have new mdb output.
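For the S10U8 format shown above, the parsing can be sketched in a few lines of Python. The regular expression and the name `find_fsids` are mine, and the pattern only knows about the one `fsid: (0x... 0x...)` form quoted here; other Solaris versions will need their own pattern.

```python
import re

# Matches the S10U8 form shown above, e.g. "fsid: (0x9d7c697f 0xbeb5c208)".
FSID_RE = re.compile(r"fsid:\s*\(0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\)")

def find_fsids(mdb_output):
    """Return a list of (fh3_fsid[0], fh3_fsid[1]) integer pairs found in the text."""
    return [(int(a, 16), int(b, 16)) for a, b in FSID_RE.findall(mdb_output)]
```

You would feed this the captured output of the mdb command above. Note that it only pulls out the fsid pairs; associating each one with its exported path depends on the rest of the ::nfs_exptable output, which is not covered here.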
The inner life of the fsid
The low byte of fh3_fsid[1] contains an identifier of
the filesystem type; the remaining bits are filesystem type specific.
For ZFS they are a 56-bit objset unique ID, with the low 32 bits in
fh3_fsid[0] and the remaining bits in the high three bytes of
fh3_fsid[1].
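The bit layout just described can be written out as a small Python sketch. The function name `split_zfs_fsid` is hypothetical, and the sketch assumes that the high three bytes of fh3_fsid[1] are simply the upper 24 bits of the objset ID in their natural order, which is my reading of the description rather than something checked against the kernel source.

```python
def split_zfs_fsid(fsid0, fsid1):
    """Split a ZFS fsid pair into (filesystem type byte, 56-bit objset ID)."""
    # Low byte of fh3_fsid[1] is the filesystem type identifier.
    fs_type = fsid1 & 0xFF
    # Remaining 24 bits of fh3_fsid[1] sit above the 32 bits in fh3_fsid[0].
    objset_id = ((fsid1 >> 8) << 32) | fsid0
    return fs_type, objset_id
```

Applied to the example fsid from earlier, `split_zfs_fsid(0x9d7c697f, 0xbeb5c208)` gives a type byte of 0x08 and an objset ID of 0xbeb5c29d7c697f.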
My skill at understanding Solaris kernel source is insufficient to establish if the filesystem type identifier is constant, and if so how constant it is. (Is it fixed permanently for ZFS, fixed per OS release, fixed per machine but variable from one machine to another, or variable if the system configuration changes? I can't understand the source environment enough to tell.)
As it happens it doesn't matter, because for ZFS there is basically no simple way of finding out the 56-bit objset unique ID; as far as I can tell there is simply no interface or even readily accessible data structure where it is visible.
2011-01-21
A bit more on listing file locks on Solaris 10
To follow up on my earlier entry on this, it's
amazing what you discover when you take the time to go through mdb's full
online help. In particular, if you have an NFS fileserver you can easily
see all NFS locks that it knows about, complete with the remote host and
even the full filename.
The basic mdb -k command we want is ::nlm_lockson, which does
what you might expect from the name. The state field values that I
know about are '3' for active locks and '4' for attempted locks that
are currently blocked. If there is a specific lock that you want path
information on, take the vnode address and use the mdb command line:
<addr> ::print vnode_t v_path
If you want the paths for most or all of the locks, use ::nlm_lockson
-v instead of the plain version. This also reports various additional
information, including decoding the numeric state field into
human-readable values.
The help for ::nlm_lockson claims that you can get it to report on
only a single remote host. I've never been able to get this to work,
but I don't care very much since you can always pipe its output through
grep or the like. (I'm probably missing something.)
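If you would rather do the filtering in a script than with grep, a minimal Python sketch follows. The name `locks_for_host` is made up, and since I am not reproducing the exact column layout of the ::nlm_lockson output here, it just keeps any line where the host appears as a whole whitespace-separated field.

```python
def locks_for_host(lines, host):
    """Yield the lines that mention host as a whitespace-delimited field."""
    for line in lines:
        # Whole-field match, so "client1" does not also match "client10".
        if host in line.split():
            yield line
```

This assumes the host shows up in the output in the same form you search for (hostname versus IP address); if it doesn't, nothing will match.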
The other two NLM commands documented in ::nfs_help -d don't seem
to report anything that's useful for sysadmins. It's possible that I'm
missing something; full understanding of this stuff is difficult without
access to the kernel source, and the NLM modules are one of the pieces
of Solaris that are still closed source and available only as binaries.