== Why the NFS client is at fault in the multi-filesystem NFS problem

In [[yesterday's entry MultiFilesystemNFSIssue]], I said that the NFS clients were at fault in dealing with the duplicate inode number problem. Now it's time for the details, because at first glance this seems a bit odd: how can it be the client's responsibility to avoid duplicate inode numbers, when the server gives it the inode numbers?

In the [[NFS v3 specification http://www.ietf.org/rfc/rfc1813.txt]], inode numbers only appear in one spot; they're part of the file attribute structure that the server returns for _GETATTR_ requests. _GETATTR_ is used for more than just _stat()_, but it is the NFS analog of the _stat()_ system call, and the _fattr3_ structure that it returns is the analog of the kernel's _struct stat_ that _stat()_ fills in. Much the same information appears in both.

In particular, the _fattr3_ structure has both a _fileid_ (the inode number) and an _fsid_, the 'file system identifier for [the file's] file system'. NFS v3 requires the inode number to be unique, but only within a single *server* filesystem, that is, among files with the same _fsid_. And an NFS server is free to give you files with different _fsid_s even though you have only made one NFS mount from it, of what you think is a single filesystem.

The simple way for clients to map between _GETATTR_ and _stat()_ is to turn the _fileid_ into the inode number, fill in ((st_dev)) based on some magic internal number you're using for this NFS mount, and throw away the _fsid_. A kernel that does this has the duplicate inode number problem, because two different files with the same _fileid_ but different _fsid_s wind up with identical ((st_dev)) and ((st_ino)) values.

Unfortunately, fixing this is complicated. The NFS client cannot simply use the _fsid_ for ((st_dev)), because ((st_dev)) must be unique *on the local machine* and the _fsid_ comes from the server; thus, it can potentially collide both with local filesystems and with filesystems from other NFS servers.
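To make the duplicate inode number problem concrete, here is a minimal C sketch of the simple mapping described above. The cut-down _fattr3_min_ and _local_id_ structures, the helper names, and the device number 42 are all hypothetical stand-ins, not real kernel or RFC definitions; the point is only that once _fsid_ is thrown away, two files from different server filesystems that share a _fileid_ become indistinguishable.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical cut-down version of NFS v3's fattr3: only the two fields discussed here. */
struct fattr3_min {
    uint64_t fsid;   /* server's identifier for the file's file system */
    uint64_t fileid; /* inode number; only unique among files with the same fsid */
};

/* Hypothetical local file identity: what ends up in st_dev and st_ino. */
struct local_id {
    dev_t dev;
    ino_t ino;
};

/* The simple mapping: every file gets the mount's local device number,
   fileid becomes the inode number, and fsid is thrown away. */
static struct local_id naive_map(dev_t mount_dev, const struct fattr3_min *fa)
{
    struct local_id id;
    id.dev = mount_dev;
    id.ino = (ino_t)fa->fileid;
    return id;
}

/* Programs that look for hard links treat equal (st_dev, st_ino) pairs as one file. */
static int same_file(const struct local_id *a, const struct local_id *b)
{
    return a->dev == b->dev && a->ino == b->ino;
}

/* Two distinct files on different server filesystems that happen to share
   a fileid. Returns 1 if the naive mapping makes them look identical. */
static int naive_mapping_collides(void)
{
    const dev_t mount_dev = 42; /* hypothetical local device number for the one NFS mount */
    struct fattr3_min a = { .fsid = 1, .fileid = 1234 };
    struct fattr3_min b = { .fsid = 2, .fileid = 1234 };
    assert(a.fsid != b.fsid); /* genuinely different server filesystems */

    struct local_id ia = naive_map(mount_dev, &a);
    struct local_id ib = naive_map(mount_dev, &b);
    return same_file(&ia, &ib);
}
```

Calling _naive_mapping_collides()_ returns 1, which is exactly the situation where something like _du_ or _tar_ will decide that two unrelated files are hard links to each other.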
Using _fsid_ at all in the _stat()_ results requires somehow inventing a relatively persistent and unique ((st_dev)) value for every different _fsid_ that every NFS server gives you, which is non-trivial. (If you have a very big ((st_dev)), you can deal with the problem by mangling the _fsid_ together with a unique local number for this NFS mount. But _fsid_ is a 64-bit number, so you'd need a pretty epic ((st_dev)).)

=== Sidebar: the Linux solution to this problem

The Linux NFS client has a creative solution to this problem: it actually creates new NFS-mounted filesystems on the fly, complete with new local ((st_dev)) values, every time you traverse through a point where the _fsid_ changes. Comments in the source code say that this has the side effect of making _df_ work correctly, at least as long as you are not dealing with [[something like ZFS ../solaris/ZFSNFSOddDfExplained]].
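The bookkeeping behind both 'invent a local ((st_dev)) for every _fsid_' and the Linux approach of creating a new mount whenever the _fsid_ changes can be sketched as a table keyed on (NFS mount, server _fsid_). This is a hypothetical illustration, not the Linux kernel's code: the table, its fixed size, _mount_id_ and the _next_dev_ counter are all assumptions, with _next_dev_ standing in for whatever mechanism a real kernel uses to hand out device numbers that no local filesystem is using. Note that the numbers it hands out are only stable for as long as the table lives, which is the 'relatively persistent' part of the problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical fixed table size; a real client would grow this dynamically. */
#define MAX_SUBMOUNTS 64

/* One entry per (NFS mount, server fsid) pair seen so far. */
struct submount {
    unsigned mount_id;  /* hypothetical local identifier for the NFS mount */
    uint64_t fsid;      /* fsid reported by the server in fattr3 */
    dev_t    local_dev; /* st_dev handed out locally for this pair */
};

struct submount_table {
    struct submount entries[MAX_SUBMOUNTS];
    size_t count;
    dev_t next_dev; /* hypothetical allocator: next unused local device number */
};

/* Return the local st_dev for (mount_id, fsid), allocating a fresh one the
   first time this fsid is seen under this mount. Returns 0 if the table is full. */
static dev_t dev_for_fsid(struct submount_table *t, unsigned mount_id, uint64_t fsid)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].mount_id == mount_id && t->entries[i].fsid == fsid)
            return t->entries[i].local_dev; /* already seen: reuse its st_dev */
    }
    if (t->count == MAX_SUBMOUNTS)
        return 0;
    assert(t->next_dev != 0); /* 0 is reserved as the "table full" result */

    /* A new fsid under this mount: the equivalent of creating a new submount. */
    struct submount *s = &t->entries[t->count++];
    s->mount_id = mount_id;
    s->fsid = fsid;
    s->local_dev = t->next_dev++;
    return s->local_dev;
}
```

Keying on the mount as well as the _fsid_ is what keeps identical _fsid_ values from two different NFS servers from colliding, and because the local numbers come from the client's own allocator rather than from the server, they cannot collide with local filesystems either.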