Why the NFS client is at fault in the multi-filesystem NFS problem
In yesterday's entry, I said that the NFS clients were at fault in dealing with the duplicate inode number problem. Now it's time for the details, because on first look this appears a bit odd; how can it be the client's responsibility to avoid duplicate inode numbers, when the server gives it the inode numbers?
In the NFS v3 specification, inode numbers only appear in one spot; they're part of the file attribute structure that the server returns for GETATTR requests. While it is used for more than just stat(), GETATTR is the NFS analog of the stat() system call and the fattr3 structure that it returns is the analog of the kernel's struct stat that stat() fills in, and much the same information appears in both.
In particular, the fattr3 structure has both a fileid (the inode number) and a fsid, the 'file system identifier for [the file's] file system'. While NFS v3 requires the inode number to be unique, it only requires that it be unique within a single server filesystem, that is, among files with the same fsid. And an NFS server is free to give you files with different fsids even though you have only made one NFS mount from it, of what you think is a single filesystem.
The simple way for clients to map between GETATTR and stat() is to turn the fileid into the inode number, fill in st_dev based on some magic internal number you're using for this NFS mount, and throw away the fsid. A kernel that does this has the duplicate inode number problem.
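To make the collision concrete, here is a minimal Python model of that naive mapping. It is only an illustration, not real kernel code: the Fattr3 and StatResult classes, the naive_stat() function, and every number in it are invented for this sketch, and only the fsid and fileid field names come from NFS v3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fattr3:
    """Only the two fattr3 fields that matter here (illustrative model)."""
    fsid: int    # server-side filesystem identifier
    fileid: int  # inode number, unique only among files with the same fsid


@dataclass(frozen=True)
class StatResult:
    """The (st_dev, st_ino) pair that programs use to identify a file."""
    st_dev: int
    st_ino: int


def naive_stat(attr: Fattr3, mount_dev: int) -> StatResult:
    # The fsid is thrown away; every file seen through this NFS mount
    # gets the same st_dev, whatever server filesystem it lives on.
    return StatResult(st_dev=mount_dev, st_ino=attr.fileid)


# Two different files on two different server filesystems that happen
# to share an inode number, both reached through a single NFS mount.
MOUNT_DEV = 0x2A  # hypothetical local device number for the mount
file_a = Fattr3(fsid=0x1111, fileid=42)
file_b = Fattr3(fsid=0x2222, fileid=42)

# True: the client now claims these are the same file.
collide = naive_stat(file_a, MOUNT_DEV) == naive_stat(file_b, MOUNT_DEV)
```

Anything that identifies files by comparing (st_dev, st_ino) pairs would conclude that file_a and file_b are the same file.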
Unfortunately, fixing this is complicated. The NFS client cannot simply use the fsid for st_dev, because st_dev must be unique on the local machine and the fsid comes from the server; thus, it can potentially collide both with local filesystems and with filesystems from other NFS servers. Using fsid at all in the stat() results requires somehow inventing a relatively persistent and unique st_dev value for every different fsid that every NFS server gives you, which is non-trivial.
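One way to picture the bookkeeping this implies is a table that hands out a fresh local device number the first time each (server, fsid) pair is seen. The sketch below is an assumption-laden toy: DevAllocator, dev_for(), the starting number, and the server names are all hypothetical. It also sidesteps the hard parts, since nothing in it keeps the numbers stable across reboots or checks them against device numbers already used by local filesystems.

```python
import itertools


class DevAllocator:
    """Toy allocator of local st_dev values, one per (server, fsid) pair."""

    def __init__(self, first_dev: int = 0x100):
        # Hypothetical range assumed not to overlap local device numbers.
        self._next_dev = itertools.count(first_dev)
        self._table: dict[tuple[str, int], int] = {}

    def dev_for(self, server: str, fsid: int) -> int:
        key = (server, fsid)
        if key not in self._table:
            # First sighting of this server filesystem: invent a number.
            self._table[key] = next(self._next_dev)
        return self._table[key]


alloc = DevAllocator()
dev_a = alloc.dev_for("serverA", 0x1111)
dev_a_again = alloc.dev_for("serverA", 0x1111)  # same pair, same st_dev
dev_b = alloc.dev_for("serverB", 0x1111)        # same fsid, other server
```

The table only grows, and it lives purely in memory, which is why "relatively persistent" is the awkward part of the requirement.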
(If you have a very big st_dev you can deal with the problem by mangling the fsid together with a unique local number for this NFS mount. But fsid is a 64-bit number, so you'd need a pretty epic st_dev.)
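As a rough illustration of how big "epic" is, the following sketch packs a local mount number above the 64-bit fsid and checks the width of the result. The mangle() function and the sample values are made up for this example; the only fact taken from above is that fsid is 64 bits wide.

```python
FSID_BITS = 64  # fsid is a 64-bit number in NFS v3


def mangle(mount_id: int, fsid: int) -> int:
    """Pack a local mount number above the server's fsid (illustrative)."""
    if not 0 <= fsid < (1 << FSID_BITS):
        raise ValueError("fsid must fit in 64 bits")
    if mount_id < 0:
        raise ValueError("mount_id must be non-negative")
    return (mount_id << FSID_BITS) | fsid


# Even a tiny mount number pushes the result past 64 bits.
dev = mangle(3, 0xFFFF_FFFF_FFFF_FFFF)
bits_needed = dev.bit_length()  # 66
```

Python integers are unbounded, so this works here; a fixed-width st_dev would have to be at least as wide as the fsid plus however many bits the mount number needs.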
Sidebar: the Linux solution to this problem
The Linux NFS client has a creative solution to this problem: it actually creates new NFS-mounted filesystems on the fly, complete with new local st_dev values, every time you traverse through a point where the fsid changes. Comments in the source code say that this has the side effect of making df work correctly, at least as long as you are not dealing with something like ZFS.
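From user space, this behaviour should be visible as a directory whose st_dev differs from its parent's. The following is a small sketch for spotting such boundaries under a directory tree; find_dev_boundaries() is a name invented here, and the script cannot tell an on-the-fly NFS submount from an ordinary mount point, so it simply reports every change in st_dev.

```python
import os


def find_dev_boundaries(root: str) -> list[tuple[str, int, int]]:
    """Return (path, parent_dev, child_dev) for each directory under
    root whose st_dev differs from that of its parent directory."""
    boundaries = []
    stack = [(root, os.lstat(root).st_dev)]
    while stack:
        path, dev = stack.pop()
        try:
            entries = list(os.scandir(path))
        except OSError:
            continue  # unreadable directory: skip it
        for entry in entries:
            try:
                if not entry.is_dir(follow_symlinks=False):
                    continue
                child_dev = os.lstat(entry.path).st_dev
            except OSError:
                continue  # entry vanished or is inaccessible
            if child_dev != dev:
                boundaries.append((entry.path, dev, child_dev))
            stack.append((entry.path, child_dev))
    return boundaries
```

Calling find_dev_boundaries() on the top of an NFS mount (the path is up to you) on a Linux client would then list the points where the client has created a new filesystem, along with any regular mounts below that directory.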