2009-11-04
The cause of the multi-filesystem NFS export problem
There is a famous irritation with managing NFS filesystems, which boils
down to the fact that NFS clients have to know about your filesystem
boundaries.
It goes like this: suppose that /home and /home/group1 are separate
filesystems and you NFS export both of them. What you would like is
that clients NFS mount /home and automatically get /home/group1 too,
because this lets you transparently add /home/group2 next month.
However, this doesn't work (although some systems will try hard to fake
it if you tell them to).
(This issue is a lot more pertinent these days in light of things like ZFS, where filesystems are cheap objects.)
Although it superficially looks like the NFS re-export problem, the problem here isn't telling NFS filehandles for the different real filesystems apart. Provided that all of the filesystems can be NFS exported normally, your NFS server can just give out the same filehandles it would if the client had explicitly mounted the filesystems separately (the filehandle is opaque to the client, after all).
The real problem is what common NFS clients expect about the inode numbers; specifically, they expect the inode number to be unique in the client's view of the filesystem, and from the client's view it only mounted one filesystem. Meanwhile, on the server there are multiple filesystems and their inode numbers are almost certain to overlap. The result is explosions in some programs on the client under some circumstances, as the programs see duplicate inode numbers for files that are not actually hardlinks to each other.
(The client kernels generally don't care; the inode numbers that user programs see are unrelated to the NFS filehandles that the kernel uses.)
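To make the failure concrete, here is a minimal sketch of how a program can be fooled. It assumes the common convention (used by tools such as du and tar) that two names are hardlinks to the same file when they share both a device number and an inode number. The paths, device numbers, and inode numbers are made up for illustration, and `apparent_hardlinks` is a hypothetical helper, not something from any real tool.

```python
from collections import defaultdict


def apparent_hardlinks(entries):
    """Group paths by (device, inode) and return the groups that look
    like hardlinks, i.e. that contain more than one path."""
    groups = defaultdict(list)
    for path, dev, ino in entries:
        groups[(dev, ino)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical server view: two filesystems, each with its own device
# number, that happen to reuse the same inode number.
server_view = [
    ("/home/fred/a", 1, 1234),
    ("/home/group1/jim/b", 2, 1234),
]

# Hypothetical client view if one NFS mount transparently covered both:
# a single device number, with the server's inode numbers passed through.
client_view = [(path, 99, ino) for path, _dev, ino in server_view]

print(apparent_hardlinks(server_view))
print(apparent_hardlinks(client_view))
```

On the server the two files are told apart by their device numbers. In the simulated client view they collapse into one apparent hardlink pair, which is exactly the kind of false match that makes programs misbehave.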
Technically this is a client side problem, but I doubt that any NFS client implementation actually gets it right. (And it is very hard to get right, since the client has to somehow make up unique yet ideally persistent inode numbers.)
(This is the kind of thing that I write down in part so that I can remember the logic the next time I wonder about it.)
Sidebar: the more subtle failures
Okay, that's not quite all that goes wrong if the server lets NFS
clients transparently cross filesystem boundaries, because there are
various operations that don't work across server filesystem boundaries
despite looking like they should on the client. For example, if /home
on the client is all one single NFS mount, a program is rationally
entitled to believe that it can hardlink /home/fred/a to
/home/group1/jim/b. In practice this is going to fail with an error
because on the server that's a cross-filesystem hardlink.
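A program that wants to cope with this has to treat a hardlink failure as possible even inside what looks like one filesystem. The following is a rough sketch only; it assumes that the failure shows up on the client as the usual EXDEV error for cross-device links, which is what I would expect but is not something stated above. `link_or_copy` is a hypothetical helper name, and falling back to a copy is just one possible policy.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Try to hardlink src to dst. If the link fails with EXDEV
    (cross-device link), fall back to copying the file instead.

    Returns "linked" or "copied"; any other error is re-raised.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # Assumption: a copy is an acceptable substitute for the caller.
        shutil.copy2(src, dst)
        return "copied"
```

With the paths from the example, `link_or_copy("/home/fred/a", "/home/group1/jim/b")` would be expected to report "copied" rather than fail outright. Note that a copy is not equivalent to a hardlink (the two files no longer share data or a link count), so whether this fallback is appropriate depends on the program.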