Why high availability NFS requires shared storage
January 20, 2009
Suppose that you have a situation where you need transparent high
availability NFS for something that is read-only and updated only
infrequently. Instead of going to the expense and bother of setting
up real shared storage, it's tempting to try to implement this by
setting up a number of fileservers with local copies of the filesystem,
synchronizing it from a master machine with some user-level copying tool.
Unfortunately, the tempting easy way doesn't work; you can't have transparent high availability unless you have shared storage or something that fakes it very well.
In order to have transparent HA, you need transparent failover. In order to have transparent failover, you need to keep the NFS filehandles the same, because clients will keep using the filehandles they already hold and a server that doesn't recognize them will answer with stale filehandle errors. In almost all NFS implementations and filesystems, keeping the NFS filehandles the same requires keeping the inode numbers and generation counts of every file and directory exactly the same across all copies of the data.
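To make the filehandle problem concrete, here is a toy model in Python. It is only a sketch under loud assumptions: real filehandles are opaque blobs whose layout depends on the server and filesystem, and the `FileHandle` and `ToyServer` names, the field layout, and the inode values are all made up for illustration. It shows a replica with the same path and contents but a different inode number rejecting a handle that the primary issued.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FileHandle:
    """Hypothetical filehandle layout; real ones are opaque to clients."""
    fsid: int
    inode: int
    generation: int


class ToyServer:
    """A pretend NFS server that maps inode numbers to (generation, path)."""

    def __init__(self, fsid: int) -> None:
        self.fsid = fsid
        self.files: Dict[int, Tuple[int, str]] = {}

    def add_file(self, inode: int, generation: int, path: str) -> FileHandle:
        self.files[inode] = (generation, path)
        return FileHandle(self.fsid, inode, generation)

    def resolve(self, fh: FileHandle) -> Optional[str]:
        """Return the path for a handle, or None if the handle is stale."""
        if fh.fsid != self.fsid:
            return None
        entry = self.files.get(fh.inode)
        if entry is None or entry[0] != fh.generation:
            return None
        return entry[1]


# The primary hands out a handle for a file.
primary = ToyServer(fsid=1)
fh = primary.add_file(inode=1042, generation=7, path="/export/data/report.txt")

# The replica holds the same file by path and content, but a user-level
# copy landed it on a different inode with a different generation count.
replica = ToyServer(fsid=1)
replica.add_file(inode=88213, generation=1, path="/export/data/report.txt")

print(primary.resolve(fh))  # /export/data/report.txt
print(replica.resolve(fh))  # None, i.e. a stale filehandle after failover
```

In this model the only way for the replica to honor the client's handle is for it to have the file at inode 1042 with generation 7, which is exactly the requirement described above.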
No user-level tool can do this; there is no Unix interface to set the
inode number or the generation count when you create or manipulate a
file (okay, this is not quite true; at least some Linux filesystems have
a private interface to set the generation count of an inode, although
this still doesn't help with the inode numbers). So the inevitable
conclusion is that you must replicate your filesystem at some level
below the normal user-level one that such tools work at.
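A quick way to see the limits of user-level copying is to copy a file and compare inode numbers. The following Python sketch (the file names and the `copy_and_compare_inodes` helper are made up for illustration) uses `shutil.copy2`, which preserves contents and metadata such as timestamps as best it can, yet the copy still gets whatever inode number the filesystem chooses to hand out. The standard file creation calls have no parameter for requesting one.

```python
import os
import shutil
import tempfile


def copy_and_compare_inodes():
    """Copy a file at user level; return (src inode, dst inode, contents equal)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "original.txt")
        dst = os.path.join(d, "copy.txt")
        with open(src, "w") as f:
            f.write("same contents\n")

        # copy2 carries over data and what metadata it can, but the
        # destination's inode number is picked by the filesystem.
        shutil.copy2(src, dst)

        with open(src) as a, open(dst) as b:
            same_contents = a.read() == b.read()
        return os.stat(src).st_ino, os.stat(dst).st_ino, same_contents


src_ino, dst_ino, same_contents = copy_and_compare_inodes()
print("contents identical:", same_contents)
print("inode numbers identical:", src_ino == dst_ino)
```

Here both files live on the same filesystem, so their inode numbers necessarily differ. Across two separate fileservers the numbers could match by accident for a given file, but nothing at this level lets you make them match for every file and directory.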
The most general solution is shared storage, where you don't have to
replicate anything at all. If you absolutely can't do shared storage, I
can think of two general alternatives: synchronize the raw disk instead
of the filesystem, or do cross-network disk mirroring from the master to
the replicas.
(Plausible methods of cross-network disk mirroring include Linux's DRBD and using an iSCSI or AOE target implementation on the replicas to export raw disks to the master.)