Wandering Thoughts archives

2009-01-18

The basic implementation of relatively high-availability NFS

There are a lot of complicated approaches to high availability services in general. Fortunately, NFS's statelessness makes it easy to do a relatively simple, low-rent version of this.

First, make your fileserver names point to virtual IP addresses. Embody these virtual IPs on the physical machines of your choice that have some form of shared storage. To fail over a (virtual) fileserver, take down the IP address on the original machine, unshare and unmount everything, mount and share it on the new machine, and finally bring up the virtual IP address as (yet another) interface alias on the new machine.

(If the first machine crashes or otherwise goes down hard, skip the steps that run on it; there is nothing left there to take down. Using some technique to force the old machine down before you take over is optional but probably a good idea.)
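As a rough sketch of that ordering, here is a small Python function that builds the list of commands for a failover without running any of them. It assumes Linux machines with the iproute2 `ip` command and `exportfs`, which is not necessarily what you have (on Solaris the equivalents would be `ifconfig`, `share` and `unshare`). The function name, the address, the interface and the device paths are all made up for illustration.

```python
def failover_plan(vip, iface, filesystems, old_host_alive=True):
    """Return (host, command) pairs in the order they should run.

    vip is the virtual IP with its prefix length, e.g. "192.0.2.10/24".
    filesystems is a list of (device, mountpoint) pairs on shared storage.
    Assumes Linux tools (ip, exportfs, mount, umount); nothing is executed.
    """
    plan = []
    if old_host_alive:
        # Take the virtual IP down first so clients stop reaching the old machine.
        plan.append(("old", ["ip", "addr", "del", vip, "dev", iface]))
        for _dev, mnt in filesystems:
            plan.append(("old", ["exportfs", "-u", "*:" + mnt]))
            plan.append(("old", ["umount", mnt]))
    # If the old machine went down hard, the steps above are skipped.
    for dev, mnt in filesystems:
        plan.append(("new", ["mount", dev, mnt]))
        plan.append(("new", ["exportfs", "-o", "rw", "*:" + mnt]))
    # The virtual IP comes up last, once everything is mounted and exported.
    plan.append(("new", ["ip", "addr", "add", vip, "dev", iface]))
    return plan


# Hypothetical example values.
for host, cmd in failover_plan("192.0.2.10/24", "eth0",
                               [("/dev/sdb1", "/export/home")]):
    print(host, " ".join(cmd))
```

Returning a plan instead of running commands directly makes it easy to look over the ordering (IP down first, IP up last) before you trust it with real filesystems.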

The one issue we've run into with this scheme that can't be blamed on operating system bugs is NFS locks. It's not clear how they get cleanly handled, especially if the original machine is hosting multiple virtual fileservers, so that you can't just shoot the locking daemon and remove all of its state.

(You may also run into operating system bugs and limitations, as we have, which is why our current HA NFS server setup remains mostly theoretical.)

This is merely relatively high availability, because it may take a fair amount of time to bring the filesystems up on the new machine and export them, and then for all of the client machines to learn the new Ethernet address for the virtual fileserver's IP (and re-establish TCP NFS connections, if you're using NFS over TCP). If you need things to go faster, you'll need to elaborate on this basic outline in various ways.
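One common way to speed up the Ethernet address part is to have the new machine announce itself with unsolicited (gratuitous) ARP once the virtual IP is up, instead of waiting for client ARP caches to expire. The sketch below only builds the command line. It assumes Linux with the iputils version of `arping`; the function name, address and interface are made up for illustration.

```python
def announce_command(vip, iface, count=3):
    """Build an iputils arping command that sends unsolicited ARP for vip.

    vip is the bare virtual IP address (no prefix length).
    Assumes the iputils arping; other arping implementations take
    different options. Nothing is executed here.
    """
    return ["arping", "-U", "-I", iface, "-c", str(count), vip]


# Hypothetical example values.
print(" ".join(announce_command("192.0.2.10", "eth0")))
```

This does nothing for the time it takes to mount and export the filesystems, and clients using NFS over TCP still have to notice that their old connection is dead.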

(However, without a suitable cluster filesystem underlying your NFS fileservers, I think there's a definite limit on how fast failover can ever be. If the filesystems are not live and already exported on each NFS server, you'll have to mount and export them, and that plain takes time, especially if you have a lot of them.)

sysadmin/BasicHANFS written at 01:25:38

