The basic implementation of relatively high-availability NFS

January 18, 2009

There are a lot of complicated approaches to high-availability services in general. Fortunately, NFS's statelessness makes it easy to do a relatively simple, low-rent version of this.

First, make your fileserver names point to virtual IP addresses. Embody these virtual IPs on the physical machines of your choice that have some form of shared storage. To fail over a (virtual) fileserver, take down the IP address on the original machine, unshare and unmount everything, mount and share it on the new machine, and finally bring up the virtual IP address as (yet another) interface alias on the new machine.

(If the first machine crashes or otherwise goes down hard, skip the first bits. The use of various techniques to force a machine down is optional but potentially recommended.)
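As a rough illustration of the ordering described above, here is a minimal Python sketch that only builds the list of steps; it does not run anything. The `Step` type, the action names, the host names, and the example address are all hypothetical placeholders, not commands from any particular operating system, and the `old_host_is_down` flag models skipping the first bits when the original machine has crashed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    host: str    # machine the step runs on (hypothetical name)
    action: str  # abstract action label, not a real command
    target: str  # filesystem path or virtual IP address


def failover_plan(virtual_ip: str, filesystems: List[str],
                  old_host: str, new_host: str,
                  old_host_is_down: bool = False) -> List[Step]:
    """Return the failover steps in the order they should happen."""
    steps: List[Step] = []
    if not old_host_is_down:
        # Take the virtual IP down first so clients stop reaching the old machine.
        steps.append(Step(old_host, "remove-ip", virtual_ip))
        steps.extend(Step(old_host, "unshare", fs) for fs in filesystems)
        steps.extend(Step(old_host, "unmount", fs) for fs in filesystems)
    # Mount and share everything on the new machine before it gets the IP.
    steps.extend(Step(new_host, "mount", fs) for fs in filesystems)
    steps.extend(Step(new_host, "share", fs) for fs in filesystems)
    # Only now bring up the virtual IP as an interface alias.
    steps.append(Step(new_host, "add-ip-alias", virtual_ip))
    return steps


if __name__ == "__main__":
    plan = failover_plan("192.0.2.10", ["/export/home", "/export/data"],
                         "nfs-a", "nfs-b")
    for s in plan:
        print(f"{s.host}: {s.action} {s.target}")
```

The point of keeping the IP address for last is that clients should not be able to reach the new machine until everything they expect is mounted and shared there.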

The one issue that we've run into with this scheme that can't be attributed to operating system bugs is that it's not clear how NFS locks get handled cleanly. This is especially so if the original machine is hosting multiple virtual fileservers, so that you can't just shoot the locking daemon and remove all of its state.

(You may also run into operating system bugs and limitations, as we have, which is why our current HA NFS server setup remains mostly theoretical.)

This is merely relatively high availability, because it may take a fair amount of time to bring the filesystems up on the new machine and export them, and then for all of the client machines to acquire the virtual fileserver's new Ethernet address (and re-establish TCP NFS connections, if you're using NFS over TCP). If you need things to go faster, you'll need to elaborate this basic outline in various ways.

(However, without a suitable cluster filesystem underlying your NFS fileservers, I think that there's a definite limit on how fast failover can ever be. If the filesystems are not live and already exported on each NFS server, you'll have to mount and export them, and that plain takes time, especially if you have a lot of them.)
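To make the arithmetic concrete, here is a back-of-the-envelope sketch in Python. It assumes filesystems are mounted and exported one at a time, and every number in the example is made up for illustration rather than measured; the function and parameter names are hypothetical.

```python
def estimate_failover_seconds(n_filesystems: int,
                              mount_s: float,
                              export_s: float,
                              client_recovery_s: float,
                              already_exported: bool = False) -> float:
    """Rough lower bound on failover time under a serial mount/export model.

    client_recovery_s stands in for clients picking up the new Ethernet
    address and re-establishing TCP NFS connections.
    """
    if already_exported:
        # Cluster filesystem case: nothing to mount or export at failover time.
        storage_s = 0.0
    else:
        storage_s = n_filesystems * (mount_s + export_s)
    return storage_s + client_recovery_s


if __name__ == "__main__":
    # Illustrative numbers only: 100 filesystems, 0.5s to mount, 0.25s to export.
    print(estimate_failover_seconds(100, 0.5, 0.25, 30.0))
    print(estimate_failover_seconds(100, 0.5, 0.25, 30.0, already_exported=True))
```

Under this model the mount and export term grows linearly with the number of filesystems, which is why it sets a floor that only already-live filesystems can remove.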

Last modified: Sun Jan 18 01:25:38 2009