Why we do NFS fileserving with a SAN

February 18, 2007

Our storage infrastructure here has a number of NFS servers sitting in front of a pool of SAN RAID storage boxes using commodity SATA disks. This is a somewhat unusual setup for a comparatively small environment like ours; a far more common setup is to have the disks directly attached to the fileservers.

We have a SAN setup for a simple reason: failover between the NFS server machines. We consider the server machines to be the things most likely to suffer failures, either hardware or software, or just to need downtime. With all of the storage in a SAN pool, accessible to any of the frontend machines, we can easily move NFS service from one machine to another.

The actual implementation uses virtual NFS servers and Solaris DiskSuite's failover support, which works quite nicely, although it is not automated high-availability failover: we have to kick it off by hand. DiskSuite also lets us mirror important partitions across multiple SAN RAID controllers, so that they stay available even if a controller goes down.
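To make the idea of a virtual NFS server concrete, here is a minimal Python model. It assumes that a virtual server is just a name, an IP address, and a list of SAN disk sets, any of which can be attached to any frontend. It is not DiskSuite's interface: the class names, step strings, and addresses are all hypothetical, and the real work of taking over disk sets and moving the IP address is only recorded as log lines.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhysicalHost:
    """One NFS frontend machine attached to the SAN (hypothetical model)."""
    name: str
    up: bool = True


@dataclass
class VirtualNFSServer:
    """A service identity (name and IP address) plus the SAN disk sets it
    exports. Clients only ever talk to this identity, never to a host."""
    name: str
    ip: str
    disksets: list[str]
    host: PhysicalHost | None = None
    log: list[str] = field(default_factory=list)

    def fail_over(self, target: PhysicalHost) -> None:
        """Move the service to `target`; invoked by hand, not automatically."""
        if not target.up:
            raise ValueError(f"cannot fail over to {target.name}: it is down")
        old = self.host
        if old is not None and old.up:
            # Planned move (e.g. for OS patches): shut down cleanly on the old host.
            self.log.append(f"{old.name}: unshare, remove {self.ip}, release {self.disksets}")
        # Because the disk sets live on the SAN, the target can reach them directly.
        self.log.append(f"{target.name}: take {self.disksets}, add {self.ip}, share")
        self.host = target


if __name__ == "__main__":
    fs1, fs2 = PhysicalHost("fs1"), PhysicalHost("fs2")
    homes = VirtualNFSServer("homes", "192.0.2.10", ["homes-set"], host=fs1)
    homes.fail_over(fs2)  # free up fs1 for patching
    print("\n".join(homes.log))
```

The point the model captures is that nothing in the virtual server is tied to a particular machine, so a planned move and recovery from a dead host differ only in whether the old host gets to shut down cleanly first.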

It's not clear to me how to set up a similarly fast failover environment with directly attached SATA disks. I can think of two approaches: servers with a relatively small number of disks plus cold-spare servers waiting to take over, or disks in external shelves with every NFS server given spare SATA controllers. In either case, 'failover' would involve enough work and user disruption that you would be unlikely to use it for things like applying OS patches.

(Then there is the really crazy approach, where each server mirrors its local disks over the network to disks on another server using some disk-over-network protocol such as NBD or iSCSI. The downside is that you need twice as much disk space and either twice as many servers or servers with twice as many drive bays, and in the latter case taking down a server means that you lose redundancy on two disk pools, not just one.)

(Credit where credit is due: the crazy approach was suggested by someone at Unix Unanimous.)
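The costs of the network-mirroring approach can be checked with some simple arithmetic. This sketch assumes two hypothetical ways of mirroring each disk pool across two servers: a paired layout, where each pool gets its own two servers, and a ring layout, where server i holds pool i locally plus the mirror of its neighbour's pool. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    servers: int
    bays_per_server: int
    total_disks: int


def paired(pools: int, disks_per_pool: int) -> Layout:
    # Each pool is mirrored across its own dedicated pair of servers:
    # twice as many servers, each with one pool's worth of bays.
    return Layout(2 * pools, disks_per_pool, 2 * pools * disks_per_pool)


def ring(pools: int, disks_per_pool: int) -> Layout:
    # One server per pool, but each server also holds the mirror of its
    # neighbour's pool: the same server count, twice the bays per server.
    return Layout(pools, 2 * disks_per_pool, 2 * pools * disks_per_pool)


def degraded_paired(down: int) -> set[int]:
    # Servers 2p and 2p + 1 hold the two halves of pool p.
    return {down // 2}


def degraded_ring(servers: int, down: int) -> set[int]:
    # Pool p has one half on server p and the other on server (p + 1) % servers,
    # so a down server takes out half of its own pool and half of its neighbour's.
    return {p for p in range(servers) if down in (p, (p + 1) % servers)}


if __name__ == "__main__":
    print(paired(4, 6), ring(4, 6))
    print(degraded_paired(5), degraded_ring(4, 2))
```

Both layouts need the same total number of disks; they differ in whether you pay in servers or in drive bays, and only in the ring layout does a single server outage degrade two pools.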

Written on 18 February 2007.
Last modified: Sun Feb 18 23:01:22 2007