How we set up our Solaris ZFS-based NFS fileservers
A SAN backend is fairly useless without a front end, so we have some of those too. First, a brief bit of background: our overall goal is to provide flexible NFS fileservice. The actual SAN front end servers do nothing except NFS service; all user work happens on other machines.
That said, the NFS fileservers are Solaris 10 x86 servers, specifically
Solaris 10 Update 5. Hardware-wise, they are SunFire X2200s with 8 GB
of RAM, although we may have to increase the amount of RAM later. They
all have two system disks and use mirrored system partitions (through
Solaris Volume Manager; even if S10U5 supported using ZFS for the
root filesystem, I wouldn't trust it). They are mostly stock Solaris
installs; we use Blastwave to get useful software
like Postfix and
tcpdump, and pca to manage
patches (to the extent that we patch at all). Time synchronization
is done by the Solaris NTP daemon, talking to our local set of NTP servers.
All data space is accessed through iSCSI from the iSCSI backends and managed through ZFS. Since the backends are exporting more or less raw disk space, we use ZFS to create our RAID redundancy by mirroring all storage chunks between two separate iSCSI target servers. This wastes half the disk space and may cost a certain amount of write performance, but it goes very fast on read IO (especially random read IO) and gives us significant redundancy, including against iSCSI target failures.
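As a concrete sketch, building a pool this way looks roughly like the following. The pool and device names here are made up for illustration (real Solaris iSCSI device names are much longer `c*t*d*` strings); the point is that each mirror vdev pairs a LUN on one target server with the matching LUN on the other:

```shell
# Hypothetical sketch; pool and device names are illustrative.
# Each mirror vdev spans the two iSCSI target servers (c2* on one
# backend, c3* on the other), so either backend can die outright
# without the pool losing data.
zpool create fs3-tank \
    mirror c2t1d0 c3t1d0 \
    mirror c2t2d0 c3t2d0
```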
(Note that this setup makes it vital that all iSCSI LUNs are exactly the same size, which is one reason we carve the physical disks up into multiple LUNs.)
Rather than randomly pair up LUNs on targets whenever we need a new mirrored ZFS vdev, we have a convention where two targets are always paired together (thus, each disk and LUN on target A will always be paired with the same disk and LUN on target B, and likewise for targets C and D). We use local software to avoid horrible accidents when managing ZFS pools, since we always want to create and grow ZFS pools with mirrored pairs instead of single disks.
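To give the flavour of that local software, here is a hypothetical sketch of the kind of sanity check a wrapper around 'zpool add' can do. The 'A-'/'B-' LUN naming scheme, the pool name, and the script itself are all illustrative, not our actual tooling; the idea is simply to refuse anything that is not a matched LUN pair split across the two paired targets:

```shell
#!/bin/sh
# Hypothetical sketch of a wrapper used instead of raw 'zpool add'.
# Refuse anything that is not a matched LUN pair split across the
# two paired iSCSI targets (called 'A' and 'B' here).
check_pair() {
    luna="$1" lunb="$2"
    case "$luna" in
        A-*) ;;
        *) echo "first LUN must be on target A" >&2; return 1 ;;
    esac
    case "$lunb" in
        B-*) ;;
        *) echo "second LUN must be on target B" >&2; return 1 ;;
    esac
    # the disk/LUN suffix must be identical on both targets
    if [ "${luna#A-}" != "${lunb#B-}" ]; then
        echo "LUNs $luna and $lunb are not a matched pair" >&2
        return 1
    fi
}

# example: a matched pair passes the check and we would go ahead
if check_pair A-d3s0 B-d3s0; then
    echo "would run: zpool add fs3-tank mirror A-d3s0 B-d3s0"
fi
```

The check is trivial, but that is the point: the dangerous mistake (adding a single, unmirrored disk to a pool, which cannot be undone) is also trivial to make at two in the morning.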
Although we are not doing failover, we have engineered the environment to support it in the future. All iSCSI storage is visible to all fileserver machines (we're not doing any sort of storage fencing), and we are using virtual names and IP aliases for the logical fileservers, so that we could move a logical fileserver to a different physical one if we needed to. We have adopted a ZFS pool naming convention that puts the logical fileserver name in the pool's name, so that in an emergency we can easily see which pools a given (logical) fileserver is supposed to have.
(We have actually tested by-hand failover without problems. It's not
that difficult, just slow due to zpool import issues.)
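In outline, that by-hand failover amounts to something like the following; the pool, interface, and IP address names are hypothetical, and this is a sketch of the shape of the procedure rather than our exact runbook:

```shell
# On the new physical fileserver, once we are certain the old one is
# down (there is no storage fencing, so that check is entirely on us):
zpool import -f fs3-tank    # -f: the pool was never cleanly exported
# ZFS remounts the filesystems and re-shares them over NFS by itself,
# since that state lives in the pool.  Then take over the logical
# fileserver's IP alias (interface and address are made up):
ifconfig nge0 addif 128.100.3.30 netmask 255.255.255.0 up
```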
Because ZFS keeps track of things like filesystems, NFS exports, and so on in the ZFS pools themselves, the fileservers are effectively generic; there is no per-fileserver configuration necessary beyond their host names and IP addresses and associated data like ssh keys.
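This works because NFS export information is just a ZFS property that travels with the pool. A hypothetical example (the filesystem name and the export options are made up):

```shell
# Hypothetical example; filesystem name and access list are made up.
# Whichever fileserver imports the pool will share the filesystem
# the same way, with no /etc configuration on the server itself.
zfs create fs3-tank/homes
zfs set sharenfs='rw=@128.100.3.0/24' fs3-tank/homes
```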
While the fileservers have all of our user accounts so that we can see file ownership properly and so on, users are not allowed to log in to them. System staff are, but we have remapped our home directories so that we have local, non-ZFS home directories on each (physical) fileserver instead of our normal real home directories. Fileservers do not NFS mount anything, because it's not necessary if users can't log in and it keeps them more independent.
(As you might guess, this implies that we do email delivery over NFS; so far this has not caused problems. The fileservers themselves run our standard 'null client' Postfix mail configuration that just forwards all locally generated email to our central mail submission point, so that they can send us administrative email and so on.)
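A Postfix null client takes only a few main.cf lines; this is a generic sketch of that sort of configuration, with a placeholder relayhost rather than our real submission machine:

```
# main.cf for a null client: forward everything to the central hub,
# accept no mail from the network, deliver nothing locally.
relayhost = [mailhub.example.com]
inet_interfaces = loopback-only
mydestination =
local_transport = error:local delivery is disabled
```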