A ZFS-based fileserver design
August 1, 2007
The following is the ZFS-based design we would like to use for our new fileserver environment, presented for your entertainment and whatever use you can get from it.
The basic things we give people are 'storage pools', which are made up of one or more standard-sized 'bricks' of storage. Each storage pool contains one or more filesystems, and is owned by a group (or a single person).
(Here I am using 'filesystem' to mean 'distinct mount point' or 'different name', which is really what users see when things get NFS exported to our actual user servers.)
Mechanically, each storage pool is a ZFS storage pool and each brick is a logical drive (or a slice of a logical drive) from a backend SAN controller. Because of ZFS's long term storage management issues, the SAN backend has to handle all of the RAID stuff; ZFS's own RAID support is used only for storage migration and for highly available storage pools, which would be mirrored between several SAN controllers.
(This turns ZFS into a more featureful Solaris Volume Manager, which is kind of a pity.)
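As a rough illustration of the mechanics, here is a minimal Python sketch that builds (but does not run) the `zpool` command lines for a plain pool, a pool mirrored across two SAN controllers, and growing a pool by one brick. The function names, pool name, and device names are made up for the example. It assumes that mirroring pairs each brick with a same-sized partner from the other controller.

```python
def zpool_create_args(pool, bricks, mirror_bricks=None):
    """Build the argv for creating a pool from SAN-provided bricks.

    bricks: device names of logical drives from one SAN controller.
    mirror_bricks: optional same-length list from a second controller;
    each brick is then paired with its counterpart in a ZFS mirror.
    (All names here are hypothetical.)
    """
    args = ["zpool", "create", pool]
    if mirror_bricks is None:
        # The SAN backend does the RAID, so bricks are plain top-level vdevs.
        return args + list(bricks)
    if len(bricks) != len(mirror_bricks):
        raise ValueError("each brick needs a mirror partner")
    for ours, theirs in zip(bricks, mirror_bricks):
        args += ["mirror", ours, theirs]
    return args


def zpool_add_args(pool, brick):
    """Build the argv for growing an existing pool by one more brick."""
    return ["zpool", "add", pool, brick]
```

The resulting list could be handed to something like `subprocess.run()` on the fileserver; the sketch stops short of that so it can be read and tried anywhere.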
You expand your available storage by getting another brick, which can either be added to an existing storage pool or be used to start another one. If you don't have an existing storage pool, you have to start one; you can't buy a brick for an existing group storage pool but reserve it exclusively for your own use.
Groups can add new filesystems any time they want to; they just tell us which of their storage pools the new filesystem should go into. However, filesystems don't move between storage pools once they're created. Groups can also tell us to remove filesystems, although every storage pool always has to have at least one filesystem in it.
(Technically we can move filesystems between storage pools, but it involves manual data copies and forced NFS remounts and user visible downtimes and so on and we don't want to do it very often.)
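Adding and removing filesystems maps fairly directly onto `zfs create` and `zfs destroy`. The sketch below only builds the command lines. The helper names are invented for the example, and the 'at least one filesystem per pool' check is our policy from above, not something ZFS itself enforces.

```python
def zfs_create_args(pool, fs_name):
    """Build the argv for adding a filesystem to a group's storage pool."""
    return ["zfs", "create", f"{pool}/{fs_name}"]


def zfs_destroy_args(pool, fs_name, existing):
    """Build the argv for removing a filesystem from a pool.

    existing: names of the filesystems currently in the pool.
    Refuses to remove the last one, per the policy described above.
    """
    if fs_name not in existing:
        raise KeyError(f"{pool} has no filesystem named {fs_name}")
    if len(existing) <= 1:
        raise ValueError("a storage pool must keep at least one filesystem")
    return ["zfs", "destroy", f"{pool}/{fs_name}"]
```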
The advantage of having a big storage pool with multiple filesystems is that a group does not have to decide ahead of time how much space they want in each different filesystem; they can let them expand (and contract) as needed. The drawback of piling everything into one storage pool is that if a group gets grant funding and buys an entire SAN backend controller to get more storage, they can only mirror or transfer entire storage pools to their new space; they can't make that decision on a filesystem by filesystem basis. (They can expand existing storage pools by adding bricks from their new controller, but then those storage pools and their filesystems depend on their new controller as well as ours.)
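To make the shared-space advantage concrete, here is a toy Python model of a pool whose filesystems all draw on the same free space. The class, the 100 GB brick size, and the filesystem names are illustrative assumptions (the design above does not fix a brick size), and the model ignores ZFS's own overhead.

```python
class Pool:
    """Toy model: every filesystem in a pool shares the pool's free space."""

    def __init__(self, brick_gb, bricks=1):
        self.brick_gb = brick_gb   # hypothetical standard brick size
        self.bricks = bricks
        self.used = {}             # filesystem name -> GB in use

    @property
    def size_gb(self):
        return self.brick_gb * self.bricks

    @property
    def free_gb(self):
        return self.size_gb - sum(self.used.values())

    def write(self, fs, gb):
        # No per-filesystem allocation: only the pool total matters.
        if gb > self.free_gb:
            raise ValueError("pool is full")
        self.used[fs] = self.used.get(fs, 0) + gb

    def delete(self, fs, gb):
        # Freed space goes back to the pool for any filesystem to use.
        self.used[fs] = max(0, self.used.get(fs, 0) - gb)
```

With two 100 GB bricks, one filesystem can grow to 150 GB while another uses 40 GB, and space that either one frees becomes available to the other without anyone having to resize anything.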
Storage pools can never shrink (at least until ZFS adds that feature). This is not too much of a problem, since we don't currently buy storage back from people. (If they need a bunch of space only temporarily, we can create and then later destroy an entire storage pool.)
There will be some maximum size for storage pools, probably somewhere around 2 TB, so that a single storage pool can't eat too much of a single SAN RAID controller's disk space. There is no size limit for filesystems, except that if you want them to be backed up they can only be so big (probably around 200 GB; it's based on how big our backup tapes are).
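The two limits could be checked with something like the following Python sketch. The constants are just the rough figures from above (taking 2 TB as 2048 GB), the function names are invented, and the brick size is left as a parameter because it has not been decided here.

```python
MAX_POOL_GB = 2048       # "around 2 TB"; an approximate, provisional figure
BACKUP_LIMIT_GB = 200    # "around 200 GB", set by backup tape capacity


def can_add_brick(current_bricks, brick_gb, max_pool_gb=MAX_POOL_GB):
    """True if one more brick keeps the pool at or under the size cap."""
    return (current_bricks + 1) * brick_gb <= max_pool_gb


def is_backup_eligible(fs_used_gb, limit_gb=BACKUP_LIMIT_GB):
    """True if a filesystem is small enough for us to back it up."""
    return fs_used_gb <= limit_gb
```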