The tradeoffs for us in a SAN versus disk servers
In yesterday's retrospective I didn't say much about whether our overall architecture was the right approach. I think it was for the time, but before discussing that I want to cover what I see as the broad tradeoffs between having fileservers using a SAN and having fileservers host the disks themselves.
Given that we don't do failover in practice and no fileserver uses disks from more than two backends, one possible design is to just get rid of the SAN by putting the fileservers in, say, Supermicro's 24+2 cases and having them hold all of their data disks locally. This would make the fileservers cost slightly more but would save us the cost of the backends plus the SAN networking (now that we're doing 10G-T, the latter is not tiny). We'd eliminate a single point of failure (and a potential source of performance problems) for the entire collection of fileservers, and we'd probably get somewhat better performance from having the disks locally instead of having to go over iSCSI.
What would we lose by not having a SAN, at least for things that we care about in somewhat more than abstract theory?
- Fixing fileserver hardware failures would take somewhat more work.
Today we just have to swap the fileserver's system disks into a new
chassis; in this model we'd have to swap the data disks too, all 24
of them.
- A fileserver that suffered a hard failure couldn't be worked around
remotely. Today in an emergency we could remotely force the
fileserver to power down and then (slowly) bring up its ZFS pools,
virtual fileserver IP, and so on on another fileserver, but this
relies on shared (SAN) storage.
- We'd have a hard limit on how many disks a given fileserver could
use. Today our limit is purely through choice and if we had a
strong need we could make a fileserver expand to use disks from
an additional pair of backends.
- We couldn't easily deal with something like a disk controller
failure any more. Today we can just shift away from the affected
backend without having to take the fileserver down. Without a
SAN, this sort of hardware work would mean fileserver downtime.
(I'm considering things like power supply failure to be a wash. This may be wrong; if a dual power supply driving 24 disks is more likely to fail than one driving 12 or 16 disks and one driving just a basic chassis is least likely to fail, then a 24-disk fileserver is more likely to die completely than the SAN version.)
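The remote workaround in the second bullet can be sketched as a short script. All of the names here (pool, interface, IP) are hypothetical stand-ins, and the `ifconfig addif` syntax assumed is the Solaris/illumos one; in real life we'd run the steps by hand after forcing the dead fileserver off:

```shell
#!/bin/sh
# Sketch of manually failing a dead fileserver's service over to another
# machine, assuming shared SAN storage. Names here are hypothetical.
POOL="fs3-pool"       # the dead fileserver's ZFS pool
IFACE="e1000g0"       # network interface on the takeover machine
VIP="10.0.0.42/24"    # the virtual fileserver IP

# Emit the steps; pipe the output to sh (or run them by hand) to execute.
failover_plan() {
    # Force-import the pool; the dead server never exported it cleanly.
    echo "zpool import -f $POOL"
    # Bring up the virtual fileserver IP as an alias on this machine.
    echo "ifconfig $IFACE addif $VIP up"
    # Re-share the pool's NFS filesystems.
    echo "zfs share -a"
}

failover_plan
```

Running this prints the commands instead of executing them, since a slip with `zpool import -f` against live SAN storage is exactly the kind of thing you want to look over first.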
Finally, the big ones:
- We'd have to rely on the fileserver's OS not just for ZFS and networking (iSCSI and NFS) but also for access to the actual disks. With a split between backends and fileservers, the backends are the only things that need to support whatever disk controllers and so on we want to use.
- Using special purpose storage hardware (such as the backends for a fileserver for all-SSD pools) requires testing and qualifying the fileserver's OS on it.
The reality of life is that Linux has pretty much the best hardware support and other OSes fare less well. In an iSCSI SAN environment like ours the fileservers don't need much hardware support; they really only need networking plus a couple of system disks (and we don't care if the system disks are a bit slow). Only the backends need to support whatever hardware we need to get a lot of disks into a single system, and to do it well and reliably in the face of various wacky issues.
(In theory a SAN is also more flexible, expandable, and upgradeable than a disk server solution. In practice we've basically never taken advantage of that, so I'm not including it. I'm focusing on our SAN as we've actually used it, not as we could have.)
When we began planning our ZFS fileservers we strongly wanted to use Solaris (and ZFS) on the fileservers and we pretty much had to use eSATA for the disks for cost reasons. eSATA support in Linux was new and novel (we had to build our own recent kernels) and I believe eSATA support in Solaris for hardware we could afford was basically not there. The actual backend servers were also relatively cheap; the cost for a backend was mostly in the external disk enclosure, which would have been needed even with the disks directly attached to a fileserver. Even if we hadn't had other reasons at the time that made us focus on a SAN design, I think we'd have wound up with our current fileserver environment for the hardware support issue alone.
The situation is somewhat more even today; my impression is that OmniOS has good support for the LSI SAS controllers that we're using for disk access. However, I still trust Linux's support more, partly because I suspect that it's more widely used and tested. And going with a SAN today also keeps open the possibility of, say, figuring out how to do good, usably fast failover at some point in the future. OmniOS could always speed up ZFS pool import, for example.
(With our specific new hardware design we'd also wind up wanting fileserver cases with more than 24 data disks, since our new backend hardware has 16 disk bays and we'll probably put some L2ARC SSDs on the fileservers as well. 24-bay cases are relatively easy to get and to drive; 36-bay cases for 3.5" drives are, I believe, less so.)