The tradeoffs for us in a SAN versus disk servers

June 29, 2014

In yesterday's retrospective I didn't say much about whether our overall architecture was the right approach. I think it was for the time, but before discussing that I want to cover what I see as the broad tradeoffs between having fileservers using a SAN and having fileservers host the disks themselves.

Given that we don't do failover in practice and no fileserver uses disks from more than two backends, one possible design is to just get rid of the SAN by putting the fileservers in, say, SuperMicro's 24+2 cases and having them hold all of their data disks locally. This would make the fileservers cost slightly more but would save us the cost of the backends plus the SAN networking (now that we're doing 10G-T, the latter is not tiny). We'd eliminate the SAN as a single point of failure (or of performance problems) for the entire collection of fileservers, and we'd probably get somewhat better performance from having the disks locally instead of having to go over iSCSI.

What would we lose by not having a SAN, at least for things that we care about in somewhat more than abstract theory?

  • Fixing fileserver hardware failures would take somewhat more work. Today we just have to swap the fileserver's system disks into a new chassis; in this model we'd have to swap the data disks too, all 24 of them.

  • A fileserver that suffered a hard failure couldn't be worked around remotely. Today in an emergency we could remotely force the fileserver to power down and then (slowly) bring up its ZFS pools, virtual fileserver IP, and so on on another fileserver, but this relies on shared (SAN) storage.

  • We'd have a hard limit on how many disks a given fileserver could use. Today our limit is purely by choice, and if we had a strong need we could expand a fileserver to use disks from an additional pair of backends.

  • We couldn't easily deal with something like a disk controller failure any more. Today we can just shift away from the affected backend without having to take the fileserver down. Without a SAN this would be a fileserver downtime for hardware work on it.

    (I'm considering things like power supply failure to be a wash. This may be wrong; if a dual power supply driving 24 disks is more likely to fail than one driving 12 or 16 disks and one driving just a basic chassis is least likely to fail, then a 24-disk fileserver is more likely to die completely than the SAN version.)
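To make the power supply aside concrete, here is a minimal sketch of the comparison. It assumes failures are independent and that a SAN fileserver only dies completely if its own chassis fails or both of its mirrored backends fail. Every probability in it is a made-up placeholder for illustration, not a measured failure rate, and the function names are hypothetical.

```python
# All probabilities below are hypothetical annual figures, chosen only to
# illustrate the shape of the argument. They are not measurements.


def p_san_fileserver_down(p_basic_chassis: float, p_backend: float) -> float:
    """Chance a SAN fileserver loses everything.

    It dies if its own (diskless) chassis fails, or if both of the
    backends it mirrors across fail. Assumes independent failures.
    """
    p_both_backends = p_backend * p_backend
    return 1 - (1 - p_basic_chassis) * (1 - p_both_backends)


def p_local_fileserver_down(p_24_disk_chassis: float) -> float:
    """Chance an all-in-one fileserver dies: its single chassis failing."""
    return p_24_disk_chassis


if __name__ == "__main__":
    # Assumed ordering from the aside: 24-disk chassis > backend > basic chassis.
    san = p_san_fileserver_down(p_basic_chassis=0.01, p_backend=0.02)
    local = p_local_fileserver_down(p_24_disk_chassis=0.03)
    print(f"SAN: {san:.4f}  all-in-one: {local:.4f}")
```

With small probabilities in that ordering, the both-backends term is tiny and the SAN figure is dominated by the basic chassis, which is why the all-in-one design comes out worse under these assumptions.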

Finally, the big one:

  • We'd have to rely on the fileserver's OS not just for ZFS and networking (iSCSI and NFS) but also for access to the actual disks. With a split between backends and fileservers, the backends are the only things that need to support whatever disk controllers and so on we want to use.
  • Using special purpose storage hardware (such as the backends for a fileserver for all-SSD pools) would require testing and qualifying the fileserver's OS on it.

The reality of life is that Linux has pretty much the best hardware support and other OSes support hardware less well. In an iSCSI SAN environment like ours the fileservers don't need much hardware support; they only really need networking plus a couple of system disks (and we don't care if the system disks are a bit slow). Only the backends need to support whatever hardware we need to use to get a lot of disks into a single system and to do it well and reliably in the face of various wacky issues.

(In theory a SAN is also more flexible, expandable, and upgradeable than a disk server solution. In practice we've basically never taken advantage of that, so I'm not including it. I'm focusing on our SAN as we've actually used it, not as we could have.)

When we began planning our ZFS fileservers we strongly wanted to use Solaris (and ZFS) on the fileservers and we pretty much had to use eSATA for the disks for cost reasons. eSATA support in Linux was new and novel (we had to build our own recent kernels) and I believe eSATA support in Solaris for hardware we could afford was basically not there. The actual backend servers were also relatively cheap; the cost for a backend was mostly in the external disk enclosure, which would have been needed even with the disks directly attached to a fileserver. Even if we hadn't had other reasons at the time that made us focus on a SAN design, I think we'd have wound up with our current fileserver environment for the hardware support issue alone.

The situation is somewhat more even-handed today; my impression is that OmniOS has good support for the LSI SAS controllers that we're using for disk access. However I still trust Linux's support more, partly because I suspect that it's more widely used and tested. And going with a SAN today also keeps open the possibility of, say, figuring out how to do good, usably fast failover at some point in the future. OmniOS could always speed up ZFS pool import, for example.

(With our specific new hardware design we'd also wind up wanting fileserver cases with more than 24 data disks, since our new backend hardware has 16 disk bays, so a fileserver using two backends can have up to 32 data disks, plus we're planning to probably put some L2ARC SSDs on the fileservers. 24-bay cases are relatively easy to get and to drive; 36-bay cases for 3.5" drives are, I believe, less so.)

Comments on this page:

(I’m focusing on our SAN as we’ve actually used it, not as we could have.)

What about options you didn’t end up taking advantage of, but whose presence did change the way you approached some other situation that did crop up? Were there things like that?

By cks at 2014-06-30 00:06:54:

I can't think of anything where I can definitely say that an unused option influenced our approach, but I suspect that there were a number of areas where it was in the back of our minds. Some guesses, based on our behavior:

  • We never stressed strongly about keeping fileserver disk usage balanced or running out of disks on a particular fileserver, because we could always expand if we had to. This was helped out by people not aggressively buying storage and by people being willing to split the storage they did buy between two pools on two different fileservers.

  • That it would take so many pieces of hardware to die at once to take a fileserver down hard or destroy data probably influenced how much we cared about hardware failures and thus our willingness to tolerate lower-end hardware without features like dual power supplies and enclosure temperature monitoring and so on. Right now even a total enclosure failure that destroyed all disks in the enclosure wouldn't cause us to lose any pools (only half the mirrors would be lost); in an all-in-one version it could.

    (One plausible total enclosure failure is a total failure of cooling, causing all drives to cook themselves. If you aren't monitoring fan speed or drive temperature this can happen progressively as one fan after another fails over time.)
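The enclosure-failure point can be checked mechanically. The following is a minimal sketch, assuming each pool is made of mirror vdevs and that a pool is lost as soon as any one of its vdevs has every disk in the failed enclosure. The layouts, pool names, and enclosure names are hypothetical, not our real configuration.

```python
def pools_lost(pools, failed_enclosure):
    """Return the pools that lose at least one whole vdev when an enclosure dies.

    `pools` maps a pool name to a list of mirror vdevs; each vdev is a list
    of (enclosure, disk) pairs. This data model is an illustrative assumption.
    """
    lost = []
    for pool, vdevs in pools.items():
        for vdev in vdevs:
            # A mirror vdev only dies if every side of it was in the failed enclosure.
            if all(enclosure == failed_enclosure for enclosure, _disk in vdev):
                lost.append(pool)
                break
    return lost


# Hypothetical SAN-style layout: each mirror has one side on each backend.
san_layout = {
    "pool-a": [[("backend1", "d0"), ("backend2", "d0")],
               [("backend1", "d1"), ("backend2", "d1")]],
}

# Hypothetical all-in-one layout: both sides of every mirror share one chassis.
local_layout = {
    "pool-a": [[("chassis", "d0"), ("chassis", "d1")],
               [("chassis", "d2"), ("chassis", "d3")]],
}

if __name__ == "__main__":
    print(pools_lost(san_layout, "backend1"))
    print(pools_lost(local_layout, "chassis"))
```

In the SAN-style layout, losing one backend costs half of every mirror but no pool; in the all-in-one layout the same event takes the pool with it.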

Written on 29 June 2014.

Last modified: Sun Jun 29 00:30:28 2014