2007-02-19
What I currently know about Fibrechannel versus iSCSI versus AoE
We're planning a significant capacity increase for our local SAN storage pool, which means that we've been trying to figure out which SAN technology we want to go with. We don't have high performance IO needs, so we're going for bulk storage: large SATA disks in RAID 5 in some sort of SAN RAID controller. We plan to use Solaris 10 on x86 for our NFS servers that will use the SAN, with DiskSuite for failover. DiskSuite has to own full disks if it's going to do failover, not just partitions, so we need our SAN RAID controllers to export logical LUNs.
(The local opinion is that we trust Solaris more than Linux as an NFS server, plus it has DiskSuite.)
There are three possible choices: Fibrechannel, iSCSI, and ATA-over-Ethernet. Of these:
- iSCSI and AoE can be had for about $5k for a 15-bay 3U RAID controller,
in the form of the Promise VTrak M500i
or the Coraid SR1521.
You buy commodity SATA disks yourself from your cheap source of
choice.
(There are probably other vendors for 15-bay 3U iSCSI controllers; we haven't looked very hard.)
- FC costs about $5.5k and up for a 12-bay 2U RAID controller. You have to buy the disks through the controller vendor, at a not insignificant markup.
- as a result, FC costs about twice as much per terabyte as iSCSI or AoE.
- people on campus have positive experience with all three options,
and none of them have blown up yet.
- AoE and iSCSI cost much less than Fibrechannel to add stuff to later, because Fibrechannel switches are really expensive and thus a) you tend not to have many spare switch ports and b) getting more switch ports is expensive.
- similarly, it is much cheaper to have redundancy and spares for your AoE or iSCSI switching fabric.
- for at least AoE, you want switches and machines that can do
jumbo frames. This probably won't hurt for iSCSI either.
- there is only one vendor of AoE RAID controllers, Coraid. Coraid's stuff currently does not do logical LUNs within a single disk array.
- while Promise's stuff does do logical LUNs, it has some limits on how many you get within a single disk array. Fortunately we seem unlikely to run into them.
- Coraid's management and monitoring software seems to be less advanced
than Promise's, which will do things like mail you problem reports.
- AoE has a much simpler specification than iSCSI, but this is
somewhat misleading because the AoE spec doesn't say what ATA
commands you must support in order to talk to common AoE
implementations, and thus doesn't include a spec for them; in
practice the AoE spec has to be considered to include some of the
ATA spec itself.
- Linux has both AoE and iSCSI drivers in the standard kernel.
- Solaris 10 has standard iSCSI drivers, but no standard AoE ones. Coraid is sponsoring the development of an open source AoE driver, but it's currently only tested on SPARC systems, not x86 systems, and may not yet fully support ZFS (apparently ZFS needs the disk driver to support some new operations). However, it supports Solaris 7, 8, and 9 in addition to Solaris 10.
- the Linux AoE driver is about 2,000 lines; the iSCSI driver is about
5,800 lines between the actual driver and the iSCSI library. The AoE driver is a straight block driver; the iSCSI driver is a SCSI driver.
- there is a non-integrated AoE target driver for Linux called vblade. No one else has AoE target drivers. (Target drivers allow you to use a machine as a SAN RAID controller that exports storage to other machines.)
- there is an integrated iSCSI target driver for Solaris 10, although it is not yet in official releases.
- there are a number of Linux iSCSI target drivers; none are
integrated (yet).
- in general, iSCSI is more mature and widely supported than AoE.
- Linux seems to have better support for AoE than for iSCSI, which is probably because AoE is simpler and has fewer peculiar bits. (There is a certain enterprisey smell about iSCSI.)
Since we are not interested in building our own SAN RAID controllers, we are almost certainly going to wind up with iSCSI; AoE is unsuitable on several grounds, and Fibrechannel costs too much for what we get. If we were building our own SAN RAID controllers out of PCs running in target mode, I would be very tempted by AoE because of the simplicity of all of the bits involved (and building our own would overcome several of the things that make AoE unsuitable).
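For anyone who wants to redo the per-terabyte comparison with their own prices, here is a rough sketch of the arithmetic. The controller prices and bay counts are the ones quoted in the list above; the disk size and both disk prices are placeholders I made up for illustration, and it assumes one RAID 5 array across every bay with no hot spares, which may not match how you would actually configure things.

```python
def usable_tb(bays, disk_tb):
    """Usable capacity of one RAID 5 array: one disk's worth goes to parity."""
    return (bays - 1) * disk_tb


def cost_per_tb(controller_usd, bays, disk_usd, disk_tb):
    """Total hardware cost (controller plus a full set of disks) per usable TB."""
    total = controller_usd + bays * disk_usd
    return total / usable_tb(bays, disk_tb)


# Controller prices and bay counts come from the comparison above.
# Disk size and both disk prices are hypothetical placeholders.
DISK_TB = 0.5
COMMODITY_DISK_USD = 200   # assumed price from a cheap source
VENDOR_DISK_USD = 450      # assumed price with the vendor markup

iscsi = cost_per_tb(5000, 15, COMMODITY_DISK_USD, DISK_TB)
fc = cost_per_tb(5500, 12, VENDOR_DISK_USD, DISK_TB)
print(f"iSCSI/AoE: ${iscsi:,.0f}/TB  FC: ${fc:,.0f}/TB  ratio: {fc / iscsi:.2f}")
```

The ratio depends mostly on the disk markup and only a little on the controller price, so the placeholder disk prices are the numbers to replace first.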
Another aphorism of system administration
Noticing when something shows up is easy; detecting when it goes away is hard.
Like all aphorisms, this has exceptions. And if you want to see it that way, it's a corollary of an earlier aphorism.
(An aphorism brought to mind as I contemplate our DHCP configuration files and wonder just how many of those Ethernet addresses are currently mouldering in a dump somewhere.)
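One way to attack the DHCP version of this problem is to compare the Ethernet addresses in the configuration against the addresses that have actually been seen on the network recently. What follows is a minimal sketch, assuming an ISC dhcpd style configuration with `hardware ethernet` declarations; where the list of recently seen addresses comes from (ARP tables, switch logs, DHCP lease logs) is left open, and the function names are my own invention.

```python
import re

# Matches ISC dhcpd style "hardware ethernet aa:bb:cc:dd:ee:ff;" declarations.
HW_RE = re.compile(
    r"hardware\s+ethernet\s+([0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})\s*;"
)


def normalize(mac):
    """Lowercase the address and zero-pad each octet so spellings compare equal."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def configured_macs(conf_text):
    """Return every Ethernet address declared in the configuration text."""
    without_comments = re.sub(r"#.*", "", conf_text)  # drop '#' comments
    return {normalize(m) for m in HW_RE.findall(without_comments)}


def stale_macs(conf_text, seen):
    """Addresses that are configured but absent from the 'seen' collection."""
    return configured_macs(conf_text) - {normalize(m) for m in seen}


sample_conf = """
host alpha { hardware ethernet 00:11:22:33:44:55; }
host beta  { hardware ethernet 0:A:B:C:D:E; }
# host gone { hardware ethernet 00:00:00:00:00:01; }
"""
print(sorted(stale_macs(sample_conf, ["00:11:22:33:44:55"])))
```

An address that shows up as stale is only a candidate for removal; a laptop that has been away for a month looks exactly like a machine in a dump, which is the aphorism all over again.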