2008-02-18
ZFS versus SANs: where do you put the RAID?
Here is an issue I have been thinking about recently: how do you want to handle ZFS in a SAN environment, especially one with relatively low data rates such as, oh, anything based on gigabit Ethernet?
The problem is that ZFS and SANs have a small conflict. ZFS wants to talk to raw disks and do RAID internally so that it can do its special magic, but SANs want you to let the backends do as much RAID work as possible to preserve overall bandwidth. If you do mirroring in ZFS, for example, you have just doubled the write bandwidth that you need.
(Read bandwidth should be mostly unaffected, except for your periodic integrity scans. You do do periodic integrity scans, right?)
Various forms of link aggregation and multipathing can give you more iSCSI bandwidth, but my impression so far is that you have to get lucky to have both ends support the same sort of aggregation. It also means a bigger switch infrastructure and at some point you may run into limits on the total bandwidth your switches can support.
How much you have to worry about this depends on how much write bandwidth you need (and on what sort of RAID you'd have ZFS do; mirroring adds more write IO than RAID 6, which has more than RAID 5). If you fall in the middle, with not so little write IO that you can ignore this and not so much write IO that you have no choice, this is clearly a 'pick your poison' question; both options have disadvantages.
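The write-amplification arithmetic can be sketched quickly. This is my own illustrative calculation, not anything ZFS-specific: the parity cases assume full-stripe writes on a stripe with six data disks, and the gigabit throughput figure is a rough practical number.

```python
# Rough write-amplification figures for doing RAID host-side over iSCSI.
# Illustrative only: assumes full-stripe writes for the parity cases.

def write_multiplier(scheme, data_disks=6):
    """Bytes sent over the SAN per byte of application data written."""
    if scheme == "mirror":
        return 2.0                          # every block is written twice
    if scheme == "raid6":                   # two parity blocks per stripe
        return (data_disks + 2) / data_disks
    if scheme == "raid5":                   # one parity block per stripe
        return (data_disks + 1) / data_disks
    raise ValueError(scheme)

gige_mb_per_sec = 110                       # rough practical gigabit payload
for scheme in ("mirror", "raid6", "raid5"):
    m = write_multiplier(scheme)
    print(f"{scheme:6s} x{m:.2f} -> ~{gige_mb_per_sec / m:.0f} MB/s of real writes")
```

On these assumptions mirroring halves your usable write bandwidth, while the parity schemes cost you considerably less, which is the shape of the tradeoff in the text.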
(Here, the winning argument is likely to be that if we let ZFS do the mirroring and always mirror between two SAN backends, our machines may not have to reboot if they briefly can't talk to one of them.)
2008-02-06
Why ZFS needs a zfsdump
ZFS needs a zfsdump program for the same reasons that every filesystem needs a *dump program: you need something that can fully and completely reproduce the state of the on-disk data, complete with sparse files and weird permissions and so on, and that can restore small portions of the backup, not just the whole thing.
(In this day of more and more people turning off atime, it's probably no longer so important to have a backup tool that is guaranteed not to change the state of files.)
Generalized backup tools like GNU tar can do partial restores, but cannot completely capture things like sparse files. zfs send and zfs receive can exactly capture sparse files and so on, but do not support partial restores.
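The sparse-file problem is easy to demonstrate. Here is a small sketch (the file names and sizes are arbitrary) of what a naive byte-for-byte restore, which is effectively what a generalized backup tool does, does to a sparse file:

```python
# Demonstrate that a plain byte-for-byte copy loses sparseness:
# the hole reads back as zero bytes, and writing those zeros
# allocates real blocks in the copy.
import os
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "sparse-original")
dst = os.path.join(tmpdir, "naive-copy")

# Create a 16 MB file that is one big hole: no data blocks allocated.
with open(src, "wb") as f:
    f.truncate(16 * 1024 * 1024)

# Copy it the naive way, one chunk at a time.
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while chunk := fin.read(1 << 20):
        fout.write(chunk)

def allocated_blocks(path):
    return os.stat(path).st_blocks      # 512-byte units actually allocated

print(allocated_blocks(src), allocated_blocks(dst))
```

Both files are the same logical size, but on an ordinary filesystem the copy has every block allocated while the original has almost none. A real *dump-style tool has to record where the holes are, not just the bytes.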
(There are other significant drawbacks of zfs send as a backup mechanism, including the issue with snapshots and quotas if you want to do incremental backups. My strong impression is that the whole mechanism is only really intended for transferring filesystems between pools and for replication.)
Unfortunately it looks like almost all of the real work of zfs send is done deep in the kernel ZFS code, so you can't reuse the user-level stuff and just make it generate dump-compatible output instead of the current stream format.