Wandering Thoughts archives

2008-03-02

How ZFS's version of RAID-5 can be better than normal RAID-5

ZFS's 'raidz' and 'raidz2' storage methods are single-parity and dual-parity schemes, but they are not exactly the same as normal RAID-5 and RAID-6. The difference is in how they handle partial stripe writes, which is what you get when you write only a small amount of data to disk.

In a conventional RAID-5 implementation, a partial write to a stripe also has to update that stripe's parity, which means additional disk IO (at least one extra disk read and one extra disk write). Even if this extra IO doesn't delay the nominal completion of the write, it puts more load on your disks in general, and disks only support so many IO operations a second.
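To make the extra IO concrete, here is a minimal sketch of the usual read-modify-write parity update for single (XOR) parity. The 3+1 stripe layout, the chunk contents, and the `xor_blocks` helper are all made up for illustration; a real RAID implementation works on disk sectors, not Python byte strings.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A hypothetical 3+1 stripe: three data chunks plus one parity chunk.
data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
parity = xor_blocks(*data)

# Partial stripe write: only chunk 1 changes.
new_chunk = b"\xff\x00"
old_chunk = data[1]                                     # read: old data
new_parity = xor_blocks(parity, old_chunk, new_chunk)   # read: old parity
data[1] = new_chunk                                     # write: new data
parity = new_parity                                     # write: new parity

# The incrementally updated parity matches a full recomputation.
assert parity == xor_blocks(*data)
```

Changing one chunk cost two reads and two writes here, where the data write alone would have been a single IO.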

By contrast, ZFS effectively avoids partial stripe writes, because ZFS doesn't rewrite data in place. Even when you update an existing file, ZFS writes new data blocks for the new data, and when it writes the new data blocks it can write new parity blocks for them as well. As a corollary, ZFS doesn't need a fixed stripe size (or a fixed chunk size); it just has to make sure that each separate write has enough parity blocks.
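As a rough illustration of the idea (not ZFS's actual code or on-disk layout), the sketch below builds each write as its own self-contained stripe, with parity computed purely from the new data. The `write_new_stripe` function, its dictionary return value, and the five-disk pool are assumptions I've invented for the example.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_new_stripe(chunks: list[bytes], disks: int) -> dict:
    """Build a single-parity stripe from new data only (hypothetical helper).

    The stripe is as wide as the write needs, up to disks - 1 data chunks,
    and nothing already on disk is read to compute its parity.
    """
    if not 1 <= len(chunks) <= disks - 1:
        raise ValueError("stripe must hold between 1 and disks - 1 data chunks")
    return {"data": list(chunks), "parity": xor_blocks(*chunks)}


# A small write gets a narrow stripe; a larger write gets a wide one.
small = write_new_stripe([b"\x0f\xf0"], disks=5)
wide = write_new_stripe([b"\x01", b"\x02", b"\x04", b"\x08"], disks=5)

# Any single lost chunk can be rebuilt from the parity and the other chunks.
rebuilt = xor_blocks(wide["parity"], *wide["data"][1:])
assert rebuilt == wide["data"][0]
```

Neither write reads anything back from disk, and each one carries its own parity no matter how few chunks it has.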

(This does raise interesting questions of how you make sure that parity doesn't use too much of your disk space if you're doing lots of separate small writes, since such small stripes may not span all of the disks in your pool.)
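To put rough numbers on that question, here is a back-of-the-envelope model. It assumes one set of parity sectors for every row of up to (disks - parity) data sectors, and it ignores any padding or rounding that a real allocator might add, so treat the function names and the results as illustrative rather than as what ZFS actually allocates.

```python
from math import ceil


def parity_sectors(data_sectors: int, disks: int, nparity: int = 1) -> int:
    """Parity sectors needed under the simplified one-set-per-row model."""
    data_per_row = disks - nparity
    rows = ceil(data_sectors / data_per_row)
    return nparity * rows


def parity_fraction(data_sectors: int, disks: int, nparity: int = 1) -> float:
    """Fraction of the total space used by a write that goes to parity."""
    parity = parity_sectors(data_sectors, disks, nparity)
    return parity / (parity + data_sectors)


# On a hypothetical 5-disk single-parity pool:
for n in (1, 2, 4, 8):
    print(n, parity_sectors(n, 5), round(parity_fraction(n, 5), 2))
```

In this model a one-sector write on five disks spends half of its space on parity, while a write that fills a whole row spends only a fifth.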

ZFS can do this sort of thing because it knows what areas of the disk do and don't have data, so it can wander around doing intelligent updates. Disk-level RAID has to assume that all data blocks are live and so has to always update parity any time one of them is touched; the only saving it gets is not having to do a read-modify-write cycle if you write a full stripe.

(There are also some reliability advantages of never doing partial stripe rewrites; see Jeff Bonwick.)

solaris/ZFSRaidAdvantage written at 23:35:13

