Why ZFS's data integrity is less important than Solaris's usability

May 28, 2007

Mark Musante:

The bottom line is that Solaris is hard to administer (yeah, it's a fair cop), so server data is just going to have to suffer. Hopefully some day Solaris will be as easy as redhat, or debian, or ubuntu, or <insert name of distro here>. Some day. Meanwhile, I'll choose data integrity over ease of administration.

The problem with this is that quiet disk corruption is not currently a big issue for most people; it just doesn't happen all that often, at least not in ways that people notice, or people would be howling in pain right now. You can argue that people just haven't noticed the corruption that they're experiencing now, but the counter-argument is that if people haven't noticed it, it's clearly not that important to them (yet, more or less).

Or to put it another way: the problem for Sun is that they are trying to sell a better mousetrap when people don't feel that they have a mouse problem (or at least not a mouse problem that their existing mousetraps can't deal with).

(Perhaps Sun has done studies that show that disk systems and so on are going wrong much more often than people expect, or that future disk systems will inevitably have higher error rates, or the like. That would be newsworthy and I would expect to find that sort of stuff mentioned on the ZFS pages.)

Even without a mouse problem, people would still go for the better mousetrap if it was otherwise a more or less neutral choice, but it is not. To extend the metaphor, the better Sun mousetrap is uncomfortable and has sharp bits that poke you reasonably frequently. That it is cool and nifty starts to fade after the first few times you have to apply bandaids.

And that is why ZFS's data integrity features are less important than Solaris's ease of administration. In practice, ease of administration matters more to more people, because right now relatively few people are seriously worried about silent data corruption whereas everyone has to administer their machines.

(In other words, people will indeed often choose practical ease of administration over (theoretical) data integrity, whether or not they are willing to admit it out loud.)


Comments on this page:

From 192.18.1.36 at 2007-08-06 03:23:55:

Sorry, but I have to disagree: it is important. It looks like you've been using Solaris a lot, not just today but in the past, so presumably you used UFS prior to ZFS. Ever seen UFS panicking with:

"freeing free block/inode/frag" ?

Ever seen a warning telling you:

"unexpected allocated/free inode, run fsck(1M)" ? "bad directory inode XXX"

Now there is your quiet disk corruption that usually went unnoticed before ZFS. UFS often got the blame when in reality the corruption happened due to bad HW or bad FW. Almost all of the corruption with UFS that I've seen over the last 6 years could be blamed on HW/FW silently corrupting data and on UFS's blissfully ignorant view of the world, with its assumption that all HW is good.

--- frankB http://www.opensolaris.org//viewProfile.jspa?id=66

By cks at 2007-08-06 13:11:40:

I've never seen those warnings or panics, or equivalent ones from non-Solaris systems. I don't believe that they're common, partly because there have not been lots of people complaining about disk corruption issues.

From 68.107.145.47 at 2008-04-05 11:41:24:

I've run a lot of filesystems, and I use md5 and sha1 hashes to verify "silent corruption".

No matter what Sun says, it simply does not happen in most cases, even when one of my drives starts to fail.

The drive catches most errors, and the OS catches a lot, and in the end md5 and sha1 are better than anything in ZFS because they are independently verified common programs.

After ZFS has been in use for 10 years or more like most encryption signature programs, maybe we can trust it as much.

Besides, even if disk corruption really is a big issue, the most likely solution the world would take would be to add the feature to Linux and the BSD systems.

In fact, a lot of people would do that before enduring the pain of using Solaris.

NOTE: I'm a Solaris user, but most of my systems run NetBSD, precisely because it is light years ahead of Solaris in terms of userland, software support, and administration.

I use Solaris because I like zones and I need the training.

Written on 28 May 2007.

Last modified: Mon May 28 21:56:04 2007