2012-03-31
Why I no longer believe that you need Solaris if you want ZFS
Four years ago I wrote an entry on why you wanted to use Solaris if you were going to use ZFS. Recently I have been reconsidering this issue, and I no longer believe that you need to pick Solaris if you're going to use ZFS. What has happened is that ZFS and ZFS development have changed drastically.
Back in 2008 it was clear that there was only one ZFS. All of the real ZFS development was happening at Sun and was being done in Solaris; all other versions were copying this work with various delays. Today in 2012 there's effectively not one ZFS any more, but instead at least two and maybe three (or more): Illumos ZFS, Solaris ZFS, and perhaps FreeBSD ZFS. (I don't know how separate FreeBSD ZFS is from Illumos ZFS.)
Illumos ZFS has real developer firepower behind it (many of the original ZFS developers have left Sun/Oracle and moved to companies that contribute to Illumos), while at the same time Oracle has made changes that make Solaris 11 far less desirable (eg much higher costs and closed source). It also seems likely that neither version of ZFS will get really compelling new features (like the ability to remove vdevs from a pool). This makes the two versions of ZFS much more balanced and competitive, and the lack of major changes makes a (potentially) older ZFS like FreeBSD's not that unattractive.
(As for support and bug fixes, let's just say that I expect even less from Oracle than from Sun.)
Another, less complimentary way of putting it is that with ZFS today what you see now is pretty much what you're going to get in the future. Major changes might happen but they don't seem to be the way to bet. With ZFS basically frozen it's much easier to look at something like FreeBSD, evaluate its ZFS, and say 'this is good enough for us'; you're unlikely to be missing anything important in the future no matter what happens (or doesn't happen) with FreeBSD ZFS development.
To condense a potentially long discussion, all of this leaves me feeling that FreeBSD is now a generally viable mainline ZFS platform. It doesn't have the absolute latest ZFS code and bug fixes (whether you consider those to be the Illumos ones or the Solaris ones), but it has other advantages and its ZFS is likely to be good enough for most things.
(If you really need the features of Oracle Solaris's ZFS, even despite the uncertainties, well, you don't have a choice right now and maybe not ever. But I don't think many people are stuck like that, and I do mean 'stuck'.)
2012-03-12
Why ZFS log devices aren't likely to help us
Back in the commentary on my entry on ZFS features that could entice us to upgrade Solaris versions, I mentioned that we were in an unusual situation where ZFS log devices didn't seem likely to help us enough to be worth the various costs, but that explaining it properly would require an actual entry. Well, you can guess what this finally is.
The primary purpose of ZFS log devices (hereafter 'slogs') is to accelerate synchronous writes, such as the writes that need to be done when an application calls fsync() (or sync()) or an NFS client issues an NFS v3 COMMIT request (or, I suppose, when an NFS v2 client issues a WRITE, if you still have any NFS v2 clients around). Without an slog, the ZFS pool must make those synchronous writes to your actual pool disks; with an slog, it can make them to what one hopes are very much faster SSDs.
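(To make 'synchronous write' concrete, here is a minimal Python sketch of the sort of operation an slog accelerates; the file path and data are made up purely for illustration.)

    import os

    # A write followed by fsync() is the classic synchronous operation;
    # fsync() does not return until the data is on stable storage.
    # The path here is invented for the example.
    fd = os.open("/tank/example/important-file", os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, b"data that must survive a crash\n")
        os.fsync(fd)   # the synchronous part; without an slog, ZFS has to
                       # commit this to the regular pool disks right now
    finally:
        os.close(fd)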
The first reason that we're not likely to see much of a win from slogs is that, well, um, er, it turns out that we're not actually doing synchronous writes, so there's nothing for an slog to accelerate. We're still writing to the actual disks, though, and under sufficient load those disks are not going to immediately tell us 'your write has been done'. Also, having slogs would allow us to switch to doing proper synchronous writes without (probably) losing too much performance.
Now we run into the other part of the problem. Every pool would need two slog devices (yes, we'd mirror them), and we have a fair number of pools. It's not feasible to give every pool two physical SSDs; this means some degree of sharing, which means some degree of shared points of failure (and shared IO choke points, since several pools will all be doing IO to the same physical SSDs). It's quite possible that we could wind up with all of the pools on a single fileserver depending on two physical SSDs for their slogs (in two different backends, of course).
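(As a purely illustrative bit of arithmetic, with an invented pool count, here is the dedicated versus shared SSD tradeoff in Python; none of these numbers describe our actual hardware.)

    # Hypothetical numbers, just to show the scaling problem.
    pools = 12             # 'a fair number' of pools; the 12 is invented
    slogs_per_pool = 2     # mirrored slog devices per pool

    dedicated_ssds = pools * slogs_per_pool
    print("dedicated slog SSDs per fileserver:", dedicated_ssds)   # 24

    # The realistic alternative is sharing: carve a slog slice for every
    # pool out of one mirrored pair of SSDs, which leaves all 12 pools
    # depending on the same two physical devices.
    shared_ssds = 2
    print("shared slog SSDs per fileserver:", shared_ssds)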
(The third problem is that we would have to put the slog SSDs behind iSCSI. iSCSI itself adds some amount of latency, which puts a floor under the latency of each synchronous write and thus a ceiling on how fast synchronous writes can go, even with an infinitely fast disk system on the iSCSI target.)
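(Here is a rough back-of-the-envelope Python sketch of that floor; the latency figures are invented for illustration, not measurements of our iSCSI backends.)

    # Invented latency figures, purely for illustration.
    iscsi_round_trip_ms = 0.3   # assumed extra round trip added by iSCSI
    ssd_write_ms = 0.1          # assumed write latency of the slog SSD itself

    per_sync_write_ms = iscsi_round_trip_ms + ssd_write_ms
    print("best case: %.0f sync writes/sec per stream" % (1000.0 / per_sync_write_ms))
    # -> 2500/sec here; even with an infinitely fast SSD (0 ms) the iSCSI
    # round trip alone caps a single stream at roughly 3300 sync writes/sec.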
For all of this we would get accelerated synchronous writes. But there's another important question: how much synchronous write activity do we actually have? Our belief so far is that most pools are read-mostly, with low amounts of (probably bursty) writes. When we've looked at disk performance issues, there has been no clear sign pointing to write problems. So all of this effort for slog devices would likely get us very little actual performance increase in real life usage; in fact, many of our users might not notice.
My impression is that our situation is quite unusual. Most people have only a few big pools, hosted on local disks, and they can easily identify pools that have significant write activity (often from knowing things about the usage, eg 'this pool is used for databases'). In this situation it's much easier to add an slog or two and have it give you a clear benefit.