Why ZFS log devices aren't likely to help us
Back in commentary on my entry on ZFS features that could entice us to upgrade Solaris versions I mentioned that we were in an unusual situation where ZFS log devices didn't seem likely to help us enough to be worth the various costs, but that explaining it properly would require an actual entry. Well, you can guess what this finally is.
The primary purpose of ZFS log devices (hereafter 'slogs') is to
accelerate synchronous writes, such as the writes that need to be done
when an application calls
sync()) or a NFS client issues
a NFS v3 COMMIT message (or, I suppose, when an NFS v2 client issues a
WRITE, if you still have any NFS v2 clients around). Without an slog,
the ZFS pool must make some synchronous writes to your actual pool
disks; with an slog, it can make some synchronous writes to what one
hopes are very much faster SSDs.
The first reason that we're not likely to see much of a win from slogs is that, well, um, er, it turns out that we're not actually doing synchronous writes. We're still writing to the actual disks, though, and under sufficient load those disks are not going to immediately tell us 'your write has been done'. Also, having slogs would allow us to switch to doing proper synchronous writes without (probably) losing too much performance.
Now we run into the other part of the problem. Every pool needs two slog devices (yes, we'd mirror them), and we have a fair number of pools. It's not feasible to give every pool two physical SSDs; this means some degree of sharing, which means some degree of shared points of failure (and shared IO choke points, since several pools will all be doing IO to the same physical SSDs). It's quite possible that we could wind up with all pools on a single fileserver depending on two physical SSDs for their slogs (in two different backends, of course).
(The third problem is that we would have to put the slog SSDs behind iSCSI. iSCSI itself adds some amount of latency, which creates a lower bound on how fast synchronous writes can go even with an infinitely fast disk system on the iSCSI target.)
For all of this we would get accelerated synchronous writes. But there's another important question: how much synchronous write activity do we actually have? Our belief so far is that most pools are read-mostly with low amounts of writes (and probably bursty writes). When we've looked at disk performance issues, there has been no clear sign pointing to write issues. So all of this effort for slog devices would likely get us not very much actual performance increase in real life usage; in fact, many of our users might not notice.
My impression is that our situation is quite unusual. Most people have only a few big pools, hosted on local disks, and they can easily identify pools that have significant write activity (often from knowing things about the usage, eg 'this pool is used for databases'). In this situation it's much easier to add an slog or two and have it give you a clear benefit.