Why ZFS log devices aren't likely to help us

March 12, 2012

Back in the comments on my entry on ZFS features that could entice us to upgrade Solaris versions, I mentioned that we were in an unusual situation where ZFS log devices didn't seem likely to help us enough to be worth their various costs, but that explaining why properly would take an actual entry. Well, you can guess what this finally is.

The primary purpose of ZFS log devices (hereafter 'slogs') is to accelerate synchronous writes, such as the writes that must be done when an application calls fsync() (or sync()) or an NFS client issues an NFS v3 COMMIT (or, I suppose, when an NFS v2 client issues a WRITE, if you still have any NFS v2 clients around). Without an slog, ZFS must make some synchronous writes to your actual pool disks; with an slog, it can make them to what one hopes are much faster SSDs.
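To make 'synchronous write' concrete, here is a minimal Python sketch of one as an application sees it. The helper name synchronous_append is made up for illustration; os.fsync() is the real standard library wrapper around fsync(). What makes the write synchronous is that fsync() doesn't return until the filesystem says the data is on stable storage. On ZFS that durability comes from the ZFS intent log, which lives on the pool disks unless the pool has an slog.

```python
import os
import tempfile


def synchronous_append(path: str, data: bytes) -> int:
    """Append data to path and wait until it is on stable storage.

    Hypothetical helper for illustration. The os.fsync() call is what
    makes this a synchronous write: it blocks until the filesystem
    reports the data durable (on ZFS, via the ZFS intent log).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = os.write(fd, data)
        # Block until the data (and the metadata needed to find it) is durable.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(synchronous_append(os.path.join(d, "sync.log"), b"hello\n"))
```

The time an application spends inside that fsync() call is exactly what an slog is meant to shrink.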

The first reason that we're not likely to see much of a win from slogs is that, well, um, er, it turns out that we're not actually doing synchronous writes. That's not quite the whole story, though. We're still writing to the actual disks, and under sufficient load those disks are not going to immediately tell us 'your write has been done'. Also, having slogs would let us switch to doing proper synchronous writes without (probably) losing too much performance.

Now we run into the second part of the problem. Every pool needs two slog devices (yes, we'd mirror them), and we have a fair number of pools. It's not feasible to give every pool two physical SSDs; this means some degree of sharing, which means some degree of shared points of failure (and shared IO choke points, since several pools will all be doing IO to the same physical SSDs). It's quite possible that we could wind up with all pools on a single fileserver depending on two physical SSDs for their slogs (in two different iSCSI backends, of course).

(The third problem is that we would have to put the slog SSDs behind iSCSI. iSCSI itself adds some amount of latency, which creates a lower bound on how fast synchronous writes can go even with an infinitely fast disk system on the iSCSI target.)
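As a back-of-the-envelope illustration of that lower bound: if every synchronous write has to wait out some fixed iSCSI latency on top of whatever the SSD itself takes, then a single writer that waits for each write to finish before issuing the next can't exceed 1/(total latency) writes a second, however fast the SSD is. This is a minimal sketch under a deliberately simple serial model (one writer, no concurrent or batched log writes); the function name is hypothetical and the latency figures below are made-up examples, not measurements of our environment.

```python
def max_sync_writes_per_second(iscsi_latency_ms: float,
                               device_latency_ms: float = 0.0) -> float:
    """Upper bound on serial synchronous writes per second.

    Hypothetical model: each write costs at least the iSCSI latency plus
    the device's own write latency, and the writer issues writes one at
    a time. A device latency of 0.0 models an infinitely fast SSD.
    """
    if iscsi_latency_ms < 0 or device_latency_ms < 0:
        raise ValueError("latencies must be non-negative")
    total_ms = iscsi_latency_ms + device_latency_ms
    if total_ms == 0:
        raise ValueError("total latency must be positive")
    # 1000 ms per second divided by the minimum cost of one write.
    return 1000.0 / total_ms
```

For instance, with a hypothetical 0.5 ms of iSCSI latency and an infinitely fast SSD, max_sync_writes_per_second(0.5) gives 2000.0; adding an SSD that itself takes 0.5 ms halves that to 1000.0. Concurrent writers can do better in aggregate, but each individual fsync() still pays at least the iSCSI latency.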

In exchange for all of this, we would get accelerated synchronous writes. But there's another important question: how much synchronous write activity do we actually have? Our belief so far is that most pools are read-mostly with low amounts of writes (and probably bursty writes). When we've looked at disk performance issues, there has been no clear sign pointing to write issues. So all of this effort for slog devices would likely get us very little actual performance improvement in real-life usage; in fact, many of our users might not notice any difference at all.

My impression is that our situation is quite unusual. Most people have only a few big pools, hosted on local disks, and they can easily identify the pools that have significant write activity (often from knowing things about their usage, e.g. 'this pool is used for databases'). In that situation it's much easier to add an slog or two and get a clear benefit from it.

Comments on this page:

From at 2012-03-12 09:43:06:

"Our belief so far is that most pools are read-mostly with low amounts of writes (and probably bursty writes)."

fsstat -F may give you some general idea.

Last modified: Mon Mar 12 00:37:24 2012