ZFS features that could entice us to upgrade Solaris versions

October 28, 2011

I've written before about how our fileservers are basically appliances and so don't get patched, because we don't like taking any risk of destabilizing vital core services. They currently run what is more or less Solaris 10 Update 8 (which causes some problems), and today I got interested in inventorying which features from subsequent Solaris versions might be attractive enough to make us upgrade.

(Note that there are a lot of ZFS features in S10U9 and S10U10 that will be attractive to other people but not us. I'm being selfish and just looking at what we care about.)

Solaris 10 Update 9 introduced two important changes: log device removal and pool recovery. We don't currently use log devices because we don't think that we have any pool that could really benefit from them (especially once we add iSCSI overhead on top of the log devices), but if we ever did need to add an SSD log device to a hot pool, we'd want this change.
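For reference, log device removal is just a `zpool remove` once the pool is at a version that supports it; a sketch with a hypothetical pool and device name:

```shell
# Add a single SSD log device to an existing pool (hypothetical names).
zpool add tank log c2t0d0

# With S10U9's pool version, the log device can later be removed again.
zpool remove tank c2t0d0

# Check the pool layout to confirm the log vdev is gone.
zpool status tank
```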

My impression so far is that pool recovery does not require a ZFS pool version upgrade, so you can get it just by keeping a spare S10U9 (or S10U10) system around. Perhaps we should build such a system, just in case. And it would certainly be a good idea to test this assumption.
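The pool recovery in question is the `zpool import -F` rewind support. A sketch of how a spare S10U9 box would use it on a damaged pool (pool name hypothetical):

```shell
# Dry run: report whether the pool could be made importable by
# discarding the last few transactions, without changing anything.
zpool import -nF tank

# Actually rewind the pool to its last consistent state; -f forces
# the import of a pool last used on another system.
zpool import -fF tank
```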

Solaris 10 Update 10 adds more improvements to pool recovery (and a lot of features that we don't care about). Again, it's not clear to me whether this recovery works on old pool versions or whether you have to upgrade your pools first.

The more I look at this, the more I think that I need to build a current Solaris install just to have it sitting around. Fortunately we have plenty of spare hardware.

(This is one of the powers of blogging. Initially I set out to write a rather different entry for today, but when I started doing my research everything wound up shifting around and now I have a new project at work.)

Sidebar: Solaris release information resources

Since it took me a while to find these once: Solaris 10 update 10 (8/11), Solaris 10 update 9 (9/10). Let's hope Oracle doesn't decide to change the URLs for these again. Note that these are not complete feature lists; they don't mention things like ZFS performance improvements.

Also, ZFS pool and filesystem versions, although that doesn't cover S10U10. It points to the ZFS file system guide, which has a what's-new feature list.

Comments on this page:

From at 2011-10-28 15:34:21:

Log device removal is huge. Definitely worth the upgrade! Without it you need more reliable log devices, which are very pricey and (for our workloads) don't really add much performance beyond lower-cost SSDs (though the DDRdrive is pretty slick).

The downside is that it's getting hard to find drives targeted at the log workload. You really only need about 2 GB of space (most drives seem to be at least 100 GB now), and small SLC-flash drives in particular are scarce. (Intel has a 20 GB SLC drive now, but it's actually not aimed at enterprise workloads, and I have yet to try it out and compare it with the X25-E workhorses we've been using.)

From at 2011-10-28 15:37:24:

I'd start with Solaris 11. It adds dedupe. :)

It should be noted that any protocol which issues SYNC requests will increase latency times for ZFS, unless a log device is present. Typically NFS clients use SYNC (e.g. VMware ESX(i)), and I believe iSCSI does as well.

From at 2011-10-28 17:37:17:

Instead of (only) installing a more recent version of Solaris on separate hardware, you may also want to investigate using Live Upgrade and setting up separate boot environments (BEs) which can be patched while the system is running. Once you're in your maintenance window, you reboot the host into the new BE. If there's an issue you just reboot back into the original BE. Your downtime is your reboot time.

It's a bit of work figuring out the CLI options with UFS root—you need a separate pair of mirrored root disks, or to break your current mirror and patch one half (which can be done automagically)—but quite lovely if you have ZFS root as it leverages snapshots and clones.
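The Live Upgrade workflow being described is roughly the following (BE names and paths are hypothetical; this is a sketch, not a tested procedure):

```shell
# Create a new boot environment as a clone of the current one
# (on ZFS root this is a cheap snapshot + clone).
lucreate -n s10u10-be

# Upgrade the inactive BE while the system keeps running, from an
# install image mounted at a path of your choosing.
luupgrade -u -n s10u10-be -s /mnt/s10u10

# Mark the new BE as active; the next reboot boots into it.
luactivate s10u10-be
init 6

# If the new BE misbehaves, luactivate the old BE and reboot again.
```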

For "slog" devices, you don't actually need a big device to get good performance. By default the ZFS Intent Log (ZIL) cannot be more than 50% of RAM, so if your file server has 24 GB of RAM (not a lot these days, really), you don't need more than a 12 GB SSD. If you buy an 80 GB SLC drive, you can use 12 GB of it for the "slog" device and use the rest for a "cache" (L2ARC) device.
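The sizing rule of thumb above is simple arithmetic (the 50%-of-RAM cap is taken from the comment; the 24 GB and 80 GB figures are illustrative):

```shell
# Maximum useful slog size: in-flight ZIL data is capped at half of RAM.
ram_gb=24
slog_gb=$(( ram_gb / 2 ))

# Rest of a hypothetical 80 GB SLC SSD can be an L2ARC cache device.
cache_gb=$(( 80 - slog_gb ))

echo "slog: ${slog_gb} GB, cache: ${cache_gb} GB"
```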

The topic has come up a few times on the "zfs-discuss" list over the years, and the general consensus is that MLC can be fine for "cache" devices, but SLC is preferred for "slog". SandForce SF-1500- and SF-2500/2600-based devices are well-regarded (they usually come with supercaps).

As for ZFS dedupe: still performance issues with it. Have a lot of RAM or a big L2ARC SSD (or both).

From at 2011-10-28 17:39:46:

One more thing: a major feature that I think a lot of people have been waiting for is the so-called "bp* rewrite" functionality.

It would allow one to remove devices and also change RAID levels (i.e. go from RAIDZ1 to Z2, etc.). This has supposedly been in the works for a while, but no word on it yet. Perhaps Solaris 11?

By cks at 2011-10-29 03:01:08:

It turns out that ZFS dedup is useless for us in practice. Why is kind of long so I put it in ZFSDedupMemoryProblem.

My experience is that live upgrades are one of those 'sounds useful but isn't really' things. A good discussion is too long for a comment, but fortunately I've already written about it.

(The other part of 'install on new hardware' is that this would be a test installation done to try things out. This isn't something that I can do on a production fileserver at all, so I have to put it on test hardware.)

Our slog situation is complicated enough to not fit within the margins of this comment. I should probably write at least a summary entry on it. The short version is that we are in an unusual situation where slogs don't seem likely to be worth the expense.

As far as I know, bp rewrite is not an announced Solaris 11 feature. It's been one of those great desires more or less since ZFS was announced; as far as I know, first Sun and now Oracle has done about exactly nothing on implementing it.

PS: 24 GB may be typical for fileservers in some environments, but it's not in ours. Ours have 8 GB and I consider that somewhat wasteful.

Written on 28 October 2011.