Wandering Thoughts archives

2014-06-27

A retrospective on our Solaris ZFS-based NFS fileservers (part 2)

In yesterday's entry I talked about the parts of our Solaris ZFS fileserver environment that worked nicely over the six years we've run them. Today is for the other side, the things about Solaris that didn't go so well. You may have noticed that yesterday I was careful to talk specifically about the basics of ZFS working well. That is because pretty much all of the extra frills we tried failed or outright blew up in our faces.

The largest single thing that didn't work out anywhere near as we planned and wanted is failover. There are contributing factors beyond ZFS (see this for a full overview) but what basically killed even careful manual failover is the problem of very slow zpool imports. The saving grace of the situation is that we've only really needed failover a relatively small number of times because the fileservers have been generally quite reliable. The downside of losing failover is that the other name for failover is 'easy and rapid migration of NFS service' and there have been any number of situations where we could have used that. For example, we recently rebooted all of the fileservers because they'd been up over 650 days and we had some signs they might have latent problems. With fast, good 'failover' we could have done this effectively live without much user-visible impact (shift all NFS fileservice away from a particular machine, reboot it, shift its NFS fileservice back, repeat). Without that failover? A formal downtime.
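
For concreteness, the manual failover we're talking about amounts to moving a ZFS pool (and with it NFS service for its filesystems) from one fileserver to another. A minimal sketch, assuming a hypothetical pool called 'fs1-pool' whose filesystems are shared through the sharenfs property:

    # on the fileserver giving up the pool
    zpool export fs1-pool

    # on the fileserver taking it over; for us, this import was the
    # step that could be painfully slow
    zpool import fs1-pool
    # filesystems with sharenfs set are re-shared as part of the import

Repointing clients at the new fileserver (for instance by also moving a service IP address) is a separate step that this sketch leaves out.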

The largest advertised ZFS feature that just didn't work was ZFS's support for spare devices. We wound up feeling that this was completely useless and built our own spares system (part 2, part 3). We also had problems with, for example, zpool status hanging in problem situations or simply not telling us the truth of the situation.
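
To give a feel for what's involved, the mechanical core of any spares handling, home-grown or otherwise, is noticing a faulted disk and swapping a replacement in; at the command level that looks something like the following, with made-up pool and device names:

    # is anything unhappy? prints 'all pools are healthy' if not
    zpool status -x

    # by hand, substitute a good disk for the faulted one
    zpool replace fs1-pool c3t5d0 c4t2d0

What a home-grown system buys you is control over when and with what those replacements happen, which is where the intelligence comes in.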

It turned out to be a significant issue in practice that ZFS has no API, ie no way for outside systems to reliably extract state information from it (a situation that continues to this day). Because we needed this information, we were forced to develop ad-hoc and non-portable tools to extract it by force from Solaris, and this in turn caused further problems. One significant reason we never upgraded past Solaris 10 update 8, despite the existence of fixes we were interested in, was that upgrading would have required updating and re-validating all of these tools.

(These tools are also a large part of why we wouldn't take Solaris 11 even if Oracle offered it to us for free. We need these tools and these tools require source code access so we can reverse engineer this information.)
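
The mild end of 'extracting by force' is scraping the command line tools, along the lines of this (fragile) sketch; the tools mentioned above had to go deeper than that, which is where the need for source code access comes from:

    # tab-separated, header-less output that is at least semi-parseable
    zpool list -H -o name,health,capacity

    # anything more detailed means pattern-matching output meant for humans,
    # which breaks whenever the format shifts
    zpool status fs1-pool | awk '/state:/ {print $2}'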

Overall our Solaris experience has left me feeling that we were quite far from the (ZFS) usage cases that the Solaris developers expected. A lot of things didn't seem prepared to cope with, for example, how many 'disks' we have. Nothing actually broke significantly (at least once we stopped applying Solaris patches) but the entire environment felt fragile, like a too-tall building swaying as the wind builds up. We also became increasingly dubious about the quality of implementation of the changes that Sun (and then Oracle) was making to Solaris, adding another reason to stop applying patches and to never upgrade past Solaris 10U8.

(Allow me to translate that: Solaris OS developers routinely wrote and released patches and changes with terrible code that broke things for us and didn't work as officially documented. The Sun and Oracle reaction to this was a giant silent shrug.)

While we got away with our 'no patches, no updates, no changes' policy, I'm aware that we were lucky; we simply never hit any of the known S10U8 bugs. I didn't (and don't) like running systems that I feel I can't update because things are sure to break, and we definitely wound up doing that with our Solaris machines. I count that as something that did not go well.

In general, over time I've become increasingly uncomfortable about our default 'no updates on black box appliance style machines' policy, which we've followed on both the Solaris fileservers and the iSCSI backends. I kind of count it as an implicit failure in our current fileserver environment. For the next generation of fileservers and backends I'd really like to figure out a way to apply as many updates as possible in a safe way (I have some ideas but I'll save them for another entry).

None of these things that didn't work so well have been fatal or even painful in day to day usage. Some of them, such as the ZFS spares situation, have forced us to do things that improved the overall environment; having our own spares system has turned out to be a big win because it can be more intelligent and more aggressive than any general ZFS solution could be.

ZFSFileserverRetrospective02 written at 02:07:41

2014-06-25

A retrospective on our Solaris ZFS-based NFS fileservers (part 1)

We're in the slow process of replacing our original Solaris ZFS fileserver environment with a second generation environment. With our current fileservers entering their sunset period, it's a good time to take an uncommon retrospective look back over their six years of operation and talk about what went well and what didn't. Today I'm going to lead with the good stuff about our Solaris machines.

(I'm actually a bit surprised that it's been six years, but that's what the dates say. I wrote the fileservers up in October of 2008 and they'd already been in operation for several months at that point.)

The headline result is that our fileserver environment has worked great overall. We've had six years of service with very little disruption and no data loss. We've had many disks die, we've had entire iSCSI backends fail, and through it all ZFS and everything else has kept trucking along. This is actually well above my expectations six years ago, when I had a very low view of ZFS's long-term reliability and expected to someday lose a pool to ZFS corruption over the lifetime of our fileservers.

The basics of ZFS have been great and using ZFS has been a significant advantage for us. From my perspective, the two big wins with ZFS have been flexible space management for actual filesystems and ZFS checksums and scrubs, which have saved us in ways large and small. Flexible space management has sometimes been hard to explain to people in a way that they really get, but it's been very nice to simply be able to make filesystems for logical reasons and not have to ask people to pre-plan how much space they get; they can use as little or as much space as they need.
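
As a small illustration of that flexibility (with hypothetical pool and filesystem names): ZFS filesystems are just names carved out of a shared pool, and any space limits are optional properties that can be changed at any time.

    # make filesystems for organizational reasons; they all draw on the
    # pool's free space, with no pre-sizing required
    zfs create fs1-pool/homes
    zfs create fs1-pool/homes/research-group

    # a quota can be added (or raised, or removed) later if you want one
    zfs set quota=200G fs1-pool/homes/research-group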

Solaris in general and Solaris NFS in particular have been solid in normal use and we haven't seen any performance issues. We used to have some mysterious NFS mount permission issues (where a filesystem wouldn't mount or work on some systems) but they haven't cropped up on our systems for a few years now, from what I remember. Our Solaris 10 update 8 installs may not be the most featureful or up-to-date systems but in general they've given us no problems; they just sit in their racks and run and run and run (much like the iSCSI backends). I think it says good things that they recently reached over 650 days of uptime before we decided to reboot them as a sort of precaution after one crashed mysteriously.

Okay, I'll admit it: Solaris has not been completely and utterly rock solid for us. We've had one fileserver that just doesn't seem to like life, for reasons that we're not sure about; it is far more sensitive to disk errors and it's locked up several times over the years. Since we've replaced the hardware and reinstalled the software, my vague theory is that it's something to do with the NFS load it gets, the disks it's dealing with, or both (it has most of our flaky 1TB Seagate disks, which fail at rates far higher than the other drives).

One Solaris feature deserves special mention. DTrace (and with it Solaris source code) turned out to be a serious advantage and very close to essential for solving an important performance problem we had. We might have eventually found our issue without DTrace but I'm pretty sure DTrace made it faster, and DTrace has also given us useful monitoring tools in general. I've come around to considering DTrace an important feature and I'm glad I get to keep it in our second generation environment (which will be using OmniOS on the fileservers).
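
To give a flavor of the kind of visibility DTrace provides, even trivial one-liners using the standard io provider are useful (these are generic examples, not our actual tools):

    # which programs are issuing disk I/O right now?
    dtrace -n 'io:::start { @[execname] = count(); }'

    # rough distribution of I/O sizes, in bytes
    dtrace -n 'io:::start { @ = quantize(args[0]->b_bcount); }'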

I guess the overall summary is that for six years, our Solaris ZFS-based NFS fileservers have been boring almost all of the time; they work and they don't cause problems, even when crazy things happen. This has been especially true for the last several years, ie after we shook out the initial problems and got used to what to do and not to do.

(We probably could have made our lives more exciting for a while by upgrading past Solaris 10 update 8 but we never saw any reason to do that. After all, the machines worked fine with S10U8.)

That isn't to say that Solaris has been completely without problems and that everything has worked out for us as we planned. But that's for another entry (this one is already long enough).

Update: in the initial version of this entry I completely forgot to mention that the Solaris iSCSI initiator (the client) has been problem free for us (and it's obviously a vital part of the fileserver environment). There are weird corner cases but those happen anywhere and everywhere.

ZFSFileserverRetrospective01 written at 22:17:29
