Why I am harsh on Solaris Live Upgrade and similar tools

November 28, 2010

In the previous entry I noted that one reason I was basically uninterested in Solaris Live Upgrade is that it had hung when I tested it several years ago (and quite a few patch levels back). This may strike people as a rather harsh reaction to a bug, to which I am going to say: absolutely, but it's the same reaction I have to bugs in any similar tool, regardless of who it's from or what it runs on.

For Solaris Live Upgrade to be worth using instead of dangerous, it needs to do a great many things right, things that are both complex and down at the heart of the system. LU must not modify my live boot environment (only the selected alternate), it must reliably boot the boot environment I want it to, it must correctly handle falling back to another boot environment, booting an alternate environment must leave my main one completely untouched, and so on. And it must get these things right all the time and even in obscure cases, because something we're doing may turn out to be one of those obscure cases; a tool like LU cannot afford to be a 90% tool or even a 95% tool. If LU screws up any of this, I have serious problems; at the worst, I have data loss and major system downtime. Pretty much, if LU gets any of this wrong, I am better off not using it at all.
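(As an illustration, one of the few requirements above that can be checked from the outside is "the live boot environment is left untouched". What follows is only a minimal sketch, and it assumes the boot environment is visible as an ordinary directory tree. The names `snapshot_tree` and `diff_snapshots` are made up for this example and are not part of LU or any Solaris tool. The idea is to checksum every regular file before the operation, checksum again afterwards, and compare.)

```python
import hashlib
import os


def snapshot_tree(root):
    """Map each regular file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file
            # (device nodes, sockets, FIFOs).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def diff_snapshots(before, after):
    """Return three sorted lists of relative paths: (added, removed, changed)."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, changed
```

(A check like this only compares file contents, not ownership, permissions, or anything outside the filesystem tree, and a running system changes some of its own files anyway, such as logs. It covers one narrow case and says nothing about the obscure ones.)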

There is only so much of this that I can explicitly test, which means that I have to actively trust LU and the people who wrote it to get all of these things right. What happens next is simple: bugs destroy my trust. A bug is a place where LU and its programmers have not gotten it right. Sure, I might be able to work around the bug and get LU going anyways, but if there is a bug in something that I have tested, how can I have any confidence that there aren't other bugs in things that I either haven't tested yet or can't even test at all?

I can't. And without trust in the system, I can't use it at all, not unless I desperately need it and I'm willing to take a significant risk because I have no feasible alternative.

So yes, absolutely I am harsh. For good reason.

(Solaris Live Upgrade isn't the only thing that I have tried, hit a bug in, and abandoned. For example, I would like to be able to trust Linux LVM's pvmove, but I had it lock up on me once close to half a decade ago and I haven't touched it since. Maybe it's better now; I don't care. It's not worth the risk of actual data loss.)


Comments on this page:

From 68.82.55.139 at 2010-11-28 21:21:05:

I loved LU when it worked, because it was magic. And then zones on ZFS became hosed. And then it had several regressions where you couldn't delete ZFS datasets (that LU shouldn't care about anyway) without it getting confused. And then and then and then.

Basically the LU team got axed or repurposed once it became clear IPS was the way forward.

For what it's worth, IPS has to do a lot less hackish madness.

--bdha

Written on 28 November 2010.


Last modified: Sun Nov 28 02:04:08 2010