The easy way to do fast OS upgrades

July 21, 2010

We recently went through the experience of upgrading all of our ZFS fileservers from Solaris 10 update 6 to Solaris 10 update 8. This took somewhere around twenty minutes of downtime per fileserver, most of which was waiting for ZFS pools to slowly import.

You might wonder how we got an OS upgrade to go so fast. The answer is that we cheated, twice.

The first way we cheated is that we didn't upgrade the OS; instead, we (re)installed Solaris 10 update 8 from scratch. This is our traditional approach with most of our servers (anything that doesn't have important local data, and we try not to have servers with important local data). We need to be able to reinstall servers anyways to cope with hardware problems, and once you have a well-tested reinstall process you might as well use it for everything.

The second way we cheated is that we didn't reinstall S10U8 on the same machine. Our ZFS fileservers have swappable disks, so we did the install on a spare server with identical hardware and then swapped the new S10U8 disks into the actual physical fileserver during the downtime. After the swap, of course, we had to fix up all of the places on the system that knew what host it was and what hardware it was running on, which is really why the downtimes took more than a minute or two.

(This also gave us a rapid fallback if we had to; we could have just pulled the S10U8 disks and put the S10U6 disks back in.)
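
As a rough illustration of the 'fix up the places that knew what host it was' step, here is a minimal Python sketch that rewrites a hostname in a handful of identity files under a mounted root. This is not the procedure we actually used; the file list (/etc/nodename and /etc/inet/hosts), the `rename_host` function and the example hostnames are all assumptions made for illustration, and a real system will have more places to touch (network interface files, hardware device paths and so on).

```python
import re
from pathlib import Path

# Hypothetical list of files that record a host's identity. The post does
# not say which files were edited; adjust this for your own systems.
IDENTITY_FILES = ["etc/nodename", "etc/inet/hosts"]


def rename_host(root, old_name, new_name, files=IDENTITY_FILES):
    """Replace whole-name occurrences of old_name with new_name.

    'root' is the directory where the system disk is mounted.
    Returns the relative paths of the files that were changed.
    """
    # Match old_name only when it is not part of a longer name, so that
    # renaming 'spare1' leaves 'spare10' alone but still catches
    # 'spare1.example.com'.
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old_name) + r"(?![\w-])")
    changed = []
    for rel in files:
        path = Path(root) / rel
        if not path.is_file():
            continue  # skip files this system does not have
        text = path.read_text()
        new_text = pattern.sub(lambda m: new_name, text)
        if new_text != text:
            path.write_text(new_text)
            changed.append(rel)
    return changed
```

Something like `rename_host("/a", "spare1", "fs1")` could be run against the new disks while they are mounted on the spare server, which would move part of the fixup work out of the downtime window. The mount point and both names here are made up.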

Now, various OSes have various sorts of software-based fast upgrade schemes, and some of them even work reliably. But you can be pretty sure that swapping disks will work for anything, provided only that you can rename a system and move its system disks between hardware units, and you're going to want to work out how to do both of those anyways.

(Sadly, these days systems are increasingly welded to the hardware that they were installed on in various perverse ways that require annoying amounts of effort to reverse or override.)

Comments on this page:

From at 2010-07-21 04:50:40:

Having pairs of machines as redundant (hot or cold-swap) boxes is always nice (if you can), so if you have them then it does indeed make it a lovely way to upgrade :D

This is made even quicker when you have VMs for servers, as you can just install a new VM and transfer the disk (or iSCSI connection or whatever). Downtime is about the time taken to reboot (well, shut down one machine and boot the other).

From at 2010-07-21 05:55:36:

or you could use LU+ZFS

From at 2010-07-21 12:19:50:

This reminds me of my rsync-based P2V method, because VMware Converter only does cold-clone of Linux servers. :)


From at 2010-07-26 15:28:22:

LU+ZFS is fine except for the LU part, which has a bad habit of failing miserably. -- bda

Written on 21 July 2010.

Last modified: Wed Jul 21 00:22:26 2010