The problem with ZFS, SANs, and failover

June 12, 2008

The fundamental problem with Solaris 10's current version of ZFS in a SAN failover environment is that it has no concept of locking or host ownership of ZFS pools; instead, ZFS pools are either active (imported) or inactive (exported). So, if a host crashes and you want to fail over its ZFS pools, the pools are still marked as active, which means you must force the pool import (with 'zpool import -f'). A forced import has catastrophic consequences if something ever goes wrong, because nothing stops two hosts from using the same pool at once.

But it gets worse. Because hosts don't normally deactivate their pools when they shut down, a booting host will grab all of the pools that it thinks it should have regardless of their active versus inactive status and thus regardless of whether they are being used by another machine, because it cannot tell.
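To make the problem concrete, here is a toy model in Python. It is only a sketch of the situation as I understand it, not how ZFS actually stores pool state; the `Pool` class, the `real_user` field, and the function names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Pool:
    name: str
    active: bool              # the only state a host can see in this model
    real_user: Optional[str]  # ground truth; hypothetical, NOT visible to hosts


def visible_state(pool: Pool) -> Tuple[str, bool]:
    """Everything a host can learn about a pool: its name and the active flag."""
    return (pool.name, pool.active)


def needs_forced_import(pool: Pool) -> bool:
    """An active pool can only be taken over with a forced import."""
    return pool.active


def boot_time_import(expected: Set[str], pools: Iterable[Pool]) -> List[str]:
    """A booting host grabs every pool it expects, ignoring the active flag."""
    return [p.name for p in pools if p.name in expected]


if __name__ == "__main__":
    crashed = Pool("tank", active=True, real_user=None)         # owner is dead
    failed_over = Pool("tank", active=True, real_user="hostB")  # owner is alive
    # The two cases look identical from the outside.
    print(visible_state(crashed) == visible_state(failed_over))
    # The rebooted original host takes the pool back anyway.
    print(boot_time_import({"tank"}, [failed_over]))
```

The point of the model is that a pool whose owner crashed and a pool that another host is happily using present exactly the same visible state, so neither a failover script nor a booting host has anything to base a safe decision on.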

You can set ZFS pools so that they aren't imported automatically on boot (by using 'zpool import -R / ...'), which means importing them explicitly every time the host comes up. However, in our iSCSI SAN environment each zpool import takes approximately a third of a second per LUN; a third of a second per LUN times a bunch of LUNs times a bunch of pools is an infeasibly long time.
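As a rough back-of-the-envelope sketch of that multiplication, here is some Python. The third of a second per LUN is the figure from our environment; the LUN and pool counts in the example are made-up numbers, and the helper names are hypothetical. The command builder only assembles the 'zpool import -R / ...' argument list and does not run anything.

```python
from typing import List

SECONDS_PER_LUN = 1 / 3  # approximate per-LUN cost of one zpool import, as measured above


def import_command(pool: str, altroot: str = "/", force: bool = False) -> List[str]:
    """Build (but do not run) a zpool import command using an alternate root."""
    cmd = ["zpool", "import"]
    if force:
        cmd.append("-f")  # needed when the pool is still marked active
    cmd += ["-R", altroot, pool]
    return cmd


def estimated_import_seconds(luns: int, pools: int,
                             seconds_per_lun: float = SECONDS_PER_LUN) -> float:
    """Each pool import costs seconds_per_lun for every LUN the host can see."""
    return seconds_per_lun * luns * pools


if __name__ == "__main__":
    luns, pools = 100, 30  # assumed sizes, purely for illustration
    secs = estimated_import_seconds(luns, pools)
    print(" ".join(import_command("tank")))
    print(f"{pools} pools x {luns} LUNs: about {secs / 60:.1f} minutes")
```

With those assumed sizes the estimate works out to roughly 17 minutes of doing nothing but importing pools, which is a long time for a fileserver to be coming back up.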

(And before you ask, as far as I can tell there is no opportunity to do something clever before Solaris auto-imports all of the pools during boot because the auto-importation happens by special magic.)

The conclusion is that if a host crashes and you want to fail over its pools, you must make utterly sure that it will never spontaneously reappear. (I recommend going down to the machine room and pulling its power and network cables and removing its disks. Make sure you zero them before you return them to any spares pool you have.)

If you try hard enough there are ways around some of this, such as storage fencing, where you arrange with your backend so that each host in the SAN can only see the storage with the pools that it should be importing. But this is going to complicate your SAN and your failover, and again if anything ever goes wrong you will have catastrophic explosions.
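The idea behind storage fencing can be sketched as a visibility table. This Python is a toy model only: the table format, the host and LUN names, and the `fail_over` function are all hypothetical, and a real backend would do this through its own access controls rather than a dictionary.

```python
from typing import Dict, List, Set

Visibility = Dict[str, Set[str]]  # host name -> LUNs that host is allowed to see


def fail_over(visibility: Visibility, pool_luns: Set[str],
              from_host: str, to_host: str) -> Visibility:
    """Return a new table with pool_luns moved from from_host to to_host."""
    new = {host: set(luns) for host, luns in visibility.items()}
    # Fence first: the old host loses access before the new host gains it.
    new.setdefault(from_host, set()).difference_update(pool_luns)
    new.setdefault(to_host, set()).update(pool_luns)
    return new


def hosts_that_can_see(visibility: Visibility, lun: str) -> List[str]:
    """List every host that is currently allowed to see a given LUN."""
    return sorted(host for host, luns in visibility.items() if lun in luns)


if __name__ == "__main__":
    table = {"hostA": {"lun1", "lun2"}, "hostB": {"lun3"}}
    after = fail_over(table, {"lun1", "lun2"}, "hostA", "hostB")
    # If hostA reboots now, it has no LUNs to find its old pool on.
    print(hosts_that_can_see(after, "lun1"))
```

The safety of the whole scheme rests on that table always being right, which is exactly the extra moving part that worries me.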

(Much of this is fixed in things currently scheduled for Solaris 10 update 6. Unfortunately we need to start deploying our new fileserver environment before then.)
