The problem with ZFS, SANs, and failover
The fundamental problem with Solaris 10's current version of ZFS in a SAN failover environment is that it has no concept of locking or host ownership of ZFS pools; instead, ZFS pools are either active (imported) or inactive (exported). So, if a host crashes and you want to fail over its ZFS pools, the pools are still marked as active, which means you must force pool import, which has catastrophic consequences if something ever goes wrong.
But it gets worse. Because hosts don't normally deactivate their pools when they shut down, a booting host will grab all of the pools that it thinks it should have regardless of their active versus inactive status and thus regardless of whether they are being used by another machine, because it cannot tell.
You can set ZFS pools so that they aren't imported automatically on
boot (by using 'zpool import -R / ...
'). However, in our iSCSI SAN
environment each zpool import
takes approximately a third of a second
per LUN; a third of a second per LUN times a bunch of LUNs times a
bunch of pools is an infeasibly long amount of time.
(And before you ask, as far as I can tell there is no opportunity to do something clever before Solaris auto-imports all of the pools during boot because the auto-importation happens by special magic.)
The conclusion is that if a host crashes and you want to fail over its pools, you must make utterly sure that it will never spontaneously reappear. (I recommend going down to the machine room and pulling its power and network cables and removing its disks. Make sure you zero them before you return them to any spares pool you have.)
If you try hard enough there are ways around some of this, such as storage fencing, where you arrange with your backend so that each host in the SAN can only see the storage with the pools that it should be importing. But this is going to complicate your SAN and your failover, and again if anything ever goes wrong you will have catastrophic explosions.
(Much of this is fixed in things currently scheduled for Solaris 10 update 6. Unfortunately we need to start deploying our new fileserver environment before then.)
Comments on this page:
|
|