== Understanding ZFS cachefiles in Solaris 10 update 6

Solaris 10 update 6 introduced the new ZFS pool property _cachefile_, and with it the idea of ZFS cachefiles. I [[misunderstood ZFSAndSolaris10U6]] what these were before the S10U6 release, so I feel like writing down what they are and how you can use them in a failover environment.

To be able to quickly import a pool without scanning all of the devices on your system, ZFS keeps a cache of information about the pool and the devices it's found on. Before S10U6 there was only one such cachefile on your system, _/etc/zfs/zpool.cache_, and ZFS made it serve double duty as the list of pools to automatically import when the system booted. In S10U6, things were changed so that pools can specify an alternate ZFS cachefile instead of the system default one. (Note that ZFS cachefiles don't contain information about filesystems inside the pools, so they don't change very often.)

Using an alternate ZFS cachefile has several effects:

* any pool not using the system default cachefile is not automatically imported on boot.
* if you have the cachefile for a pool, you can rapidly import it even if an ordinary '_zpool import_' would be [[achingly slow ZFSSanFailoverProblem]].
* you can easily (and rapidly) import all pools in a cachefile (with '_zpool import -c cachefile -a_'; there is a sketch of this at the end of this entry).

One tricky note: the cachefile that _zpool import_ uses does *not* have to be the same file named by the pool's _cachefile_ property. The _cachefile_ property only gives the file that is updated when you change various pool configuration things. Crucially this includes _zpool export_; if you export a pool, ~~the pool is removed from its cachefile~~. (This is really annoying if you want to use ZFS cachefiles to speed up importing ZFS pools.)

Cachefiles can be copied from system to system, at least if the systems are x86 ones. (We have no Solaris 10 SPARC systems, so I can't test whether it works cross-architecture.) So one way to set up a failover environment goes like this (again, see the command sketches at the end of this entry):

* group pools together, for example all of the pools for a given [[virtual fileserver ZFSFileserverSetup]], and give them all the same non-default ZFS cachefile, for example _/var/local/zfs/fs~~N~~.cache_.
* replicate every group's ZFS cachefile to every physical fileserver you have; _rsync_ will do. (Remember to explicitly resync after you make a pool configuration change, such as adding devices.)
* when you have to bring up a virtual fileserver on another machine, get all of its pools up (and fast) by running '_zpool import -a_' on the appropriate cachefile (in addition to higher-level failover tasks like bringing up an IP alias).
* on boot, use some external mechanism to decide what virtual fileservers a physical machine owns and then invoke '_zpool import -a_' on the appropriate cachefile or cachefiles.

The one gotcha is that because of the effects of _zpool export_, bringing down a virtual fileserver in an orderly way can't really involve exporting its pools, or at least requires tricking ZFS a lot. (I think that you would want to copy the pre-shutdown ZFS cachefile somewhere before all of the exports, then copy it back afterwards; there is a sketch of this below too.)

If you just want fast pool imports for emergency failover and the only ZFS pools you have are on shared storage, you don't even need to set up alternate ZFS cachefiles for your ZFS pools; it's enough to make sure that every system has a copy of every other system's _/etc/zfs/zpool.cache_ file under some convenient name.
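To make the mechanics concrete, here is a minimal sketch of giving a pool a non-default cachefile and then importing from it; the pool name and cachefile path are made-up examples, not anything you have to use:

    # point an existing pool at a non-default cachefile
    # (the pool name and path are made-up examples)
    zpool set cachefile=/var/local/zfs/fs1.cache fs1pool

    # the same thing can be done at pool creation time:
    #   zpool create -o cachefile=/var/local/zfs/fs1.cache fs1pool c1t2d0

    # later, on any machine that has a copy of the cachefile,
    # rapidly import every pool recorded in it:
    zpool import -c /var/local/zfs/fs1.cache -a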
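Replicating the cachefiles and doing the fast part of a failover can then be a few lines of shell. This is only a sketch; the hostnames are invented, and a real version would live inside whatever higher-level failover machinery you have:

    # on each physical fileserver, push the cachefiles to the others
    # (rerun this after any pool configuration change, eg adding devices)
    for host in sanfs2 sanfs3; do
        rsync -a /var/local/zfs/ $host:/var/local/zfs/
    done

    # to take over virtual fileserver fs1 on this machine, import all
    # of its pools from the replicated cachefile ...
    zpool import -c /var/local/zfs/fs1.cache -a
    # ... and then do the higher-level work, such as bringing up
    # fs1's IP alias.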
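As for the orderly-shutdown gotcha, the trick I have in mind looks something like this (untested, and the pool and file names are made up):

    # save the cachefile before the exports strip the pools out of it
    cp /var/local/zfs/fs1.cache /var/local/zfs/fs1.cache.save
    zpool export fs1pool1
    zpool export fs1pool2
    # put it back so that future 'zpool import -c' runs still work
    mv /var/local/zfs/fs1.cache.save /var/local/zfs/fs1.cache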
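And the minimal 'just copy _zpool.cache_ around' version might look like this, again with invented hostnames:

    # on sanfs1, keep copies of the other systems' default cachefiles
    # under distinct names (any copying mechanism will do)
    for host in sanfs2 sanfs3; do
        scp $host:/etc/zfs/zpool.cache /var/local/zfs/zpool.cache.$host
    done

    # if sanfs2 dies, import everything it had:
    zpool import -c /var/local/zfs/zpool.cache.sanfs2 -a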
(Once we upgrade to S10U6 on all of our fileservers, we will probably do at least this, just as a general precaution.)