Understanding ZFS cachefiles in Solaris 10 update 6

February 1, 2009

Solaris 10 update 6 introduced the new ZFS pool property cachefile, and with it the idea of ZFS cachefiles. I misunderstood what these were before the S10U6 release, so I feel like writing down what they are and how you can use them in a failover environment.

To be able to quickly import a pool without scanning all of the devices on your system, ZFS keeps a cache of information about each pool and the devices it's found on. Before S10U6 there was only one such cachefile on your system, /etc/zfs/zpool.cache, and ZFS made it serve double duty as the list of pools to automatically import when the system booted. In S10U6, things were changed so that pools can specify an alternate ZFS cachefile instead of the system default one.
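The property itself is set per pool. Here is a hypothetical sketch; the pool name 'tank', the device, and the cachefile path are all invented, and the commands are echoed rather than run since this is only an illustration:

```shell
#!/bin/sh
# Hypothetical names throughout: 'tank', 'c1t0d0', and the path
# are invented for illustration.
CACHE=/var/local/zfs/fs1.cache

# A pool can be given an alternate cachefile at creation time:
CREATE="zpool create -o cachefile=$CACHE tank c1t0d0"
# or switched to one later with 'zpool set':
SETPROP="zpool set cachefile=$CACHE tank"

# Echoed instead of run, since this is just a sketch:
echo "$CREATE"
echo "$SETPROP"
```

(Setting cachefile=none excludes a pool from any cachefile at all.)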

(Note that ZFS cachefiles don't contain information about filesystems inside the pools, so they don't change very often.)

Using an alternate ZFS cachefile has several effects:

  • any pool not using the system default cachefile is not automatically imported on boot.
  • if you have the cachefile for a pool, you can rapidly import it even if an ordinary 'zpool import' would be achingly slow.
  • you can easily (and rapidly) import all pools in a cachefile (with 'zpool import -c cachefile -a').
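For example, the mass import in the last point looks like this (the path is invented, and the command is echoed here since this is a sketch):

```shell
#!/bin/sh
# Hypothetical path; this is your copy of the pool group's cachefile.
CACHE=/var/local/zfs/fs1.cache

# '-c' reads pool configurations from the cachefile instead of
# scanning devices; '-a' imports every pool recorded there.
IMPORT_CMD="zpool import -c $CACHE -a"
echo "$IMPORT_CMD"
```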

One tricky note: the cachefile that 'zpool import -c' uses does not have to be the same file named by the pool's cachefile property. The cachefile property only names the file that is updated when you change various aspects of the pool's configuration. Crucially, this includes zpool export; if you export a pool, the pool is removed from its cachefile.

(This is really annoying if you want to use ZFS cachefiles to speed up importing ZFS pools.)

Cachefiles can be copied from system to system, at least if the systems are x86 ones. (We have no Solaris 10 SPARC systems, so I can't test if it works cross-architecture.)

So one way to set up a failover environment goes like this:

  • group pools together, for example all of the pools for a given virtual fileserver, and give them all the same non-default ZFS cachefile, for example /var/local/zfs/fsN.cache.

  • replicate every group's ZFS cachefile to every physical fileserver you have; rsync will do. (Remember to explicitly resync after you make a pool configuration change, such as adding devices.)

  • when you have to bring up a virtual fileserver on another machine, get all the pools up (and fast) by running 'zpool import -c cachefile -a' with the appropriate cachefile (in addition to higher level failover tasks like bringing up an IP alias).

  • on boot, use some external mechanism to decide what virtual fileservers a physical machine owns and then run 'zpool import -c cachefile -a' against the appropriate cachefile or cachefiles.
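The last two steps can be sketched as a small boot-time helper. Everything here is invented: the ownership file, its format (one virtual fileserver name per line), and the cache directory; the sketch echoes the import command instead of running it.

```shell
#!/bin/sh
# Hypothetical layout: /var/local/zfs/owned-fileservers lists the
# virtual fileservers this physical machine currently owns, one
# name (fs1, fs2, ...) per line.
OWNED=/var/local/zfs/owned-fileservers
CACHEDIR=/var/local/zfs

import_owned() {
    while read fs; do
        cf="$CACHEDIR/$fs.cache"
        [ -f "$cf" ] || continue
        # Sketch only: on a real S10U6 machine you would run this
        # command instead of echoing it.
        echo "zpool import -c $cf -a"
    done < "$OWNED"
}
# An rc script would call import_owned at boot.
```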

The one gotcha is that because of the effects of zpool export, bringing down a virtual fileserver in an orderly way can't really involve exporting its pools, or at least requires tricking ZFS a lot. (I think that you would want to copy the pre-shutdown ZFS cachefile somewhere before all of the exports, then copy it back afterwards.)
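A sketch of that dance, assuming the virtual fileserver's pools all share one cachefile. To keep the sketch self-contained it works on a scratch file; on a real system CACHE would be something like /var/local/zfs/fs1.cache, and the pool names here are invented:

```shell
#!/bin/sh
# Sketch with invented names; it uses a scratch file so the
# copy-aside logic itself can be followed (and run) anywhere.
CACHE=$(mktemp)            # stands in for /var/local/zfs/fs1.cache
SAVED=$CACHE.saved
echo "pretend cachefile contents" > "$CACHE"

# 1. Save the cachefile before touching the pools.
cp "$CACHE" "$SAVED"

# 2. Export the pools; each 'zpool export' removes that pool from
#    the cachefile, so by the end it would be empty. Guarded here
#    because zpool may not exist where this sketch runs.
command -v zpool >/dev/null 2>&1 && zpool export fs1-pool1 fs1-pool2

# 3. Put the saved copy back, so a later 'zpool import -c' on this
#    machine (or another one) still sees all of the pools.
cp "$SAVED" "$CACHE"
```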

If you just want fast pool imports for emergency failure and the only ZFS pools you have are on shared storage, you don't even need to set up alternate ZFS cachefiles for your ZFS pools; it's enough to make sure that every system has a copy of every other system's /etc/zfs/zpool.cache file under some convenient name.
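That replication could be as simple as a loop like this; the hostnames and destination directory are invented, the rsync commands are echoed rather than run, and in real life you would run something like it from cron on each fileserver (skipping its own hostname):

```shell
#!/bin/sh
# Hypothetical hostnames and destination directory.
DEST=/var/local/zfs

sync_cmds() {
    for host in fileserver1 fileserver2 fileserver3; do
        # Keep every other system's default cachefile under a
        # convenient per-host name.
        echo "rsync $host:/etc/zfs/zpool.cache $DEST/$host.zpool.cache"
    done
}
# Sketch: the commands are listed instead of executed.
sync_cmds
```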

(Once we upgrade to S10U6 on all of our fileservers, we will probably do at least this, just as a general precaution.)
