How I wish ZFS pool importing could work
I've mentioned before that one of our problems is that explicit
'zpool import' commands are very slow in our environment, so slow
that we don't try to do failover although we're theoretically set up to
do it. At least back in the Solaris era and I assume still in the
OmniOS one, this came about for two reasons.
First, when you run 'zpool import' (for basically any reason) it
checks every disk you have, one at a time, to build up a mapping
of what ZFS labels are where and so on. Back when I timed it, this
seemed to take roughly a third of a second per visible 'disk' (a
real disk or an iSCSI LUN). Second, when your 'zpool import' command
finishes it promptly throws away all of that slowly and expensively
gathered information so the next 'zpool import' command you run
has to do it all over again.
These two problems combine unpleasantly in typical failover
scenarios. You might do one 'zpool import' to confirm that all the
pools you want to import are fully visible and then 'zpool import'
five or six pools (one at a time, because you can't import multiple
pools at once with a normal 'zpool import' command). The resulting
time consumption adds up fast.
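To put rough numbers on this, here is a back-of-the-envelope sketch. The third of a second per disk is the timing mentioned above and the six imports is the high end of 'five or six pools'; the count of 200 visible disks is purely a hypothetical figure picked for illustration, not a measurement from our environment.

```python
# Back-of-the-envelope cost of a failover with sequential disk probing.
SECONDS_PER_DISK = 1 / 3   # rough per-disk probe time from the timing above
VISIBLE_DISKS = 200        # ASSUMPTION: hypothetical count, not a real figure
POOLS_TO_IMPORT = 6        # "five or six pools", taking the high end


def failover_seconds(disks: int, pools: int,
                     per_disk: float = SECONDS_PER_DISK) -> float:
    """Total scan time: one visibility check plus one import per pool,
    each of which rescans every visible disk from scratch."""
    commands = 1 + pools
    return commands * disks * per_disk


if __name__ == "__main__":
    total = failover_seconds(VISIBLE_DISKS, POOLS_TO_IMPORT)
    print(f"{total:.0f} seconds, about {total / 60:.1f} minutes")
```

The point is the shape of the formula rather than the specific result: the cost scales with the number of commands times the number of visible disks, so every extra 'zpool import' pays for the whole scan again.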
What I would like is a way for ZFS pool imports to fix both problems. Sequential disk probing is an easy fix; just don't do that. Scanning some number of disks in parallel ought to speed things up significantly, and even modest levels of parallelism offer potentially big wins (eg, probing two disks in parallel could theoretically halve the time necessary).
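As a minimal sketch of the parallel probing idea (not how ZFS itself is or would be written), the following Python simulates the per-disk probe with a short sleep and compares different levels of parallelism. The function names 'probe_disk' and 'scan_disks' and the fake labels are all hypothetical; a real probe would open the device and read its ZFS labels.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def probe_disk(path: str, delay: float = 0.01) -> tuple[str, str]:
    """Stand-in for reading the ZFS labels from one disk.
    HYPOTHETICAL: the I/O wait is simulated with a sleep."""
    time.sleep(delay)
    return path, f"label-for-{path}"


def scan_disks(paths: list[str], workers: int = 1) -> dict[str, str]:
    """Probe all disks with `workers` threads; workers=1 is the
    sequential behaviour described above."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(probe_disk, paths))


if __name__ == "__main__":
    disks = [f"disk{i}" for i in range(40)]
    for workers in (1, 2, 8):
        start = time.perf_counter()
        scan_disks(disks, workers)
        print(f"{workers} worker(s): {time.perf_counter() - start:.2f}s")
```

Threads are enough here because the work is waiting on I/O, not computing. How much a real scan would gain depends on whether the disks and the paths to them can actually service probes concurrently.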
There are two potential fixes for the problem of 'zpool import'
throwing away all of that work. The simpler is to make it possible
to import multiple pools in a single 'zpool import' operation.
There's no fundamental obstacle in the code for this, it's just a
small matter of creating a command line syntax for it and then
basically writing a loop over the import operation (right now giving
two pool names renames a pool on import and giving more than two
is a syntax error). The bigger fix is to provide an option for
'zpool import' to not throw away the work, letting it write out the
accumulated information to a cache file and then reload it under
suitable conditions (both should require a new command line switch).
If the import process finds that the on-disk reality doesn't match
the cache file's data, it falls back to doing the current full scan
(checking disks in parallel, please).
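The cache-then-verify behaviour I have in mind can be sketched roughly as follows. This is an illustration under assumptions, not a description of any real ZFS code: the JSON cache format, the 'read_label' callback, and the function names are all hypothetical, and a device's label is reduced to just the name of the pool it belongs to.

```python
import json
from pathlib import Path
from typing import Callable, Optional

Label = Optional[str]               # pool name found on a device, or None
ReadLabel = Callable[[str], Label]  # HYPOTHETICAL per-device label reader


def full_scan(devices: list[str], read_label: ReadLabel) -> dict[str, Label]:
    """The slow path: look at every visible device."""
    return {dev: read_label(dev) for dev in devices}


def find_pool_devices(pool: str, devices: list[str], read_label: ReadLabel,
                      cache_file: Path) -> list[str]:
    """Return the devices of `pool`, trusting the cache only if it checks out."""
    mapping = None
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        claimed = [dev for dev, name in cached.items() if name == pool]
        # Re-read only the devices the cache says belong to this pool.
        if claimed and all(dev in devices and read_label(dev) == pool
                           for dev in claimed):
            mapping = cached
    if mapping is None:
        # Cache missing or stale: do the full scan and save the result.
        mapping = full_scan(devices, read_label)
        cache_file.write_text(json.dumps(mapping))
    return sorted(dev for dev, name in mapping.items() if name == pool)
```

With a valid cache, an import touches only the handful of devices in the pool instead of everything visible. This simple check would not notice a device that joined the pool after the cache was written, so a real version would need a stronger test of whether the cache still matches reality.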
At this point some people will be tempted to suggest ZFS cache
files. Unfortunately these are not a solution for
at least two reasons. First, you can't use ZFS cache files to
accelerate a scan for what pools are available for import; a plain
'zpool import' doesn't take a '-c cachefile' argument. Second,
there's no way to build or rebuild ZFS cache files without actually
importing pools. This makes managing them very painful in practice;
for example, you can't have a single ZFS cache file with a global
view of all pools available on your shared storage unless you import
them all on one system and then save the resulting cache file.
(Scanning for visible pools matters in failover on shared storage because you really want to make sure that the machine you're failing over to can see all of the shared storage that it should. In fact I'd like a ZFS pool import option for 'do not import pools unless all of their devices are visible'; we'd certainly use it by default because in most situations in our environment we'd rather a pool not import at all than import with mirrors broken because eg an iSCSI target was accidentally not configured on one server.)
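The 'all devices or nothing' policy is simple to state as code. In this sketch the map of pool names to their expected devices and the set of visible devices are hypothetical inputs; where they would come from (a scan, a cache, a configuration file) is left open.

```python
def pools_safe_to_import(
    pool_devices: dict[str, set[str]],
    visible: set[str],
) -> tuple[list[str], dict[str, set[str]]]:
    """Split pools into those with every device visible and those with gaps.
    `pool_devices` is a HYPOTHETICAL map of pool name to expected devices."""
    ok: list[str] = []
    blocked: dict[str, set[str]] = {}
    for pool, devices in sorted(pool_devices.items()):
        missing = devices - visible
        if missing:
            blocked[pool] = missing   # refuse: importing would break mirrors
        else:
            ok.append(pool)
    return ok, blocked


if __name__ == "__main__":
    pools = {"fs1": {"lunA", "lunB"}, "fs2": {"lunC", "lunD"}}
    ok, blocked = pools_safe_to_import(pools, {"lunA", "lunB", "lunC"})
    print("import:", ok)
    print("refuse:", blocked)
```

Reporting which devices are missing matters as much as refusing the import, since that is what tells you which iSCSI target was left unconfigured.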