
2014-05-21

How I wish ZFS pool importing could work

I've mentioned before that one of our problems is that explicit 'zpool import' commands are very slow in our environment, so slow that we don't try to do failover even though we're theoretically set up for it. At least back in the Solaris era, and I assume still in the OmniOS one, this came about for two reasons. First, when you run 'zpool import' (for basically any reason) it checks every disk you have, one at a time, to build up a mapping of which ZFS labels are where and so on. Back when I timed it, this seemed to take roughly a third of a second per visible 'disk' (a real disk or an iSCSI LUN). Second, when your 'zpool import' command finishes, it promptly throws away all of that slowly and expensively gathered information, so the next 'zpool import' command you run has to do it all over again. These two things combine unpleasantly in typical failover scenarios. You might do one 'zpool import' to confirm that all the pools you want to import are fully visible and then 'zpool import' five or six pools (one at a time, because you can't import multiple pools at once with a normal 'zpool import' command). The resulting time consumption adds up fast.
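To put rough numbers on this, here's a back of the envelope sketch in Python. The per-disk time is my old rough timing from above; the disk and pool counts are made-up illustrative figures, not our real ones:

    # Rough cost model for sequential 'zpool import' scanning.
    SECONDS_PER_DISK = 0.33   # my old rough timing
    DISKS = 200               # illustrative, not our real count
    POOLS_TO_IMPORT = 6       # illustrative

    one_scan = SECONDS_PER_DISK * DISKS
    # one scan to check visibility, then a full rescan per pool imported
    total = one_scan * (1 + POOLS_TO_IMPORT)
    print(f"one scan: {one_scan:.0f}s, failover total: {total:.0f}s")
    # -> one scan: 66s, failover total: 462s (almost eight minutes)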

What I would like is a way for ZFS pool imports to fix both problems. Sequential disk probing has an easy fix: just don't do it. Scanning some number of disks in parallel ought to significantly speed things up, and even modest levels of parallelism offer potentially big wins (eg, probing two disks at a time could theoretically halve the time needed).
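To illustrate how simple the parallel version could be, here's a minimal sketch; probe_disk() is a hypothetical stand-in for whatever the real code does to read the ZFS labels off one device:

    from concurrent.futures import ThreadPoolExecutor

    def probe_disk(dev):
        # Hypothetical stand-in for reading the ZFS labels off one
        # device (a real disk or an iSCSI LUN).
        ...

    def scan_disks(devices, workers=8):
        # Probing is IO-bound, so eight workers should cut the
        # wall-clock time of a scan to roughly an eighth.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(zip(devices, pool.map(probe_disk, devices)))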

There are two potential fixes for the problem of 'zpool import' throwing away all of that work. The simpler one is to make it possible to import multiple pools in a single 'zpool import' operation. There's no fundamental obstacle to this in the code; it's just a small matter of creating a command line syntax for it and then writing what is basically a loop over the import operation (right now, giving two pool names renames a pool on import, and giving more than two is a syntax error). The bigger fix is to give 'zpool import' an option not to throw away the work, letting it write the accumulated information out to a cache file and then reload it under suitable conditions (both writing and reloading should require a new command line switch). If the import process finds that on-disk reality doesn't match the cache file's data, it should fall back to doing the current full scan (checking disks in parallel, please).
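As a sketch of the flow I want (everything here is hypothetical, a model of the desired behavior rather than any real ZFS interface; scan_disks() is the parallel scan from above and do_import() stands in for the actual import step):

    import json, os

    CACHE = "/var/run/zpool-scan.json"   # hypothetical cache location

    def get_scan(devices):
        # Reuse a saved scan if it still matches what we can see now;
        # otherwise fall back to a full (parallel) rescan and save it.
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                cached = json.load(f)
            if cached["devices"] == sorted(devices):
                return cached["labels"]
        labels = scan_disks(devices)
        with open(CACHE, "w") as f:
            json.dump({"devices": sorted(devices), "labels": labels}, f)
        return labels

    def import_pools(pool_names, devices):
        # One scan (cached or fresh) serves however many pool imports.
        labels = get_scan(devices)
        for name in pool_names:
            do_import(name, labels)      # hypothetical import step

Matching on the device list alone is a simplification; a real version would have to validate the cached label data itself against what's on the disks.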

At this point some people will be tempted to suggest ZFS cache files. Unfortunately these are not a solution, for at least two reasons. First, you can't use ZFS cache files to accelerate a scan for what pools are available for import; a plain 'zpool import' doesn't take a '-c cachefile' argument. Second, there's no way to build or rebuild ZFS cache files without actually importing pools. This makes managing them very painful in practice; for example, you can't have a single ZFS cache file with a global view of all the pools available on your shared storage unless you import them all on one system and then save the resulting cache file.
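For completeness, that cumbersome workaround would look roughly like this (pool names and paths are illustrative; 'zpool import -o cachefile=...' sets the cachefile pool property at import time):

    import shutil, subprocess

    POOLS = ["fs1", "fs2"]               # illustrative pool names
    SCRATCH = "/tmp/all-pools.cache"     # scratch cachefile

    # Import every pool with a scratch cachefile property so that
    # they all get recorded in the one file...
    for pool in POOLS:
        subprocess.run(["zpool", "import", "-o",
                        f"cachefile={SCRATCH}", pool], check=True)

    # ...and save a copy before exporting the pools again, since
    # exporting a pool removes it from its cachefile.
    shutil.copy(SCRATCH, "/var/tmp/global-zpool.cache")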

(Scanning for visible pools matters in failover on shared storage because you really want to make sure that the machine you're failing over to can see all of the shared storage it should. In fact, I'd like a ZFS pool import option for 'do not import pools unless all of their devices are visible'; we'd certainly use it by default, because in most situations in our environment we'd rather have a pool not import at all than import with broken mirrors because, eg, an iSCSI target was accidentally not configured on one server.)
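A minimal sketch of that visibility check, assuming you keep a list of each pool's expected device paths somewhere (the EXPECTED table here is invented for illustration):

    import os

    # Invented illustrative data: each pool's expected device paths.
    EXPECTED = {
        "fs1": ["/dev/dsk/c5t0d0s0", "/dev/dsk/c6t0d0s0"],
    }

    def safe_to_import(pool):
        # Refuse to import unless every device is visible, so that we
        # never import a pool with its mirrors silently broken.
        missing = [d for d in EXPECTED[pool] if not os.path.exists(d)]
        if missing:
            print(f"not importing {pool}: missing {missing}")
            return False
        return True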

solaris/ZFSPoolImportWish written at 01:32:27

