Our ZFS spares handling system (part 3)
July 20, 2011
In part 1 I mentioned that our spares system pulls the list of disks to use as spares from files, and that how those files are maintained was beyond the scope of that entry. Well, time to talk about that.
From more or less the beginning of our ZFS fileserver system we've had an administrative system that captures a record of all pools on each physical server and a list of all disks visible to that server, along with how they are being used by ZFS. This system is relatively crude; it's shell scripts that run once a day and record the results in files.
(At this point I will pause to note that all through our system we translate iSCSI disk names from the local Solaris device names into names based on the iSCSI backend and disk involved, so that a given logical disk is referred to the same way everywhere.)
Although we have disk usage information for physical servers, the spares files are built for our virtual fileservers; each virtual fileserver has its own list of spares, even if two virtual fileservers happen to be using the same physical server at the moment. We do this because each of our iSCSI backends is typically dedicated to a single virtual fileserver and we want to keep things that way even when we have to activate spares. The overall spares handling environment goes to some pains to make this work.
The whole process of building the spares files for the virtual fileservers is controlled by a configuration file with directives. There are two important sorts of directives in the file:
fs8 use backend lincoln
This means that the virtual fileserver fs8 should draw its spares from disks on the backend lincoln.
all exclude pool fs2-core-01
This means that all virtual fileservers should avoid (as spares) any logical disks that share a physical disk with a disk used by the pool fs2-core-01.
(There are variants of these directives that allow us to be more specific about things, but in practice we don't need them.)
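To illustrate how these two directives combine, here is a hypothetical sketch of the per-fileserver selection logic. The real build process is shell scripts; the function name, data structures, and disk names here are all invented for illustration.

```python
# Hypothetical sketch of spares selection for one virtual fileserver.
# directives: list of ('use', fileserver, backend) and
#             ('exclude', fileserver-or-'all', pool) tuples.
# disks: list of (name, backend, in_use, pools_on_same_physical_disk).

def build_spares(fileserver, directives, disks):
    # Backends this fileserver is allowed to take spares from.
    use = {b for kind, fs, b in directives
           if kind == "use" and fs == fileserver}
    # Pools whose physical disks this fileserver must avoid.
    excl = {p for kind, fs, p in directives
            if kind == "exclude" and fs in (fileserver, "all")}
    return sorted(name for name, backend, in_use, shared in disks
                  if backend in use        # right backend
                  and not in_use           # not already used by any pool
                  and not (shared & excl)) # no shared physical disk with
                                           # an excluded pool's disks
```

For example, with the directives from the text, a free disk on lincoln is a candidate spare, while one sharing a physical disk with fs2-core-01's disks is not:

```python
directives = [("use", "fs8", "lincoln"), ("exclude", "all", "fs2-core-01")]
disks = [
    ("lincoln-03", "lincoln", False, set()),             # free: candidate
    ("lincoln-04", "lincoln", False, {"fs2-core-01"}),   # excluded pool
    ("lincoln-01", "lincoln", True, {"fs8-main-01"}),    # already in use
    ("grant-02", "grant", False, set()),                 # wrong backend
]
build_spares("fs8", directives, disks)  # → ["lincoln-03"]
```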
The spares-files build process is run on a single machine from cron, normally once a day. This low frequency of automated rebuilds is generally perfectly fine because disk usage information changes only very slowly. If we're replacing a backend there is a series of steps we have to do by hand to get the spares files rebuilt promptly, but that's an exceptional circumstance.
In theory we could have put all of this initial spares selection logic straight into the spares handling program. In practice, I feel that there's a very strong reason to keep them separate (in addition to this making both aspects of the spares problem simpler). Since we want each potential spare disk to only ever be usable by a single virtual fileserver, overall spares selection is inherently a global process. Global processes should be done once and centrally, because this avoids any chance that two systems will ever do them separately and disagree over what the answer should be. If we only ever generate spare disk lists in one place, we have a strong assurance that only serious program bugs will ever cause two fileservers to think that they can use the same disk as a spare. If the fileservers did this themselves, there are all sorts of (de)synchronization issues that could cause such duplication.
(We can also post-process the output files to check that this constraint holds true.)
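Such a check can be quite small. Here is a hypothetical sketch (the function and data layout are invented, not our actual checking code) that verifies no disk is claimed as a potential spare by more than one virtual fileserver:

```python
# Hypothetical post-processing check: no disk may appear in the spares
# list of more than one virtual fileserver.

def find_duplicates(spares_lists):
    """spares_lists maps fileserver name -> list of spare disk names.
    Returns the disks claimed by more than one fileserver."""
    owners = {}   # disk name -> first fileserver seen claiming it
    dups = set()
    for fs, disks in spares_lists.items():
        for disk in disks:
            if disk in owners and owners[disk] != fs:
                dups.add(disk)
            owners[disk] = fs
    return sorted(dups)
```

Run over the generated spares files, an empty result means the global uniqueness constraint holds; anything else names the disks that two fileservers both think they can use.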