
2009-09-10

What I know about how ZFS actually handles spare disks

Like many other RAID-oid systems, ZFS has a notion of spare disks; you can add one or more spare disks to a pool, and ZFS will use them as necessary in order to maintain pool redundancy in the face of disk problems. For details, you can see the zpool manpage.
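As a concrete illustration (the pool and device names here are made up), adding a spare to a pool and checking on it looks roughly like this:

    # add c4t2d0 to the pool 'tank' as a spare
    zpool add tank spare c4t2d0

    # 'zpool status tank' then lists the disk in a 'spares' section,
    # marked AVAIL until it is actually pressed into service
    zpool status tank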

Well, sort of. Actually, how ZFS handles spare disks is significantly different from how normal RAID systems handle them, and the pleasantly bland and normal description of spares in the zpool manpage elides a number of important things. The following is what I have been able to gather about the situation from various sources (since Sun doesn't seem to actually document it).

In a traditional RAID system with spares, spare handling is part of the main RAID code in the kernel, with spares activated automatically when needed. In Solaris this is not the case; the only thing that the kernel ZFS code does is keep track of the list of spares and some state information about them. Activating a spare is handled by user-level code, which issues the equivalent of 'zpool replace <pool> <old-dev> <spare-dev>' through a library call. Specifically, activating ZFS spares is the job of the zfs-retire agent of fmd, the Solaris fault manager daemon.
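If you want to check that the zfs-retire agent is actually loaded, fmd's module list is visible through fmadm; the exact output format varies between Solaris versions, but it looks roughly like this:

    # list fmd's loaded modules; zfs-retire should show up as an active
    # agent (zfs-diagnosis, which turns raw ZFS error reports into the
    # diagnosed faults that zfs-retire acts on, normally appears here too)
    fmadm config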

(Once zfs-retire activates the spare, the ZFS kernel code handles the rest of the process, including marking the spare in use and setting up the special 'this device is replaced with a spare' vdev. This means that you can duplicate a spare activation by doing a 'zpool replace' by hand if you ever want to.)
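A by-hand spare activation is thus just an ordinary replace with a spare as the new device (all names here are invented):

    # replace the ailing c1t3d0 with the pool's spare c4t2d0
    zpool replace tank c1t3d0 c4t2d0

    # afterwards 'zpool status tank' shows a 'spare' pseudo-vdev that
    # contains both the old disk and the spare, and the spare is marked
    # INUSE in the spares section; 'zpool detach' of one side or the
    # other dissolves it again
    zpool status tank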

In theory, using fmd for this is equivalent to doing it all in the kernel. In practice, your ZFS spare handling is at the mercy of a lot of pieces all working right, and they don't always do so. For one prominent example, it is up to the zfs-retire module to decide what should cause it to activate a spare, and it has not always activated spares for everything that degrades a ZFS vdev.
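One way of seeing what ZFS telemetry fmd has actually received is to look at its logs with fmdump; my memory of the exact event class names may be off, so treat them as illustrative:

    # raw error reports from ZFS show up in the ereport log, with classes
    # along the lines of ereport.fs.zfs.io and ereport.fs.zfs.checksum
    fmdump -e

    # diagnosed faults (the things that retire agents such as zfs-retire
    # actually react to) show up in the fault log
    fmdump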

My primary sources for all of this are this Eric Schrock entry and the archives of the zfs-discuss mailing list. Examination of the OpenSolaris codebase has also been useful (although if you are tempted to do this, beware; it does not necessarily correspond with Solaris 10).

Sidebar: what is required for spare activation

In order for a spare to be activated, a great many moving parts of your system all have to be working right. I feel like writing them down (at least the ones that I can think of); a sketch of how to check some of them by hand follows the list:

  • fmd has to be running
  • fmd has to be getting (and generating) relevant events, which may require various fmd modules to be working correctly
  • the zfs-retire agent has to be working, and to have subscribed to those events
  • zfs-retire has to decide that the event is one that should cause it to activate a spare.
  • zfs-retire has to be able to query the kernel (I think) to get the problem pool's configuration in order to find out what spares are available. (This can fail.)
  • zfs-retire has to be able to issue the necessary 'replace disk' system call.
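A minimal by-hand check of some of these moving parts (pool name invented, and assuming the usual SMF service name for fmd):

    # is fmd itself running?
    svcs svc:/system/fmd:default

    # is the zfs-retire agent loaded and active?
    fmadm config | grep zfs-retire

    # does the pool actually have spares, and are they AVAIL?
    zpool status tank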

A further side note on events: in an ideal world, there would be a 'ZFS vdev <X> has been degraded because of device <Y>' event that zfs-retire would listen for. If you think that Solaris lives in this world, I have bad news for you.

solaris/ZFSSpareHandling written at 00:51:35

