2010-03-24
One reason why 'zpool status' can hang
The 'zpool status' command is infamous for stalling
and hanging exactly when you need it the most, namely when something is
going wrong with your system. I've recently run down one reason why it
does this.
The culprit is my old friend ZFS GUIDs. Information
about disks involved in ZFS pools includes both their GUIDs and their
theoretical device paths. When 'zpool status' prints out the
user-friendly shortened device names, it doesn't just take the theoretical
path and trim most things off to get the device name; as I've alluded
to before, it decides to be hyper-correct and check
that the device named in the configuration really is the right device.
In theory this is simple, as Solaris has some system calls for doing
pretty much all of the work. In practice these system calls require
you to open the disk device that you want to check, and under some
circumstances this open() will stall for significant amounts of time
(several minutes, for example). An iSCSI target that isn't responding is
one such circumstance.
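
If you want to find out which device is the slow one without hanging your own shell, one approach is to do the open() in a throwaway child process and give up on it after a timeout. What follows is only a rough sketch of that idea, not anything 'zpool' itself does; the function name open_finishes_within() and the choice of timeout are my own assumptions for illustration.

```python
import subprocess
import sys

# One-liner run in the child: open the path read-only, then close it.
_OPEN_SNIPPET = "import os, sys; os.close(os.open(sys.argv[1], os.O_RDONLY))"


def open_finishes_within(path, timeout_seconds):
    """Return True if an open() of path returned within the timeout.

    'Returned' includes failing quickly (no such file, permission denied);
    only an open() that is still blocked when the timeout expires counts
    as a stall. The name and interface here are hypothetical.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _OPEN_SNIPPET, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Request a kill but do not wait for it: a process stuck inside
        # open() may not actually exit until the open() itself returns.
        child.kill()
        return False
    return True
```

You would call this with the full device path of each disk in the pool and a timeout of a few seconds. The timeout has to cover starting the child interpreter as well as the open() itself, so don't make it too small.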
If you've ever seen this happen, you might wonder why 'zpool status'
hangs completely before printing any pool configuration
information, instead of getting to the point where it starts to print
device names for affected devices. The answer is that 'zpool status'
is helpfully extra-clever; before it prints out any pool configuration
stuff, it pauses to work out how wide it has to make the device name
column so that everything will line up nicely. This requires working
out the friendly name of all devices, which requires that hyper-correct
checking of the configuration, which stalls the entire process if any
disk is very slow to open().
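
The shape of the problem can be sketched in a few lines. This is a simplified model, not the real zpool code: print_pool_config(), the device dictionaries, and the friendly_name callback are all made up for illustration, and the model resolves each name once and reuses it where the real code may not. The point is only the ordering, where every name lookup happens before the first line of output.

```python
def print_pool_config(devices, friendly_name, emit=print):
    """Print one 'NAME  STATE' line per device, names padded to a common width."""
    # Pass 1: resolve every friendly name just to learn the widest one.
    # If friendly_name() blocks for any single device, we block here,
    # before a single line of output has been produced.
    names = [friendly_name(dev["path"]) for dev in devices]
    width = max((len(name) for name in names), default=0)

    # Pass 2: only now does anything get printed.
    for dev, name in zip(devices, names):
        emit(f"{name:<{width}}  {dev['state']}")


def demo():
    """Record the order of name lookups and prints for two made-up devices."""
    events = []
    devices = [
        {"path": "/dev/dsk/c0t0d0", "state": "ONLINE"},
        {"path": "/dev/dsk/c4t600144F0AAAAd0", "state": "ONLINE"},
    ]

    def friendly_name(path):
        events.append("lookup")
        # Stand-in for the real check; just trim the directory part.
        return path.rsplit("/", 1)[-1]

    print_pool_config(devices, friendly_name,
                      emit=lambda line: events.append("print"))
    return events
```

Calling demo() records both lookups before either print, which is why one slow device anywhere in the list holds up the entire display and not just its own line.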
(For extra fun, this 'calculate the needed width' step also looks at the
spare disks (if any), so a single bad spare disk, one that's not even in
use, can cause 'zpool status
' to stall on you.)
The 'zpool iostat' command also does the same extra-clever step of
working out the maximum width of the device name column, so it will
stall for the same reason. For bonus points, 'zpool iostat' does
this every time it prints out a round of statistics. Yes, really.
No wonder plain 'iostat' is acres better if anything bad is going
on.
By the way, this particular stall only happens if you have permissions
to open the device in the first place, i.e. it only happens if you are
root. So if you suspect ZFS problems, especially if you want 'zpool
iostat' results, run the commands as a non-root user.
(This is not the only way that 'zpool status' can stall; I've seen it
stutter when it was trying to get the ZFS pool configuration from the
kernel.)
Sidebar: where this is in the code
The zpool source code is in usr/src/cmd/zpool, and the whole width
calculation stuff is in zpool_main.c:max_width(). This calls
zpool_vdev_name(), in lib/libzfs/common/libzfs_pool.c, which
calls path_to_devid(), which actually open()s the device. This
check is thoughtfully guarded to make sure that it doesn't open devices
that ZFS has gotten around to declaring are actually bad; sadly, ZFS
makes such declarations long after open()s of iSCSI target disks have
started stalling for minutes at a time.