2008-04-21
Dear ZFS: please stop having your commands stall
One of my serious irritations with ZFS is how various ZFS commands (or
at least sub-commands of zpool) will stall badly if they can't talk to
some of the underlying disk devices. This is especially apparent when
you're using iSCSI; for example, I accidentally booted a system with the
iSCSI cable plugged into the wrong port, and once the system had booted
zpool list simply hung.
(Worse, it hung uninterruptibly; I could not stop it with ^C, use job
control to background it, make it abort with ^\, or even kill -9 it
from another window.)
One of the really unfortunate effects of this is that it hampers my
ability to do a lot of diagnostic work, because both zpool status
and zpool iostat -v stall or run very, very slowly. (Plain iostat itself
works fine, which makes me even more irritated with ZFS.)
(It is possible that Solaris MPxIO is contributing to this, since our 'iSCSI' devices are actually the MPxIO versions, but as a sysadmin I don't care exactly why the ZFS commands stall, just that they do. The downside of Sun owning the entire stack is that they don't get to point fingers at anyone else.)
I believe that ZFS commands behave okay if the iSCSI machine is explicitly rejecting Solaris's connection attempts (or reporting that the target or the LUN doesn't exist, or the like). What seems to be near-fatal is when the iSCSI target simply isn't responding. Unfortunately this is the most likely failure mode: switch failure, controller failure, a controller rebooting, etc.
(You also get the same issue if the iSCSI target is responding very, very slowly, as I found out when our theoretically jumbo-frame-capable gigabit switch decided to switch jumbo frames so slowly that it had a bandwidth measured in kilobytes per second.)
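Since I can't make the stalls themselves go away, about the only mitigation I can see for scripted checks and monitoring is to put a watchdog around the zpool commands. This is only a sketch; the 30-second limit, the flag file, and the choice of zpool status are all arbitrary choices of mine:

    #!/bin/sh
    # Run 'zpool status' in the background and wait at most 30 seconds for
    # it to finish. Completion is signalled with a flag file because, as
    # noted above, a hung zpool process may not respond even to kill -9,
    # so we don't try to kill it; a stuck one is simply abandoned.
    flag=/tmp/zpool-check.$$
    rm -f "$flag"
    ( zpool status; touch "$flag" ) &
    i=0
    while [ ! -f "$flag" ]; do
        if [ "$i" -ge 30 ]; then
            echo "zpool status appears to be stalled; giving up on it" >&2
            exit 1
        fi
        sleep 1
        i=`expr $i + 1`
    done
    rm -f "$flag"

This doesn't get you the zpool output any faster, but at least the rest of the script keeps running.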
2008-04-02
ZFS: Reservations versus quotas
Suppose that you have a number of groups (or people), each of which has purchased a certain amount of disk space, and that you have put all of the groups in the same overall ZFS pool. ZFS gives you two ways to limit each group's disk usage to the amount of space that they're entitled to: reservations or quotas. Which one is better?
It turns out that the answer is simple: use quotas.
(In both cases you give each group a top-level container filesystem with a quota or a reservation of however much space they have bought. With reservations you also need another 'system' container that reserves any remaining space.)
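To make the two setups concrete, here is roughly what they look like; the pool name, the group names, and all of the sizes are invented for illustration:

    # both schemes start with a container filesystem per group
    zfs create tank/csgroup
    zfs create tank/biogroup

    # the quota version: cap each group at the space they bought
    zfs set quota=500G tank/csgroup
    zfs set quota=250G tank/biogroup

    # the reservation version: reserve each group's space, plus a 'system'
    # filesystem whose reservation soaks up all the space left in the pool
    zfs set reservation=500G tank/csgroup
    zfs set reservation=250G tank/biogroup
    zfs create tank/system
    zfs set reservation=250G tank/system   # i.e. whatever the pool has left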
Quotas have at least two advantages over reservations for this:
- df gives more useful output. With quotas, the 'size' of a filesystem is the group's quota; with reservations, the size is the total pool size (although the 'available' figure is correct).
- snapshots work better if the group has filled up their space. With reservations, creating a snapshot fails (with an out of space error); with quotas, users just can't remove or modify files (they get a quota exceeded error) until you remove the snapshot.
(This is still inconvenient, but better overall than no snapshot at all.)
Since you can't create snapshots if your entire pool fills up, you'll still want to make sure that your pool has a little bit of spare space that no group can use. This is easy to do with quotas but not so much with reservations.
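With quotas this is just a matter of handing out less space than the pool actually has; for instance (invented numbers again, for a pool of roughly 1T):

    zpool list tank            # check the total pool size
    # quotas that total well under the pool size leave slack for
    # snapshots and pool metadata that no group can consume
    zfs set quota=500G tank/csgroup
    zfs set quota=300G tank/biogroup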
Nested quotas work right (or at least the way I want them to work); you cannot accidentally override a higher-level quota by giving a sub-filesystem a big quota value.
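For instance (made-up names again), a group can subdivide its container however it likes without being able to escape its overall limit:

    # the group's container is capped at 500G...
    zfs set quota=500G tank/csgroup
    # ...so even if a sub-filesystem is given an absurdly large quota,
    zfs create tank/csgroup/scratch
    zfs set quota=10T tank/csgroup/scratch
    # the 500G quota on tank/csgroup still limits everything under it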