Wandering Thoughts archives

2009-04-16

What causes the ZFS file deletion problem with snapshots

Suppose that you have a ZFS pool that you have filled utterly up to the brim, and that it has a snapshot. Of course, now you need to delete some files in a filesystem to clean up space, so:

$ rm oldfile
rm: cannot remove `oldfile': No space left on device

(Here oldfile is a file that is old enough to be in the snapshot.)

The ultimate cause of this is a general issue with 'copy on write' filesystems such as ZFS. In these filesystems, you never overwrite anything in place, either data or filesystem metadata; any time you change anything, you have to write the new version to a new location, and then update everything that points to it, all the way up to the root (which, technically, is the one thing that does get overwritten, sort of).

This applies to file deletions just as much as it applies to anything else, since deleting a file means updating the directory it is (or was) in, and so on. In theory, when there is no free space in the pool there is no new location to put that update, and so removing a file can fail. In practice, copy on write filesystems, ZFS included, are smart enough to know that this is stupid, so they normally manage to find some temporary extra space for this situation.

What's happened in our example is that snapshots have complicated the picture. Deleting a file that is also in a snapshot doesn't actually free up any space, since the file has to stick around for the snapshot. Instead, it means that you really do need new space for a new copy of all of the relevant metadata (to mark the file as deleted in the current version of the filesystem), and when the ZFS pool is full there is no room for this new copy, so the removal has to fail.
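
If you want to see this for yourself, here's a rough sketch of reproducing it with a small file-backed scratch pool (run as root). The pool name, file names, and sizes are invented for illustration, and the exact error messages may differ on your system:

# make a scratch pool backed by an ordinary file
$ mkfile 128m /var/tmp/tank.img
$ zpool create tank /var/tmp/tank.img
# create a file, then capture it in a snapshot
$ dd if=/dev/urandom of=/tank/oldfile bs=1024k count=16
$ zfs snapshot tank@snap
# fill the pool to the brim; this dd will eventually fail with ENOSPC
$ dd if=/dev/zero of=/tank/filler bs=1024k
# deleting the snapshotted file now fails, just as above
$ rm /tank/oldfile
rm: cannot remove `/tank/oldfile': No space left on device
# destroying the snapshot releases its hold on the file's blocks,
# after which the deletion goes through
$ zfs destroy tank@snap
$ rm /tank/oldfile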

ZFSDeleteProblem written at 01:16:46

2009-04-15

The problem with Solaris 10 update 6's ZFS failmode setting

After I was so negative about ZFS's new failmode setting, one might sensibly ask what the problem with it is.

(Background: the ZFS failmode setting controls what happens when ZFS can't perform IO to a pool because the pool has totally lost redundancy. It has three settings: 'panic', which panics your system (just like the old behavior); 'wait', which blocks all IO until the devices recover; and 'continue', which continues on as much as possible.)
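
For reference, failmode is an ordinary per-pool property, so you can inspect and change it with the usual zpool commands. A quick sketch, with an invented pool name and approximate output formatting; 'wait' is the default:

$ zpool get failmode tank
NAME  PROPERTY  VALUE     SOURCE
tank  failmode  wait      default
$ zpool set failmode=continue tank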

The problem that I observed in our iSCSI based environment is that if you use any non-panic failmode setting, a ZFS pool failure of this sort eventually winds up hanging the kernel's entire ZFS infrastructure (piece by piece; it does not happen all at once). This partially affects even unrelated pools, pools that are still fully intact. The hang persists even if connectivity to the disks returns, and is so thorough that the system will not reboot; I consistently had to power-cycle our test server in order to recover it.

The direct cause of the hang seems to be asking the kernel for detailed ZFS pool information about a problem pool (after enough time has elapsed). Running 'zpool status' is one way to cause this to happen (even on unrelated pools), but it gets worse; fmd (the useless fault manager daemon) also asks the kernel for this information every so often, thereby guaranteeing that this happens no matter what you do. As far as I can tell, you cannot really disable fmd without causing huge problems.

The net effect is that in a failure, your ZFS pool hangs irretrievably after a while, eventually taking much of the rest of the system with it. For us, this is actually far worse than the system panicking and rebooting without some ZFS pools.

(I managed to capture some kernel crash dumps; the affected processes, including sync, seemed to be stuck in zfs_ioc_pool_stats in the kernel.)
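
(On a live system, one way to look for such stuck kernel stacks is with mdb -k; the following pipeline is a sketch from memory, so treat the exact dcmd usage as an assumption:

$ echo "::pgrep sync | ::walk thread | ::findstack" | mdb -k

This asks the kernel debugger to find the sync process, walk its threads, and print their kernel stacks.)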

(This is probably a known bug. Insert rant here about an 'enterprise ready' operating system where you cannot run fault diagnosis programs during a fault without making the situation much, much worse.)

ZFSFailmodeProblem written at 02:48:11

