Where I feel that btrfs went wrong
I recently finished reading this LWN series on btrfs, which was the most in-depth exposure to the details of using btrfs that I've had so far. While I'm sure that LWN intended the series to make people enthused about btrfs, I came away with a rather different reaction; I've wound up feeling that btrfs made a significant misstep along the way, one that's resulted in a number of design mistakes. To explain why I feel this way I need to contrast it with ZFS.
Btrfs and ZFS are each a volume manager and a filesystem merged together. One of the fundamental interface differences between them is that ZFS has decided that it is a volume manager first and a filesystem second, while btrfs has decided that it is a filesystem first and a volume manager second. This is what I see as btrfs's core mistake.
(Overall I've been left with the strong impression that btrfs basically considers volume management to be icky and tries to have as little to do with it as possible. If correct, this is a terrible mistake.)
Since it's a volume manager first, ZFS places volume management front and center in operation. Before you do anything ZFS-related, you need to create a ZFS volume (which ZFS calls a pool); only once this is done do you really start dealing with ZFS filesystems. ZFS even puts the two jobs in two different commands (zpool for pool management, zfs for filesystem management). Because it's firmly made this split, ZFS is free to have filesystem level things such as df present a logical, filesystem based view of things like free space and device usage. If you want the actual physical details you go to the volume management commands.
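To make the split concrete, here's a rough sketch of the ZFS workflow (the pool, filesystem, and device names are all made up for illustration):

    # volume management: create the pool
    zpool create tank mirror /dev/sda /dev/sdb
    # filesystem management: create filesystems within the pool
    zfs create tank/home
    zfs create tank/home/cks
    # df gives a logical, per-filesystem view of space
    df -h /tank/home/cks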
Because btrfs puts the filesystem first, it wedges volume creation in as a side effect of filesystem creation instead of as a separate activity, and then it carries a series of lies and uselessly physical details through to filesystem level operations like df. Consider the discussion of what df shows for a RAID1 btrfs filesystem here, which has both a lie (that the filesystem uses only a single physical device) and a needlessly physical view (of the physical block usage and space free on a RAID 1 mirror pair). That btrfs refuses to expose itself as a first class volume manager and pretends that you're dealing with real devices forces it into utterly awkward things like mounting a multi-device btrfs filesystem with 'mount /dev/adevice /mnt'.
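For contrast, a sketch of the btrfs version (again with invented device names); there is no separate 'create the volume' step, and you mount the whole thing through one of its member devices:

    # filesystem creation and multi-device 'volume' creation happen in one step
    mkfs.btrfs -d raid1 -m raid1 /dev/sdc /dev/sdd
    # mounting goes through whichever member device you pick
    mount /dev/sdc /mnt
    # plain df shows the misleading single-device, physical view;
    # the fuller picture needs a btrfs-specific command
    df -h /mnt
    btrfs filesystem df /mnt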
I think that this also leads to the asinine design decision that subvolumes have magic flat numeric IDs instead of useful names. Something that's willing to admit it's a volume manager, such as LVM or ZFS, has a name for the volume and can then hang sub-names off that name in a sensible way, even if where those sub-objects appear in the filesystem hierarchy (and under what names) gets shuffled around. But btrfs has no name for the volume to start with and there you go (the filesystem-volume has a mount point, but that's a different thing).
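You can see the difference just by asking each system to list what it has (the output below is paraphrased and abridged, not copied from a real system):

    # btrfs subvolumes are identified by flat numeric IDs
    btrfs subvolume list /mnt
    #   ID 256 gen 42 top level 5 path home
    # ZFS objects have full hierarchical names rooted in the pool
    zfs list -r tank
    #   NAME            USED  AVAIL  REFER  MOUNTPOINT
    #   tank/home       ...
    #   tank/home/cks   ...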
All of this really matters for how easily you can manage and keep track of things. df on ZFS filesystems does not lie to me; it tells me where the filesystem comes from (what pool and what object path within the pool), how much logical space the filesystem is using (more or less), and roughly how much more I can write to it. Since they have full names, ZFS objects such as snapshots can be more or less self documenting if you name them well. With an object hierarchy, ZFS has a natural way to inherit various things from parent object to sub-objects. And so on.
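As a small sketch of what that buys you (names, as always, invented):

    # snapshots carry full, self-documenting names
    zfs snapshot tank/home/cks@before-os-upgrade
    # properties set on a parent are inherited by its children
    zfs set compression=on tank/home
    zfs get -r compression tank/home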
Btrfs's 'I am not a volume manager' approach also leads it to drastically limit the physical shape of a btrfs RAID array in a way that is actually painfully limiting. In ZFS, a pool stripes its data over a number of vdevs and each vdev can be any RAID type with any number of devices. Because ZFS allows multi-way mirrors this creates a straightforward way to create a three-way or four-way RAID 10 array; you just make all of the vdevs be three or four way mirrors. You can also change the mirror count on the fly, which is handy for all sorts of operations. In btrfs, the shape 'raid10' is a top level property of the overall btrfs 'filesystem' and, well, that's all you get. There is no easy place to put in multi-way mirroring; because of btrfs's model of not being a volume manager it would require changes in any number of places.
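A sketch of the difference in practice, with invented device names:

    # ZFS: a 'RAID 10' pool built out of three-way mirror vdevs
    zpool create tank mirror da0 da1 da2 mirror da3 da4 da5
    # change the mirror count on the fly
    zpool attach tank da0 da6    # grow the first mirror to four-way
    zpool detach tank da6        # and shrink it back down
    # btrfs: one top-level raid10 profile, two copies of everything, that's it
    mkfs.btrfs -d raid10 -m raid10 /dev/sdc /dev/sdd /dev/sde /dev/sdf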
(And while I'm here, that btrfs requires you to specify both your data and your metadata RAID levels is crazy and gives people a great way to accidentally blow their own foot off.)
As a side note, I believe that btrfs's lack of allocation guarantees in a raid10 setup makes it impossible to create a btrfs filesystem split evenly across two controllers that is guaranteed to survive the loss of one entire controller. In ZFS this is trivial because of the explicit structure of vdevs in the pool.
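In ZFS the controller-split setup is just a matter of how you lay out the vdevs; something like this (controller and device names invented), where every mirror pairs a disk on one controller with a disk on the other, so losing an entire controller costs you one side of each mirror and nothing more:

    zpool create tank \
        mirror c1t0d0 c2t0d0 \
        mirror c1t1d0 c2t1d0 \
        mirror c1t2d0 c2t2d0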
PS: ZFS is too permissive in how you can assemble vdevs, because there is almost no point to a pool with, say, a mirror vdev plus a RAID-6 vdev. That configuration is all but guaranteed to be a mistake in some way.