Wandering Thoughts archives

2016-01-11

The drawback of setting an explicit mount point for ZFS filesystems

ZFS has three ways of getting filesystems mounted and deciding where they go in the filesystem hierarchy. As covered in the zfs manpage, you have a choice of automatically putting the filesystem below the pool (so that tank/example is mounted as /tank/example), setting an explicit mount point with mountpoint=/some/where, or marking the filesystem as 'legacy' so that you mount it yourself through whatever means you want (usually /etc/vfstab, the legacy approach to filesystem mounts). With either of the first two options, ZFS will automatically mount and unmount filesystems as you import and export pools or do various other things (and will also automatically share them over NFS if set to do so); with the third, you're on your own to manage things.
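
For concreteness, the three approaches look roughly like this (a minimal sketch; 'tank' and the extra filesystem names are just placeholders):

    # default: mounted below the pool, here as /tank/example
    zfs create tank/example
    # explicit mount point, wherever you want it
    zfs create -o mountpoint=/some/where tank/example2
    # legacy: ZFS leaves mounting it entirely up to you
    zfs create -o mountpoint=legacy tank/example3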

The first approach is ZFS's default scheme and what many people follow. However, for what are in large part historical reasons, we haven't used it; instead we've explicitly specified our mount points with mountpoint=/some/where on our fileservers. When I set up ZFS on Linux on my office workstation I also set the mount points explicitly, because I was migrating existing filesystems into ZFS and I didn't feel like trying to change their mount points (or add another layer of bind mounts).

For both our fileservers and my workstation, this has turned out to sometimes be awkward. The largest problem comes if you're in the process of moving a filesystem from one pool to another on the same server using zfs send and zfs recv. If mountpoint was unset, both versions of the filesystem could coexist, with one as /oldpool/fsys and the other as /newpool/fsys. But with mountpoint set, they both want to be mounted on the same spot and only one can win. This means we have to be careful to use 'zfs recv -u' and even then we have to worry a bit about reboots.
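
In practice the move looks something like this sketch (with illustrative dataset and snapshot names):

    zfs snapshot oldpool/fsys@move
    zfs send oldpool/fsys@move | zfs recv -u newpool/fsys
    # -u leaves newpool/fsys unmounted, so it doesn't fight with
    # oldpool/fsys over their shared mount point until we're ready

followed by however many incremental sends it takes to catch up before the final switch.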

(You can set 'canmount=off' or clear the 'mountpoint' property on the new-pool version of the filesystem for the time when the filesystem is only part-moved, but then you have a divergence between your received snapshot and the current state of the filesystem and you'll have to force further incremental receives with 'zfs recv -F'. This is less than ideal, although such a divergence can happen anyways for other reasons.)
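
A sketch of that approach, using the same illustrative names and invented snapshots:

    zfs set canmount=off newpool/fsys    # or: zfs inherit mountpoint newpool/fsys
    # later incremental updates now have to be forced, because the received
    # snapshot and the filesystem's current properties have diverged:
    zfs send -i @move oldpool/fsys@move2 | zfs recv -F newpool/fsys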

On the other hand, there are definite advantages to not having the mount point change and to having mount points be independent of the pool the filesystem is in. There's no particular reason that either users or your backup system needs to care which pool a particular filesystem is in (such as whether it's in a HD-based pool or a SSD-based one, or a mirrored pool instead of a slower but more space-efficient RAIDZ one); in this world, the filesystem name is basically an abstract identifier, instead of the 'physical location' that normal ZFS provides.

(ZFS does not quite do 'physical location' as such, but the pool plus the position within the pool's filesystem hierarchy may determine a lot about stuff like what storage the data is on and what quotas are enforced. I call this the physical location for lack of a better phrase, because users usually don't care about these details or at least how they're implemented.)

On the third hand, arguably the right way to provide an 'abstract identifier' version of filesystems (if you need it) is to build another layer on top of ZFS. On Solaris, you'd probably do this through the automounter with some tool to automatically generate the mappings between logical filesystem identifiers and their current physical locations.
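
(I haven't actually built this, but I imagine the result would be a generated automounter direct map along these lines, with made-up names:

    # /etc/auto_direct, referenced from auto_master as '/-  auto_direct'
    /fs/fred    fileserver:/newpool/fred
    /fs/barney  fileserver:/oldpool/barney

The tool's job would then just be to regenerate this map whenever a filesystem moves between pools.)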

PS: some versions of 'zfs receive' allow you to set properties on the received filesystem; unfortunately, neither OmniOS nor ZFS on Linux currently supports that. I also suspect that doing this creates the same divergence between the received snapshot and the received filesystem that setting the properties by hand does, and you're back to forcing incremental receives with 'zfs recv -F' (and re-setting the properties and so on).
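
(On versions that do support it, I believe the invocation looks roughly like this, again with illustrative names:

    zfs send oldpool/fsys@move | zfs recv -u -o mountpoint=none newpool/fsys

which is of course not much help on the systems I actually have.)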

(It's sort of a pity that canmount is not inherited, because otherwise you could receive filesystems into a special 'newpool/nomount' hierarchy that blocked mounts and then activate them later by using 'zfs rename' to move them out to their final place. But alas, no.)
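
(If canmount were inherited, the workflow I have in mind would be roughly this sketch, with invented names:

    zfs create -o canmount=off newpool/nomount
    zfs send oldpool/fsys@move | zfs recv -u newpool/nomount/fsys
    # once the move is complete, put it in its final place:
    zfs rename newpool/nomount/fsys newpool/fsys

As it stands, the received filesystem doesn't pick up canmount=off from its parent, so this doesn't help.)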

solaris/ZFSMountpointConundrum written at 23:48:55

The benefits of flexible space usage in filesystems

My home and work Linux machines are very similar. They have about the same collection of filesystems, and they both have significant amounts of free disk space (my home machine more than my work machine, because I've got much bigger disks there). But despite this, the filesystems on my work machine have lots of free space while the filesystems on my home machine tend to run perpetually rather close to being out of space.

At one level, the difference is in how the disk space is managed. At work, I've migrated to ZFS on Linux; at home, everything is ext3 on top of LVM (on top of a software RAID mirror). But the real answer is that shrinking an extN filesystem and a LVM logical volume is kind of a pain, and also kind of dangerous (at least as far as I know). If I grew filesystems wildly at home, it'd be a pain to shrink them later if I needed the space elsewhere, so for the most part I only expand filesystems when I really need the space.

In theory this shouldn't make any difference; if I need the space, I'll grow the filesystem. In practice it makes me irrationally reluctant to do things that need substantial chunks of space temporarily. I would probably be better off if I adopted a policy that all of the filesystems I used actively should have, say, 40 GB of free space more or less at all times, but I'm not that sensible.

(There's some irrational bit of me that still thinks that disk space is in short supply. It's not; I have more than a TB free, and that's after extravagantly using space to store more or less every photograph I've ever taken. In RAW format, no less.)

This doesn't happen at work because ZFS dynamically shares the free pool space between all of the filesystems. Unless you go out of your way to set it up otherwise, there is no filesystem-specific free space, just general pool free space that is claimed (and then released) as you use space in filesystems. Filesystems are just an organizational thing, not something that forces up-front space allocation. So I can use however much space I want, wherever I want to, and the only time I'll run out of space in a filesystem is if I'm genuinely out of all disk space.
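
(You can see this in 'zfs list'; all of the filesystems in a pool normally report essentially the same AVAIL figure, which is just the pool's remaining free space. A sketch with made-up names and numbers:

    # zfs list -o name,used,avail -r tank
    NAME          USED  AVAIL
    tank          400G   1.2T
    tank/photos   350G   1.2T
    tank/src       50G   1.2T

The 'go out of your way' bit is things like 'zfs set reservation=100G tank/photos', which brings back filesystem-specific space.)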

This is a really nice feature of ZFS, and I wish I had it at home. It would clearly make my life easier by entirely removing one current concern; I just wouldn't have to manage space on a per-filesystem basis any more. Space would just be space.

(Someday I will have this at home, by migrating my home system to ZFS. Not having ZFS at home is still tolerable, though, so I suspect that I won't migrate until I'm migrating hardware anyways, and that probably won't be for a while for various reasons.)

PS: btrfs is not what I consider a viable option. At this point I'd probably only consider btrfs once a major Linux distribution has made it their default filesystem for new installs and has survived at least a year of that choice without problems. And I'm not holding my breath for that.

Sidebar: Why I believe shrinking a LVM logical volume is dangerous

To grow a filesystem inside a LVM volume, you first grow the volume and then grow the filesystem to use the new space. To shrink a volume, you do this in reverse; you first shrink the filesystem and then shrink the volume. However, as far as I know there is nothing in LVM that prevents you from accidentally shrinking the volume so that it is smaller than the filesystem. Doing this by accident will truncate the end of your filesystem, almost definitely lose some of your data, and quite probably destroy the filesystem outright. Hence the danger of doing this.
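
To make the ordering concrete, the two operations look roughly like this for an extN filesystem on LVM (a sketch with made-up volume names; note that resize2fs will insist on a forced fsck and an unmounted filesystem before it will shrink anything):

    # growing: extend the LV first, then the filesystem
    lvextend -L +20G /dev/vg0/somefs
    resize2fs /dev/vg0/somefs

    # shrinking: shrink the filesystem first, then the LV
    e2fsck -f /dev/vg0/somefs
    resize2fs /dev/vg0/somefs 80G
    lvreduce -L 80G /dev/vg0/somefs

Get the shrink steps backwards, or shrink the LV below the filesystem's size, and you're in the situation described above.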

It would be really great if LVM knew about common filesystem types, could read their superblocks to determine the FS-level size, and by default would refuse to shrink a logical volume below that size. But as far as I know it doesn't.
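
(You can at least check the filesystem's own idea of its size by hand before shrinking anything; for extN, something like this, with an illustrative device name:

    dumpe2fs -h /dev/vg0/somefs | egrep 'Block count|Block size'
    # filesystem size = block count * block size;
    # never lvreduce the volume below that

is a reasonable sanity check.)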

(As a practical matter I probably never want to shrink a filesystem without a current backup of it, which brings up the awkward subject of my home backup strategy.)

linux/FlexibleFilesystemSpaceBenefit written at 00:48:16

