2016-01-11
The drawback of setting an explicit mount point for ZFS filesystems
ZFS has three ways of getting filesystems mounted and deciding where they go in the filesystem hierarchy. As covered in the zfs manpage, you have a choice of automatically putting the filesystem below the pool (so that tank/example is mounted as /tank/example), setting an explicit mount point with mountpoint=/some/where, or marking the filesystem as 'legacy' so that you mount it yourself through whatever means you want (usually /etc/vfstab, the legacy approach to filesystem mounts). With either of the first two options, ZFS will automatically mount and unmount filesystems as you import and export pools or do various other things (and will also automatically share them over NFS if set to do so); with the third, you're on your own to manage things.
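As a concrete sketch (the pool and filesystem names here are made up), the three schemes look roughly like this:

    # 1: the default scheme; tank/example gets mounted as /tank/example
    zfs create tank/example

    # 2: an explicit mount point
    zfs set mountpoint=/some/where tank/example

    # 3: 'legacy', where you do the mounting yourself
    zfs set mountpoint=legacy tank/example
    mount -F zfs tank/example /some/where   # Solaris/OmniOS; 'mount -t zfs' on Linux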
The first approach is ZFS's default scheme and what many people follow. However, for reasons that are in large part historical we haven't used it; instead we've explicitly specified our mount points with mountpoint=/some/where on our fileservers. When I set up ZFS on Linux on my office workstation I also set the mount points explicitly, because I was migrating existing filesystems into ZFS and I didn't feel like trying to change their mount points (or add another layer of bind mounts).
For both our fileservers and my workstation, this has turned out to sometimes be awkward. The largest problem comes if you're in the process of moving a filesystem from one pool to another on the same server using zfs send and zfs recv. If mountpoint was unset, both versions of the filesystem could coexist, with one as /oldpool/fsys and the other as /newpool/fsys. But with mountpoint set, they both want to be mounted on the same spot and only one can win. This means we have to be careful to use 'zfs recv -u', and even then we have to worry a bit about reboots.
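A sketch of such a move (with made-up pool and snapshot names); 'zfs recv -u' keeps the new copy from trying to grab the mount point the moment it's received:

    zfs snapshot oldpool/fsys@move1
    zfs send oldpool/fsys@move1 | zfs recv -u newpool/fsys

    # later, catch up with an incremental send, still received unmounted
    zfs snapshot oldpool/fsys@move2
    zfs send -i @move1 oldpool/fsys@move2 | zfs recv -u newpool/fsys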
(You can set 'canmount=off' or clear the 'mountpoint' property on the new-pool version of the filesystem for the time when the filesystem is only part-moved, but then you have a divergence between your received snapshot and the current state of the filesystem and you'll have to force further incremental receives with 'zfs recv -F'. This is less than ideal, although such a divergence can happen anyways for other reasons.)
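For illustration, the workaround and its consequence look something like this (again with made-up names):

    # keep the part-moved copy from fighting over the mount point
    zfs set canmount=off newpool/fsys

    # as described above, the new-pool filesystem has now diverged from
    # its received snapshot, so further incrementals have to be forced
    zfs send -i @move1 oldpool/fsys@move2 | zfs recv -u -F newpool/fsys

    # once the move is finished, let it mount again
    zfs inherit canmount newpool/fsys
    zfs mount newpool/fsys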
On the other hand, there are definite advantages to not having the mount point change and to having mount points be independent of the pool the filesystem is in. There's no particular reason that either users or your backup system need to care which pool a particular filesystem is in (whether it's an HD-based pool or an SSD-based one, or a mirrored pool instead of a slower but more space-efficient RAIDZ one). In this world, the filesystem name is basically an abstract identifier, instead of the 'physical location' that normal ZFS provides.
(ZFS does not quite do 'physical location' as such, but the pool plus the position within the pool's filesystem hierarchy may determine a lot about stuff like what storage the data is on and what quotas are enforced. I call this the physical location for lack of a better phrase, because users usually don't care about these details or at least how they're implemented.)
On the third hand, arguably the right way to provide an 'abstract identifier' version of filesystems (if you need it) is to build another layer on top of ZFS. On Solaris, you'd probably do this through the automounter with some tool to automatically generate the mappings between logical filesystem identifiers and their current physical locations.
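As a sketch of that (not something we actually run), a Solaris automounter indirect map using loopback mounts could give every filesystem a stable logical name under /fs regardless of which pool it currently lives in; the map names and paths here are invented:

    # /etc/auto_master entry:
    /fs   auto_fs

    # /etc/auto_fs, mapping logical names to current physical locations:
    project1  -fstype=lofs  :/tank/project1
    project2  -fstype=lofs  :/ssdpool/project2

Users would refer to /fs/project1, and moving the filesystem to another pool would just mean regenerating the map.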
PS: some versions of 'zfs receive' allow you to set properties on the received filesystem; unfortunately, neither OmniOS nor ZFS on Linux currently supports that. I also suspect that doing this creates the same divergence between received snapshot and received filesystem that setting the properties by hand does, and you're back to forcing incremental receives with 'zfs recv -F' (and re-setting the properties and so on).
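(On versions of ZFS that do support it, I believe it takes the form of a '-o property=value' argument to 'zfs receive', something like:

    zfs send oldpool/fsys@move1 | zfs recv -u -o mountpoint=none newpool/fsys

but I can't test that on the systems we run.)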
(It's sort of a pity that canmount is not inherited, because otherwise you could receive filesystems into a special 'newpool/nomount' hierarchy that blocked mounts and then activate them later by using 'zfs rename' to move them out to their final place. But alas, no.)
The benefits of flexible space usage in filesystems
My home and work Linux machines are very similar. They have about the same collection of filesystems, and they both have significant amounts of free disk space (my home machine more than my work machine, because I've got much bigger disks there). But despite this, the filesystems on my work machine have lots of free space while the filesystems on my home machine run perpetually rather close to being out of space.
At one level, the difference is in how the disk space is managed. At work, I've migrated to ZFS on Linux; at home, everything is ext3 on top of LVM (on top of a software RAID mirror). But the real answer is that shrinking an extN filesystem and an LVM logical volume is kind of a pain, and also kind of dangerous (at least as far as I know). If I grew filesystems wildly at home, it'd be a pain to shrink them later if I needed the space elsewhere, so for the most part I only expand filesystems when I really need the space.
In theory this shouldn't make any difference; if I need the space, I'll grow the filesystem. In practice it makes me irrationally reluctant to do things that need substantial chunks of space temporarily. I would probably be better off if I adopted a policy that all of the filesystems I used actively should have, say, 40 GB of free space more or less at all times, but I'm not that sensible.
(There's some irrational bit of me that still thinks that disk space is in short supply. It's not; I have more than a TB free, and that's after extravagantly using space to store more or less every photograph I've ever taken. In RAW format, no less.)
This doesn't happen at work because ZFS dynamically shares the free pool space between all of the filesystems. Unless you go out of your way to set it up otherwise, there is no filesystem specific free space, just general pool free space that is claimed (and then released) as you use space in filesystems. Filesystems are just an organizational thing, not something that forces up-front space allocation. So I can use however much space I want, wherever I want to, and the only time I'll run out of space in a filesystem is if I'm genuinely out of all disk space.
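(The 'go out of your way' bits are things like quotas and reservations; without them, every filesystem just sees the pool's shared free space. As a quick sketch, with made-up names:

    # every filesystem reports the same pool-wide free space
    zfs list -o name,used,available tank

    # per-filesystem space limits only exist if you set them explicitly
    zfs set quota=100G tank/scratch         # cap what it can grow to
    zfs set reservation=20G tank/important  # guarantee it some space

I don't bother with either at work.)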
This is a really nice feature of ZFS, and I wish I had it at home. It would clearly make my life easier by entirely removing one current concern; I just wouldn't have to manage space on a per-filesystem basis any more. Space would just be space.
(Someday I will have this at home, by migrating my home system to ZFS. Not having ZFS at home is still tolerable, though, so I suspect that I won't migrate until I'm migrating hardware anyways, and that probably won't be for a while for various reasons.)
PS: btrfs is not what I consider a viable option. At this point I'd probably only consider btrfs once a major Linux distribution has made it their default filesystem for new installs and has survived at least a year of that choice without problems. And I'm not holding my breath for that.
Sidebar: Why I believe shrinking a LVM logical volume is dangerous
To grow a filesystem inside a LVM volume, you first grow the volume and then grow the filesystem to use the new space. To shrink a volume, you do this in reverse; you first shrink the filesystem and then shrink the volume. However, as far as I know there is nothing in LVM that prevents you from accidentally shrinking the volume so that it is smaller than the filesystem. Doing this by accident will truncate the end of your filesystem, almost definitely lose some of your data, and quite probably destroy the filesystem outright. Hence the danger of doing this.
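To make the ordering concrete, here's a sketch of both operations for an extN filesystem on LVM (the volume group and device names are made up; note that nothing below stops you from giving lvreduce a size smaller than the filesystem):

    # growing: enlarge the LV first, then the filesystem
    lvextend -L +50G /dev/vg0/somefs
    resize2fs /dev/vg0/somefs

    # shrinking: shrink the filesystem first (unmounted), then the LV
    umount /some/where
    e2fsck -f /dev/vg0/somefs
    resize2fs /dev/vg0/somefs 95G      # shrink the FS to below the target size
    lvreduce -L 100G /dev/vg0/somefs   # then shrink the LV to the target
    resize2fs /dev/vg0/somefs          # grow the FS back out to fill the LV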
It would be really great if LVM knew about common filesystem types, could read their superblocks to determine the FS-level size, and by default would refuse to shrink a logical volume below that size. But as far as I know it doesn't.
(As a practical matter I probably never want to shrink a filesystem without a current backup of it, which brings up the awkward subject of my home backup strategy.)