One reason that ZFS can't turn a directory into a filesystem

October 29, 2023

One of the wishes that I and other people frequently have for ZFS is the ability to take an existing directory (and everything underneath it) in a ZFS filesystem and turn it into a sub-filesystem of its own. One reason for wanting this is that a number of things are set and controlled on a per-filesystem basis in ZFS, instead of on a per-directory basis; if you have a (sub)directory where you want a special value for one of them, you need to make it a filesystem of its own. Often you only discover the need for the special settings after the directory already exists and has been populated. Today I realized that one reason ZFS doesn't have this feature is how ZFS filesystems are put together.

ZFS is often described as tree structured, and this is generally true; a lot of things in a ZFS pool are organized into a tree of objects. However, while filesystems are a tree at the logical level of directories and subdirectories, they aren't a tree as represented on disk. Directories in ZFS filesystems don't directly point to the disk addresses of their contents; instead, ZFS filesystems have a flat, global table of object numbers (effectively inode numbers) and all directory entries refer to things by object number. Since ZFS is a copy on write filesystem, this level of indirection is quite important in reducing how much has to be updated when a file or a directory is changed.
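To make the indirection concrete, here is a minimal toy model of it in Python. This is only a sketch of the idea, not how ZFS actually stores things; the `Filesystem`, `DirObj`, and `FileObj` names and all of the methods are invented for illustration. Directory entries hold only object numbers, so replacing a file's contents changes one slot in the flat table and leaves every directory alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class FileObj:
    data: bytes = b""


@dataclass
class DirObj:
    # Directory entries map a name to an object number, not to a disk
    # address and not to the child object itself.
    entries: Dict[str, int] = field(default_factory=dict)


class Filesystem:
    """Toy model (hypothetical): one flat table of object numbers."""

    def __init__(self) -> None:
        self.objects: Dict[int, Union[FileObj, DirObj]] = {}
        self.next_objnum = 1
        self.root = self.allocate(DirObj())

    def allocate(self, obj: Union[FileObj, DirObj]) -> int:
        num = self.next_objnum
        self.next_objnum += 1
        self.objects[num] = obj
        return num

    def lookup(self, path: str) -> int:
        # Every step goes name -> object number -> table slot.
        num = self.root
        for name in [p for p in path.split("/") if p]:
            num = self.objects[num].entries[name]
        return num

    def add(self, dirpath: str, name: str, obj: Union[FileObj, DirObj]) -> int:
        parent = self.objects[self.lookup(dirpath)]
        num = self.allocate(obj)
        parent.entries[name] = num
        return num

    def rewrite_file(self, path: str, data: bytes) -> None:
        # Copy-on-write flavoured update: a new object goes into the same
        # table slot, and no directory entry has to change.
        num = self.lookup(path)
        self.objects[num] = FileObj(data)


fs = Filesystem()
fs.add("/", "home", DirObj())
objnum = fs.add("/home", "notes.txt", FileObj(b"v1"))
fs.rewrite_file("/home/notes.txt", b"v2")
```

After the rewrite, `/home/notes.txt` still resolves to the same object number and the `/home` directory object is untouched; only the table slot points at new data.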

If ZFS filesystems used tree structured references at the level of directory entries (and we ignored hardlinks), it would make conceptual sense that you could take a directory object, pull it into a new filesystem, and patch its reference in its parent directory. All of the object references in the tree under the directory would stay the same; they would just be in a new container, the new filesystem. Filesystems would essentially be cut points in the overall object tree.

However, you can't make this model work when each filesystem has its own single, global space of object numbers that are used in directory entries. A new filesystem has its own new table of object numbers, and you would have to move all of the objects referred to by the directory hierarchy into this new table. This means you'd have to walk the directory tree to find them all, and then possibly update all of the directories if their object numbers changed as part of putting them in the new object (number) table. This isn't the sort of work that you should be asking a filesystem to do in the kernel; it's much more suited for a user level tool.
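As a rough illustration of the work involved, here is a small Python sketch of such a split under a deliberately simplified model. Everything in it is an assumption made for the example: an object table is a plain dict from object number to object, a directory is a dict from names to object numbers, a file is just bytes, and hardlinks are ignored. The `split_directory` function is hypothetical and is not anything ZFS provides.

```python
from typing import Dict, Tuple, Union

# An object is either file contents (bytes) or a directory, which maps
# names to object numbers in the *same* table.
Obj = Union[bytes, Dict[str, int]]
Table = Dict[int, Obj]


def split_directory(old: Table, parent: int, name: str) -> Tuple[Table, int]:
    """Move the tree under old[parent][name] into a new object table.

    Returns the new table (its root directory is object 1) and the number
    of objects that had to be visited. Hardlinks are ignored; every object
    is assumed to have exactly one parent.
    """
    new: Table = {}
    next_num = 1
    top = old[parent].pop(name)  # unlink the directory from the old tree
    # Each stack item is (old object number, new number reserved for it).
    stack = [(top, next_num)]
    next_num += 1
    visited = 0
    while stack:
        old_num, new_num = stack.pop()
        obj = old.pop(old_num)  # the object leaves the old table
        visited += 1
        if isinstance(obj, dict):
            # Every entry in this directory must be rewritten to use the
            # child's number in the new table.
            rewritten: Dict[str, int] = {}
            for child_name, child_old in obj.items():
                rewritten[child_name] = next_num
                stack.append((child_old, next_num))
                next_num += 1
            new[new_num] = rewritten
        else:
            new[new_num] = obj
    return new, visited


old_fs: Table = {
    1: {"home": 2, "etc": 5},      # root directory
    2: {"a.txt": 3, "sub": 4},     # /home
    3: b"hello",                   # /home/a.txt
    4: {"b.txt": 6},               # /home/sub
    5: {},                         # /etc
    6: b"world",                   # /home/sub/b.txt
}
new_fs, touched = split_directory(old_fs, 1, "home")
```

The point of the sketch is the cost: even in this toy version, every object under the directory is visited and every directory under it is rewritten, so the work is proportional to the size of the tree rather than being a single pointer update.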

Now that I've thought of this, it's even more understandable why ZFS doesn't have this feature, however convenient it would be for me, and why it never will.

(Hardlinks by themselves probably cause enough heartburn to sink a feature to turn a directory into a filesystem, although I can see ways to deal with them if you try hard enough.)


Comments on this page:

By nanaya at 2023-10-30 06:26:41:

I suppose technically you can clone the fs, mount/promote it, and then delete everything else...?

By cks at 2023-10-30 11:06:27:

I guess you can do that. You'd have to move things around in the new filesystem so that what was a subdirectory is now the root directory, and then delete them all in the old filesystem. I guess it might be worth it if things are big enough that copying them would take too long.

(And if everything is only being read, you could make the actual swap reasonably fast. You'd clone, clean up and move things around in the new filesystem, then move the old directory aside, make a new one to be the mount point, and rename the new, cleaned up filesystem on to it. But you'd lose updates to the old directory in the old filesystem, so there had better not be any.)

Written on 29 October 2023.

Last modified: Sun Oct 29 22:43:07 2023