2021-08-30
How ZFS stores symbolic links on disk
After writing about ZFS's new 'draid' vdev topology, I wound up curious about how ZFS actually stores the target of symbolic links on disk (which matters for draid, because draid has a relatively large minimum block size). The answer turns out to tie back to another ZFS concept, System Attributes. As a quick summary, ZFS system attributes (SAs) are a way for ZFS to pack a more or less arbitrary collection of additional information, such as the parent directory of things, into ZFS dnodes. Normally this is done using extra space in dnodes that's called the bonus buffer, but it can overflow into a spill block if necessary.
The answer to how ZFS stores the target of a symbolic link is that it's a System Attribute. You can see it listed as ZPL_SYMLINK in the enum of known system attributes in zfs_sa.h, along with a variety of other ones. There's also apparently an older scheme for storing these dnode attributes, which appears to use a more or less hard coded structure for them based on the znode_phys struct that's also defined in zfs_sa.h. You're only going to see this older scheme if you have very old filesystems, because system attributes were introduced in 2010 in ZFS filesystem version 5 (which requires ZFS pool version 24 or later).
(Because we've been running ZFS for a rather long time now, starting with Solaris 10, we actually have some ZFS filesystems that are still version 4. Probably we should schedule a 'zfs upgrade' one of these days, if only so all of our filesystems are on the same version. All of our pools are recent enough, since the pools were recreated in our move to our Linux fileservers, but some of the filesystems have been moved around with 'zfs send' since more or less the beginning, which preserves at least some limitations of the original filesystems.)
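As a rough sketch of how you might spot such filesystems, the following Python parses tab-separated text of the sort that 'zfs get -H -o name,value version' produces and picks out anything below filesystem version 5. The function name and the sample dataset names are invented for illustration, and the code only looks at whatever text you hand it; it doesn't run zfs itself.

```python
def pre_sa_filesystems(zfs_get_output, sa_version=5):
    """Return (name, version) pairs for filesystems older than sa_version.

    Expects tab-separated 'name<TAB>value' lines, one dataset per line.
    """
    old = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        value = value.strip()
        if not value.isdigit():
            # Skip anything that doesn't report a numeric version.
            continue
        if int(value) < sa_version:
            old.append((name, int(value)))
    return old


# Hypothetical sample input; these dataset names are made up.
sample = "tank/homes\t5\ntank/old-data\t4\ntank/vol\t-\n"
print(pre_sa_filesystems(sample))
```

Anything this reports would be a candidate for 'zfs upgrade'.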
If you use 'zdb -v -O POOL PATH/TO/SYMLINK' to dump a modern, system attribute based symbolic link, what you'll see is something like this:
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   2685091    1   128K    512      0     512    512    0.00  ZFS plain file
                                               183   bonus  System attributes
    dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
    dnode maxblkid: 0
    target  ../target
    uid     ..
    gid     ...
    atime   Mon Aug 30 22:06:38 2021
    [etc]
What zdb reports as the 'target' attribute is the literal text of the target of the symbolic link, as shown by eg 'ls -l' or reported by readlink. It comes directly from the relevant system attribute, and is reported by cmd/zdb.c's dump_znode_symlink().
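To illustrate what 'literal text' means here, this small Python sketch creates a symlink in a scratch directory and reads it back with os.readlink(). It isn't ZFS-specific and the helper name is made up; it just shows that a relative target such as '../target' comes back exactly as it was written, without being resolved or checked for existence.

```python
import os
import tempfile


def stored_target(target):
    """Create a symlink to target in a scratch directory and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        link = os.path.join(tmpdir, "link")
        # The target doesn't have to exist; it's stored as plain text.
        os.symlink(target, link)
        return os.readlink(link)


print(stored_target("../target"))
```

On a ZFS filesystem, the string this returns should be the same text that zdb shows as 'target'.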
(Based on a quick look at the code, I don't think zdb can dump the older format of symlinks, although I may well be missing a zdb trick.)
PS: A sufficiently long symlink target will presumably overflow the amount of space available in the dnode bonus buffer and force the allocation of a spill block to hold some of the system attributes. I'm not sure how much space is normally available and I don't plan to dig further in the source (or do experiments) to find out. This isn't very different from other Unix filesystems; ext4 can only embed symlink targets in the inode if they're less than 60 bytes long, for example.
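The ext4 rule is simple enough to write down as a sketch. The function name is invented and the encoding is an assumption (I'm treating the target as UTF-8); the 60 byte limit is the ext4 figure mentioned above. For ZFS you would have to supply your own limit, since I don't know how much bonus buffer space is normally available.

```python
EXT4_INLINE_LIMIT = 60  # ext4: targets shorter than this many bytes fit in the inode


def fits_inline(target, limit=EXT4_INLINE_LIMIT):
    """True if target, as bytes, is shorter than limit bytes.

    Assumes the target is encoded as UTF-8; what matters is the
    length in bytes, not in characters.
    """
    return len(target.encode("utf-8")) < limit


print(fits_inline("../target"))
print(fits_inline("x" * 60))
```

The first call reports that a short target would be embedded, and the second that a 60 byte one would not.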