How ZFS stores symbolic links on disk

August 30, 2021

After writing about ZFS's new 'draid' vdev topology, I wound up curious about how ZFS actually stores the target of symbolic links on disk (which matters for draid, because draid has a relatively large minimum block size). The answer turns out to tie back to another ZFS concept, System Attributes. As a quick summary, ZFS system attributes (SAs) are a way for ZFS to pack a more or less arbitrary collection of additional information, such as the parent directory of things, into ZFS dnodes. Normally this is done using extra space in dnodes that's called the bonus buffer, but it can overflow into a spill block if necessary.
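As a toy illustration of the bonus buffer and spill block idea, you can think of it as filling a fixed-size area in order and pushing whatever doesn't fit somewhere else. This is not ZFS's actual SA layout code; the attribute names, their sizes, and the 192-byte capacity are all made up for the example.

```python
def place_attributes(attrs, bonus_capacity):
    """Split (name, size) pairs between a fixed-size 'bonus' area and a 'spill' area.

    Toy model only: real ZFS system attribute layout is more involved.
    """
    in_bonus, in_spill = [], []
    used = 0
    for name, size in attrs:
        if used + size <= bonus_capacity:
            # Still fits in the space inside the dnode.
            in_bonus.append(name)
            used += size
        else:
            # Doesn't fit; it has to live outside the dnode.
            in_spill.append(name)
    return in_bonus, in_spill


# Hypothetical attributes and sizes in bytes, purely for illustration.
example = [("mode", 8), ("parent", 8), ("symlink", 300)]
print(place_attributes(example, bonus_capacity=192))
```

With these invented numbers, the two small attributes stay in the bonus area and the long symlink target is what gets pushed out.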

The answer to how ZFS stores the target of symbolic links is that the target is a System Attribute. You can see it listed as ZPL_SYMLINK in the enum of known system attributes in zfs_sa.h, along with a variety of other ones. There's also apparently an older scheme for storing these dnode attributes, which appears to use a more or less hard coded structure for them based on the znode_phys struct that's also defined in zfs_sa.h. You're only going to see this older scheme if you have very old filesystems, because system attributes were introduced in 2010 in ZFS filesystem version 5 (which requires ZFS pool version 24 or later).

(Because we've been running ZFS for a rather long time now, starting with Solaris 10, we actually have some ZFS filesystems that are still version 4. Probably we should schedule a 'zfs upgrade' one of these days, if only so all of our filesystems are on the same version. All of our pools are recent enough, since the pools were recreated in our move to our Linux fileservers, but some of the filesystems have been moved around with 'zfs send' since more or less the beginning, which preserves at least some limitations of the original filesystems.)
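If you want to find out which of your filesystems predate system attributes, a minimal sketch is to ask 'zfs get' for the filesystem's 'version' property and compare it against 5. The function names here are my own invention, and this only tells you what the filesystem version allows, not the format any particular file is actually stored in.

```python
import subprocess

# From the discussion above: system attributes arrived in filesystem version 5.
SA_MIN_ZPL_VERSION = 5


def supports_system_attributes(zpl_version):
    """True if a filesystem at this ZPL version can use system attributes."""
    return int(zpl_version) >= SA_MIN_ZPL_VERSION


def filesystem_version(dataset):
    """Return the 'version' property of a ZFS filesystem as an integer.

    Runs 'zfs get -H -o value version DATASET', so it needs the zfs
    command and an existing dataset to work.
    """
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "value", "version", dataset],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


# Example use on a machine with ZFS (the dataset name is a placeholder):
#   supports_system_attributes(filesystem_version("tank/home"))
```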

If you use 'zdb -v -O POOL PATH/TO/SYMLINK' to dump a modern, system attribute based symbolic link, what you'll see is something like this:

 Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
2685091    1   128K    512      0     512    512    0.00  ZFS plain file
                                            183   bonus  System attributes
  dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
  dnode maxblkid: 0
  target  ../target
  uid ..
  gid ...
  atime   Mon Aug 30 22:06:38 2021
[etc]

What zdb reports as the 'target' attribute is the literal text of the target of the symbolic link, as shown by eg 'ls -l' or reported by readlink. It comes directly from the relevant system attribute, and is reported by dump_znode_symlink() in cmd/zdb/zdb.c.
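If you want to check this mechanically, a rough sketch is to pull the 'target' line out of saved zdb output and compare it to what readlink gives you. The function name is hypothetical, and I'm assuming the line always looks like the one in the dump above (the label, whitespace, then the target); a target that itself starts with whitespace would be mangled by this.

```python
import re


def zdb_symlink_target(zdb_output):
    """Return the text after the 'target' label in zdb object output, or None.

    Assumes the 'target' line is formatted as in the example dump:
    optional leading whitespace, the word 'target', whitespace, the target.
    """
    for line in zdb_output.splitlines():
        match = re.match(r"^\s*target\s+(.*)$", line)
        if match:
            return match.group(1)
    return None


# A fragment of the dump shown earlier, used as sample input.
sample = """\
  dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
  dnode maxblkid: 0
  target  ../target
  atime   Mon Aug 30 22:06:38 2021
"""
print(zdb_symlink_target(sample))
```

On a live system you would compare the result against os.readlink() on the same symlink; the two should be the same string.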

(Based on a quick look at the code, I don't think zdb can dump the older format of symlinks, although I may well be missing a zdb trick.)

PS: A sufficiently long symlink target will presumably overflow the amount of space available in the dnode bonus buffer and force the allocation of a spill block to hold some of the system attributes. I'm not sure how much space is normally available and I don't plan to dig further in the source (or do experiments) to find out. This isn't very different from other Unix filesystems; ext4 can only embed symlink targets in the inode if they're less than 60 bytes long, for example.
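For anyone who does want to experiment, here is a rough sketch of one way to start: create symlinks with targets of various lengths and look at what lstat() reports for each. The function name and the choice of lengths are mine, and how st_blocks gets reported for symlinks depends on the filesystem, so treat any pattern you see as a hint rather than proof of where the target is stored.

```python
import os
import tempfile


def symlink_sizes(directory, lengths):
    """Create one dangling symlink per target length; return (length, st_size, st_blocks)."""
    results = []
    for n in lengths:
        # Build a target exactly n characters long out of short path
        # components, so no single component is unreasonably long.
        target = ("dir/" * n)[:n]
        link = os.path.join(directory, f"link-{n}")
        os.symlink(target, link)
        st = os.lstat(link)
        # readlink gives back the literal text we stored.
        assert os.readlink(link) == target
        results.append((n, st.st_size, st.st_blocks))
    return results


# Run this with 'directory' on the filesystem you are curious about;
# a temporary directory is used here only so the sketch cleans up after itself.
with tempfile.TemporaryDirectory() as d:
    for n, size, blocks in symlink_sizes(d, [10, 59, 60, 200, 1000]):
        print(n, size, blocks)
```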


Last modified: Mon Aug 30 22:32:31 2021