2024-06-18
Some things on how ZFS System Attributes are stored
To summarize, ZFS's System Attributes (SAs)
are a way for ZFS to pack a somewhat arbitrary collection of
additional information, such as the parent directory of things and symbolic link targets,
into ZFS dnodes in a general and flexible
way that doesn't hard code the specific combinations of attributes
that can be used together. ZFS system attributes are normally stored
in extra space in dnodes that's called the bonus buffer, but the
system attributes can overflow to a spill block if necessary.
I've written more about the high level side of this in my entry
on ZFS SAs, but today I'm going to write up
some concrete details of what you'd see when you look at a ZFS
filesystem with tools like zdb.
When ZFS stores the SAs for a particular dnode, it simply packs all of their values together in a blob of data. It knows which part of the blob is which through an attribute layout, which tells it which attributes are in the layout and in what order. Attribute layouts are created and registered as they are needed, which is to say when some dnode wants to use that particular combination of attributes. Generally there are only a few combinations of system attributes that get used, so a typical ZFS filesystem will not have many SA layouts. System attributes are numbered, but the specific numbering may differ from filesystem to filesystem. In practice it probably mostly won't, since most attributes usually get registered pretty early in the life of a ZFS filesystem and in a predictable order.
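To make the "blob plus layout" idea concrete, here is a minimal sketch under heavy simplifying assumptions. It ignores the real SA header and variable-length attributes, and it treats every value as a little-endian 64-bit integer. The attribute numbers and 8-byte lengths come from the registry dump shown further down. The point is only that the blob carries no names or tags: a reader has to walk the same layout, in the same order, to know where each value starts.

```python
import struct

# Attribute number -> value length in bytes, for a few fixed-size attributes
# from the registry dump in this entry (ZPL_MODE, ZPL_SIZE, ZPL_UID, ZPL_GID).
ATTR_LENGTHS = {5: 8, 6: 8, 12: 8, 13: 8}


def pack_by_layout(layout, values):
    """Concatenate attribute values in exactly the order the layout lists them.

    Simplified: no SA header, no variable-length attributes, and every value
    is treated as a little-endian 64-bit integer.
    """
    parts = []
    for attr in layout:
        assert ATTR_LENGTHS[attr] == 8, "sketch only handles 8-byte values"
        parts.append(struct.pack("<Q", values[attr]))
    return b"".join(parts)


def unpack_by_layout(layout, blob):
    """Walk the same layout to find where each value starts in the blob."""
    values, offset = {}, 0
    for attr in layout:
        (values[attr],) = struct.unpack_from("<Q", blob, offset)
        offset += ATTR_LENGTHS[attr]
    return values


layout = [5, 6, 12, 13]
blob = pack_by_layout(layout, {5: 0o100644, 6: 4096, 12: 1000, 13: 1000})
print(len(blob))                      # 32: four 8-byte values, no names stored
print(unpack_by_layout(layout, blob))
```

Packing the same values under a different attribute order produces a different blob, which is why the layout has to record order and not just membership.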
(For example, the creation of a ZFS filesystem necessarily means creating a directory dnode for its top level, so all of the system attributes used for directories will immediately get registered, along with an attribute layout.)
The attribute layout for a given dnode is not fixed when the file is created; instead, it varies depending on what system attributes that dnode needs at the moment. The high level ZFS code simply sets or clears specific system attributes on the dnode, and the low(er) level system attribute code takes care of either finding or creating an attribute layout that matches the current set of attributes the dnode has. Many system attributes are constant over the life of the dnode, but I think others can come and go, such as the system attributes used for xattrs.
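The "find or create a matching layout" step can be modelled as a lookup table. This is a toy model, not the OpenZFS code. It assumes a layout is identified by its ordered tuple of attribute numbers, that new layouts take the next unused number, and that nothing is ever removed. The starting number of 2 only mirrors the lowest layout number zdb shows later in this entry. In this model, a dnode that gains an attribute doesn't edit its layout. It moves to a different layout, which may be created on the spot.

```python
class LayoutTable:
    """Toy 'find or create a layout' table.

    Assumptions: a layout is identified by its ordered tuple of attribute
    numbers, new layouts take the next unused number, and nothing is ever
    removed. Starting at 2 only mirrors the lowest layout number zdb shows
    in this entry.
    """

    def __init__(self, first_number=2):
        self.by_attrs = {}              # tuple of attribute numbers -> layout number
        self.next_number = first_number

    def find_or_create(self, attrs):
        key = tuple(attrs)
        if key not in self.by_attrs:
            self.by_attrs[key] = self.next_number
            self.next_number += 1
        return self.by_attrs[key]


base = [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19]
table = LayoutTable()
dir_layout = table.find_or_create(base)             # new layout 2
link_layout = table.find_or_create(base + [17])     # new layout 3 (adds ZPL_SYMLINK)
again = table.find_or_create(base)                  # reuses layout 2
xattr_layout = table.find_or_create(base + [9])     # new layout 4 once a dnode gains ZPL_XATTR
```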
Every ZFS filesystem with system attributes has three special dnodes involved in this process, which zdb will report as the "SA master node", the "SA attr registration" dnode, and the "SA attr layouts" dnode. As far as I know, the SA master node's current purpose is to point to the other two dnodes. The SA attribute registry dnode is where the potentially filesystem specific numbers for attributes are registered, and the SA attribute layouts dnode is where the various layouts in use on the filesystem are tracked. The SA master (d)node itself is pointed to by the "ZFS master node", which is always object 1.
So let's use zdb to take a look at a typical case:
  # zdb -dddd fs19-scratch-01/w/430 1
  [...]
      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           1    1   128K    512     8K     512    512  100.00  ZFS master node
  [...]
          SA_ATTRS = 32
  [...]
  # zdb -dddd fs19-scratch-01/w/430 32
      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
          32    1   128K    512      0     512    512  100.00  SA master node
  [...]
          LAYOUTS = 36
          REGISTRY = 35
It's common for the registry and the layout to be consecutive, since they're generally allocated at the same time. On most filesystems they will have very low object numbers, since they were created when the filesystem was.
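The two zdb runs above follow a two-step chain of pointers by hand. Here is a small sketch of the same walk, where the dictionaries are hypothetical stand-ins for the ZAP entries zdb printed. The real objects hold more entries, and nothing here is a real ZFS or zdb API.

```python
# Hypothetical stand-ins for the ZAP entries that zdb printed above; the real
# objects contain more entries, and this is not any real ZFS or zdb API.
OBJECTS = {
    1: {"SA_ATTRS": 32},                   # ZFS master node (always object 1)
    32: {"LAYOUTS": 36, "REGISTRY": 35},   # SA master node
}

ZFS_MASTER_OBJECT = 1   # hypothetical name for "object 1"


def find_sa_objects(objects):
    """Follow ZFS master node -> SA master node -> registry and layouts."""
    sa_master = objects[ZFS_MASTER_OBJECT]["SA_ATTRS"]
    entries = objects[sa_master]
    return sa_master, entries["REGISTRY"], entries["LAYOUTS"]


print(find_sa_objects(OBJECTS))   # (32, 35, 36)
```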
The registry is generally going to be pretty boring looking:
  # zdb -dddd fs19-scratch-01/w/430 35
  [...]
      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
          35    1   128K  1.50K     8K     512  1.50K  100.00  SA attr registration
  [...]
          ZPL_SCANSTAMP = 20030012 : [32:3:18]
          ZPL_RDEV = 800000a : [8:0:10]
          ZPL_FLAGS = 800000b : [8:0:11]
          ZPL_GEN = 8000004 : [8:0:4]
          ZPL_MTIME = 10000001 : [16:0:1]
          ZPL_CTIME = 10000002 : [16:0:2]
          ZPL_XATTR = 8000009 : [8:0:9]
          ZPL_UID = 800000c : [8:0:12]
          ZPL_ZNODE_ACL = 5803000f : [88:3:15]
          ZPL_PROJID = 8000015 : [8:0:21]
          ZPL_ATIME = 10000000 : [16:0:0]
          ZPL_SIZE = 8000006 : [8:0:6]
          ZPL_LINKS = 8000008 : [8:0:8]
          ZPL_PARENT = 8000007 : [8:0:7]
          ZPL_MODE = 8000005 : [8:0:5]
          ZPL_PAD = 2000000e : [32:0:14]
          ZPL_DACL_ACES = 40013 : [0:4:19]
          ZPL_GID = 800000d : [8:0:13]
          ZPL_CRTIME = 10000003 : [16:0:3]
          ZPL_DXATTR = 30014 : [0:3:20]
          ZPL_DACL_COUNT = 8000010 : [8:0:16]
          ZPL_SYMLINK = 30011 : [0:3:17]
The names of these attributes come from the enum of known system
attributes in zfs_sa.h. The
important bit of their values is the '[16:0:1]' portion, which
is a decoded version of the raw (hexadecimal) number. The format of the raw number
is covered in sa_impl.h, but
the short version is that the first number is the total length of
the attribute's value in bytes, the middle number is an index of how
to byteswap it if necessary, and the third is its attribute number
within the filesystem
(and sa.c has a nice comment about the whole scheme at the top).
(The attributes with a listed size of 0 store their data in extra special ways that are beyond the scope of this entry.)
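Here is a sketch of decoding a raw registration value into the '[length:bswap:number]' triple. The bit positions are my reading of the ATTR_LENGTH, ATTR_BSWAP and ATTR_NUM macros in sa_impl.h: bits 24 to 39 hold the length, bits 16 to 23 the byteswap index, and bits 0 to 15 the attribute number. I checked them against the values in the registry dump above, but treat them as an assumption rather than a specification.

```python
def decode_sa_registration(raw):
    """Split a raw SA registration value into (length, bswap, attr_number).

    Bit positions follow my reading of sa_impl.h: bits 24-39 are the length
    in bytes, bits 16-23 the byteswap index, bits 0-15 the attribute number.
    """
    length = (raw >> 24) & 0xFFFF
    bswap = (raw >> 16) & 0xFF
    attr_num = raw & 0xFFFF
    return length, bswap, attr_num


# zdb prints the raw values in hex, so parse them with base 16.
for name, raw in [("ZPL_MTIME", "10000001"),
                  ("ZPL_ZNODE_ACL", "5803000f"),
                  ("ZPL_SYMLINK", "30011")]:
    length, bswap, num = decode_sa_registration(int(raw, 16))
    print(f"{name} = {raw} : [{length}:{bswap}:{num}]")
```

This should reproduce the same three lines as the registry dump, including the zero length for ZPL_SYMLINK.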
The more interesting thing is the SA attribute layouts:
  # zdb -dddd fs19-scratch-01/w/430 36
  [...]
      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
          36    1   128K    16K    16K     512    32K  100.00  SA attr layouts
  [...]
          2 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 ]
          4 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 17 ]
          3 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 ]
This particular filesystem has three attribute layouts that have been used by dnodes, and as you can see they are mostly the same. Layout 3 is the common subset, with all of the basic inode attributes you'd expect in a Unix filesystem; layout 2 adds attribute 21 (ZPL_PROJID), and layout 4 adds attribute 17 (ZPL_SYMLINK).
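Picking out what each layout adds over the common subset is a small set computation. Here is a sketch using the layouts and attribute names from the zdb output above. Only the names needed for this example are included.

```python
# Names for the attributes that differ between layouts (from the registry dump).
NAMES = {9: "ZPL_XATTR", 10: "ZPL_RDEV", 17: "ZPL_SYMLINK", 21: "ZPL_PROJID"}

LAYOUTS = {
    2: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19],
    4: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 17],
    3: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19],
}


def extras_over_common(layouts):
    """For each layout, the attributes it has beyond those every layout shares."""
    common = set.intersection(*(set(attrs) for attrs in layouts.values()))
    return {num: sorted(set(attrs) - common) for num, attrs in layouts.items()}


for num, extra in sorted(extras_over_common(LAYOUTS).items()):
    print(num, [NAMES.get(a, a) for a in extra])
```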
It's possible to have a lot more layouts than this. Here is the collection of layouts for my home desktop's home directory filesystem (it uses the same registered attribute numbers as the filesystem above, so you can look them up in that registry dump):
          4 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 9 ]
          3 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 17 ]
          7 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 9 ]
          2 = [ 5 6 4 12 13 7 11 0 1 2 3 8 16 19 ]
          5 = [ 5 6 4 12 13 7 11 0 1 2 3 8 10 16 19 ]
          6 = [ 5 6 4 12 13 7 11 0 1 2 3 8 21 16 19 ]
Incidentally, notice how these layout numbers aren't the same as the layout numbers on the first filesystem; layout 3 on the first filesystem is layout 2 on my home directory filesystem, layout 4 (symlinks) is layout 3, and layout 2 (project ID) is layout 6. The additional layouts in my home directory filesystem add xattrs (id 9) or 'rdev' (id 10) to some combination of the other attributes.
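Since layout numbers aren't comparable between filesystems, the way to match them up is by their attribute lists. Here is a sketch that does this for the two sets of layouts above. It assumes, as the text says, that both filesystems use the same registered attribute numbers. Without that, you would first have to translate the numbers through each filesystem's registry.

```python
W430 = {
    2: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19),
    4: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 17),
    3: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19),
}
HOME = {
    4: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 9),
    3: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 17),
    7: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19, 9),
    2: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19),
    5: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 10, 16, 19),
    6: (5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19),
}


def match_layouts(src, dst):
    """Map each src layout number to the dst layout with the identical
    attribute list, or None if dst has no such layout."""
    by_attrs = {attrs: num for num, attrs in dst.items()}
    return {num: by_attrs.get(attrs) for num, attrs in src.items()}


print(match_layouts(W430, HOME))   # {2: 6, 4: 3, 3: 2}
```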
One of the interesting aspects of this is that you can use the SA attribute layouts to tell if a ZFS filesystem definitely doesn't have some sort of files in it. For example, we know that there are no device special files or files with xattrs in /w/430, because it has no SA attribute layouts that include those attributes (ZPL_RDEV, ZPL_XATTR, or ZPL_DXATTR). This doesn't work for ACLs, though. Neither filesystem has a layout with ZPL_ZNODE_ACL (15), but ZPL_DACL_COUNT (16) and ZPL_DACL_ACES (19) are in every layout on both of them, so the layouts alone can't tell you whether ACLs have ever been set.
(Attribute layouts are never removed once created, so a filesystem with a layout with the 'rdev' attribute in it may still not have any device special files in it right now; they could all have been removed.)
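The "definitely doesn't have" check can be sketched as a union over all layouts. The attribute numbers are from the registry dump above. Per the caveat just given, a False result means definitely absent, while True only means such files exist now or existed at some point.

```python
# Attributes whose presence in any layout would allow each kind of file
# (numbers from the registry dump in this entry).
CHECKS = {
    "device special files": {10},   # ZPL_RDEV
    "directory xattrs": {9},        # ZPL_XATTR
    "SA xattrs": {20},              # ZPL_DXATTR
}

W430 = [
    [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19],
    [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 17],
    [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19],
]
HOME = W430 + [
    [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 9],
    [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19, 9],
    [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 10, 16, 19],
]


def might_exist(layouts):
    """False: definitely none. True: some exist now or existed at some point."""
    used = set().union(*layouts)
    return {kind: bool(attrs & used) for kind, attrs in CHECKS.items()}


print(might_exist(W430))
print(might_exist(HOME))
```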
Unfortunately, I can't see any obvious way to get zdb to tell you what the current attribute layout is for a specific dnode. At best you have to try to deduce it from what 'zdb -dddd' will print for the dnode's attributes.
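One way to do that deduction is to find which layouts have exactly the set of attributes you believe the dnode has. In this sketch, mapping the fields zdb prints for a dnode back to attribute numbers is left to you: `present` is assumed to already be a set of attribute numbers. Since zdb's per-dnode output doesn't reveal attribute order, two layouts with the same attributes in different orders would both come back as candidates.

```python
HOME_LAYOUTS = {
    4: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 9],
    3: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 17],
    7: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19, 9],
    2: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19],
    5: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 10, 16, 19],
    6: [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 21, 16, 19],
}


def candidate_layouts(present, layouts):
    """Layout numbers whose attribute set is exactly `present` (order ignored)."""
    present = set(present)
    return sorted(num for num, attrs in layouts.items() if set(attrs) == present)


BASE = {5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19}
print(candidate_layouts(BASE | {21, 9}, HOME_LAYOUTS))   # [7]
```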
(I've recently acquired a reason to dig into the details of ZFS system attributes.)
Sidebar: A brief digression on xattrs in ZFS
As covered in zfsprops(7)'s section on 'xattr=',
there are two storage schemes for xattrs in ZFS (well, in OpenZFS
on Linux and FreeBSD). At the attribute level, 'ZPL_XATTR' is
the older, more general 'store it in directories and files' approach,
while 'ZPL_DXATTR' is the 'store it as part of system attributes'
one ('xattr=sa'). When dumping a dnode in zdb, zdb will directly
print SA xattrs, but for directory xattrs it simply reports
'xattr = <object id>', where the object ID is for the xattr directory.
To see the names of the xattrs set on such a file, you need to also
dump the xattr directory object with zdb.
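Given a dnode's layout, you can tell which of the two schemes (if either) it uses, and therefore whether zdb will show the xattrs directly or you need a second zdb run. This is a small sketch using the attribute numbers from the registry dump earlier. I'm not sure whether a single dnode can end up with both attributes, so it simply reports whichever it finds.

```python
ZPL_XATTR = 9     # attribute numbers from this filesystem's registry dump
ZPL_DXATTR = 20


def xattr_storage(layout):
    """Describe where a dnode with this layout keeps its xattrs, if anywhere."""
    places = []
    if ZPL_XATTR in layout:
        places.append("xattr directory object: dump it with zdb to see names")
    if ZPL_DXATTR in layout:
        places.append("SA xattrs: zdb prints them with the dnode")
    return places


home_layout_4 = [5, 6, 4, 12, 13, 7, 11, 0, 1, 2, 3, 8, 16, 19, 9]
print(xattr_storage(home_layout_4))
```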
(Internally the SA xattrs are stored as a nvlist, because ZFS loves nvlists and nvpairs, more or less because Solaris did at the time.)