2018-08-26
A little bit of the one-time MacOS version still lingers in ZFS
Once upon a time, Apple came very close to releasing ZFS as part of MacOS. Apple did this work in its own copy of the ZFS source base (as far as I know), but the people at Sun knew about it, and it turns out that even today there is one little lingering sign of this hoped-for and perhaps prepared-for ZFS port in the ZFS source code. Well, sort of, because it's not quite in code.
Lurking in the function that reads ZFS directories to turn (ZFS) directory entries into the filesystem independent format that the kernel wants is the following comment:
    objnum = ZFS_DIRENT_OBJ(zap.za_first_integer);
    /*
     * MacOS X can extract the object type here such as:
     * uint8_t type = ZFS_DIRENT_TYPE(zap.za_first_integer);
     */
(Specifically, this is in zfs_readdir in zfs_vnops.c.)
ZFS maintains file type information in directories. This information can't be used on Solaris (and thus Illumos), where the overall kernel doesn't have this in its filesystem independent directory entry format, but it could have been used on MacOS ('Darwin'), because MacOS is among the Unixes that support d_type. The comment itself dates all the way back to this 2007 commit, which includes the change 'reserve bits in directory entry for file type' that created the whole setup for this.
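On a Unix that supports d_type, a program walking a directory can often learn each entry's type from readdir() alone and skip a stat() call per entry. Here is a minimal C sketch of that pattern; it assumes a Linux, FreeBSD, or MacOS system whose <dirent.h> provides d_type and the DT_* constants, and the helper names entry_is_dir() and count_subdirs() are made up for illustration. Since any filesystem is allowed to answer DT_UNKNOWN, a correct program still needs an lstat() fallback.

```c
#define _DEFAULT_SOURCE   /* glibc: expose d_type, DT_* and lstat() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Is entry 'name' in directory 'dir' a directory? Trust d_type when the
 * filesystem filled it in; fall back to lstat() when it says DT_UNKNOWN.
 */
static int entry_is_dir(const char *dir, const char *name, unsigned char d_type)
{
    assert(dir != NULL && name != NULL);
    if (d_type != DT_UNKNOWN)
        return d_type == DT_DIR;          /* no extra system call needed */

    char path[4096];
    struct stat st;
    int len = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (len < 0 || (size_t)len >= sizeof path)
        return 0;                         /* path didn't fit; give up */
    if (lstat(path, &st) != 0)
        return 0;                         /* entry vanished or unreadable */
    return S_ISDIR(st.st_mode);
}

/* Count subdirectories of 'dir', ignoring "." and "..". Returns -1 on error. */
static int count_subdirs(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count += entry_is_dir(dir, de->d_name, de->d_type);
    }
    closedir(d);
    return count;
}
```

On a filesystem that fills in d_type (as ZFS does on FreeBSD and Linux for entries that have type information), count_subdirs() makes no lstat() calls at all; on one that doesn't, it quietly pays for one lstat() per entry.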
I don't know if this file type support was added specifically to help out Apple's MacOS X port of ZFS, but it's certainly possible, and in 2007 it seems likely that this port was at least on the minds of ZFS developers. It's interesting but understandable that FreeBSD didn't seem to have influenced them in the same way, at least as far as comments in the source code go; this file type support is equally useful for FreeBSD, and the FreeBSD ZFS port dates to 2007 too (per this announcement).
Regardless of the exact reason that ZFS picked up maintaining file type information in directory entries, it's quite useful for people on both FreeBSD and Linux that it does so. File type information is useful for any number of things and ZFS filesystems can (and do) provide this information on those Unixes, which helps make ZFS feel like a truly first class filesystem, one that supports all of the expected general system features.
How ZFS maintains file type information in directories
As an aside in yesterday's history of file type information being available in Unix directories, I mentioned that it was possible for a filesystem to support this even though its Unix didn't. By supporting it, I mean that the filesystem maintains this information in its on disk format for directories, even though the rest of the kernel will never ask for it. This is what ZFS does.
(One reason to do this in a filesystem is future-proofing it against a day when your Unix might decide to support this in general; another is if you ever might want the filesystem to be a first class filesystem in another Unix that does support this stuff. In ZFS's case, I suspect that the first motivation was larger than the second one.)
The easiest way to see that ZFS does this is to use zdb to dump a directory. I'm going to do this on an OmniOS machine, to make it more convincing, and it turns out that this has some interesting results. Since this is OmniOS, we don't have the convenience of just naming a directory in zdb, so let's find the root directory of a filesystem, starting from dnode 1 (as seen before).
    # zdb -dddd fs3-corestaff-01/h/281 1
    Dataset [....]
    [...]
        microzap: 512 bytes, 4 entries
    [...]
            ROOT = 3
    # zdb -dddd fs3-corestaff-01/h/281 3
        Object  lvl   iblk   dblk  dsize  lsize   %full  type
             3    1    16K     1K     8K     1K  100.00  ZFS directory
    [...]
        microzap: 1024 bytes, 8 entries
            RESTORED = 4396504 (type: Directory)
            ckstst = 12017 (type: not specified)
            ckstst3 = 25069 (type: Directory)
            .demo-file = 5832188 (type: Regular File)
            .peergroup = 12590 (type: not specified)
            cks = 5 (type: not specified)
            cksimap1 = 5247832 (type: Directory)
            .diskuse = 12016 (type: not specified)
            ckstst2 = 12535 (type: not specified)
This is actually an old filesystem (it dates from Solaris 10 and has been transferred around with 'zfs send | zfs recv' since then), but various home directories for real and test users have been created in it over time (you can probably guess which one is the oldest one). Sufficiently old directories and files have no file type information, but more recent ones have this information, including .demo-file, which I made just now so this would have an entry that was a regular file with type information.
Once I dug into it, this turned out to be a change introduced (or activated) in ZFS filesystem version 2, which is described in 'zfs upgrade -v' as 'enhanced directory entries'. As an actual change in (Open)Solaris, it dates from mid-2007, although I'm not sure what Solaris release it made it into. The upshot is that if you made your ZFS filesystem any time in the last decade, you'll have this file type information in your directories.
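To make the version gating concrete, here is a minimal C sketch of how a filesystem could build the 64-bit value it stores for each directory entry. The EX_ names and ex_make_dirent() are hypothetical, not ZFS's own identifiers; the bit positions (type in the top 4 bits, object number in the low 48) follow ZFS's zfs_znode.h comment. I'm assuming the 4-bit type comes from the file's mode via the traditional IFTODT() mapping, (mode & S_IFMT) >> 12, which makes it line up with the DT_* values.

```c
#define _DEFAULT_SOURCE   /* glibc: expose S_IFMT and friends */
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical names; bit positions follow the zfs_znode.h comment. */
#define EX_DIRENT_OBJ_BITS        48
#define EX_DIRENT_TYPE_SHIFT      60
#define EX_FS_VERSION_DIRENT_TYPE 2    /* 'enhanced directory entries' */

/* Assumed mapping: the 4-bit S_IFMT file type, as IFTODT() computes it. */
static uint64_t ex_mode_to_type(mode_t mode)
{
    return ((uint64_t)(mode & S_IFMT) >> 12) & 0xF;
}

/*
 * Build the 64-bit value stored in a directory entry for object 'obj'.
 * Before version 2 this is just the object number; from version 2 on,
 * the file type also goes into the top 4 bits.
 */
static uint64_t ex_make_dirent(uint64_t obj, mode_t mode, int fs_version)
{
    /* Object numbers must fit in the low 48 bits of the field. */
    assert(obj < (UINT64_C(1) << EX_DIRENT_OBJ_BITS));
    uint64_t de = obj;
    if (fs_version >= EX_FS_VERSION_DIRENT_TYPE)
        de |= ex_mode_to_type(mode) << EX_DIRENT_TYPE_SHIFT;
    return de;
}
```

With this, ex_make_dirent(5832188, S_IFREG | 0644, 1) stores a bare 5832188, while the same call with version 2 also puts 8 (regular file) in the top four bits. Nothing in this scheme rewrites existing entries, which fits the mix of typed and 'not specified' entries in the zdb output above.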
How ZFS stores this file type information is interesting and clever, especially when it comes to backwards compatibility. I'll start by quoting the comment from zfs_znode.h:
    /*
     * The directory entry has the type (currently unused on
     * Solaris) in the top 4 bits, and the object number in
     * the low 48 bits.  The "middle" 12 bits are unused.
     */
In yesterday's entry I said that Unix directory entries need to store at least the filename and the inode number of the file. What ZFS is doing here is reusing the 64 bit field used for the 'inode' (the ZFS dnode number) to also store the file type, because it knows that object numbers have only a limited range. This also makes old directory entries compatible, by making type 0 (all 4 bits 0) mean 'not specified'. Since old directory entries only stored the object number and the object number is 48 bits or less, the higher bits are guaranteed to be all zero.
(It seems common to define DT_UNKNOWN to be 0; both FreeBSD and Linux do it.)
The reason this needed a new ZFS filesystem version is now clear. If you tried to read directory entries with file type information on a version of ZFS that didn't know about them, the old version would likely see crazy (and non-existent) object numbers and nothing would work. In order to even read a 'file type in directory entries' filesystem, you need to know to only look at the low 48 bits of the object number field in directory entries.
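Putting the layout and the compatibility argument together, here is a minimal C sketch of the two kinds of reader. All EX_ and ex_ names are hypothetical; the masks follow the zfs_znode.h comment, and the labels only cover the type codes that show up in the zdb output above, assuming directories are 4 and regular files are 8 under the usual (mode & S_IFMT) >> 12 mapping.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Field layout from the zfs_znode.h comment: type in the top 4 bits
 * (60-63), object number in the low 48 bits (0-47), bits 48-59 unused.
 * All EX_ names here are hypothetical, not the real ZFS identifiers.
 */
#define EX_OBJ_MASK     ((UINT64_C(1) << 48) - 1)
#define EX_UNUSED_MASK  (((UINT64_C(1) << 12) - 1) << 48)
#define EX_TYPE_SHIFT   60

/* A version-2-aware reader: only the low 48 bits name the object. */
static uint64_t ex_dirent_obj(uint64_t de)
{
    /* Sketch-level check; real code would report corruption instead. */
    assert((de & EX_UNUSED_MASK) == 0);
    return de & EX_OBJ_MASK;
}

/* The 4-bit type; every old, bare-object-number entry decodes to 0. */
static unsigned ex_dirent_type(uint64_t de)
{
    return (unsigned)(de >> EX_TYPE_SHIFT);
}

/* A reader that predates type bits takes the whole field as the object number. */
static uint64_t ex_old_reader_obj(uint64_t de)
{
    return de;
}

/* Labels in the style of the zdb output; other codes are lumped together. */
static const char *ex_type_label(unsigned type)
{
    switch (type) {
    case 0:  return "not specified";
    case 4:  return "Directory";
    case 8:  return "Regular File";
    default: return "other";
    }
}
```

If .demo-file's entry is stored as (8 << 60) | 5832188, ex_dirent_obj() returns 5832188 and ex_dirent_type() returns 8, while an old entry such as ckstst's plain 12017 decodes to type 0, 'not specified'. A reader that predates the type bits would instead take 9223372036860607996 as the object number, far beyond anything that fits in 48 bits.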
(As before, I consider this a neat hack that cleverly uses some properties of ZFS and the filesystem to its advantage.)