Wandering Thoughts archives


A little bit of the one-time MacOS version still lingers in ZFS

Once upon a time, Apple came very close to releasing ZFS as part of MacOS. Apple did this work in its own copy of the ZFS source base (as far as I know), but people at Sun knew about it. It turns out that even today there is one small lingering sign of this hoped-for, and perhaps prepared-for, ZFS port in the ZFS source code. Well, sort of, because it's not quite in code.

Lurking in the function that reads ZFS directories and turns their (ZFS) directory entries into the filesystem-independent format that the kernel wants is the following comment:

 objnum = ZFS_DIRENT_OBJ(zap.za_first_integer);
 /*
  * MacOS X can extract the object type here such as:
  * uint8_t type = ZFS_DIRENT_TYPE(zap.za_first_integer);
  */

(Specifically, this is in zfs_readdir in zfs_vnops.c.)

ZFS maintains file type information in directories. This information can't be used on Solaris (and thus Illumos), where the overall kernel doesn't have this in its filesystem independent directory entry format, but it could have been on MacOS ('Darwin'), because MacOS is among the Unixes that support d_type. The comment itself dates all the way back to this 2007 commit, which includes the change 'reserve bits in directory entry for file type', which created the whole setup for this.

I don't know if this file type support was added specifically to help out Apple's MacOS X port of ZFS, but it's certainly possible, and in 2007 it seems likely that this port was at least on the minds of ZFS developers. It's interesting but understandable that FreeBSD didn't seem to have influenced them in the same way, at least as far as comments in the source code go; this file type support is equally useful for FreeBSD, and the FreeBSD ZFS port dates to 2007 too (per this announcement).

Regardless of the exact reason that ZFS picked up maintaining file type information in directory entries, it's quite useful for people on both FreeBSD and Linux that it does so. File type information is useful for any number of things and ZFS filesystems can (and do) provide this information on those Unixes, which helps make ZFS feel like a truly first class filesystem, one that supports all of the expected general system features.
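As an illustration of what programs on those Unixes do with this information, here is a minimal sketch in C that counts the subdirectories of a directory using d_type, falling back to fstatat() only when the filesystem reports DT_UNKNOWN. The function names (entry_is_dir, count_subdirs) are made up for this example, and nothing in it is ZFS-specific; it only assumes a Unix whose struct dirent has d_type, such as Linux or FreeBSD.

```c
#define _DEFAULT_SOURCE  /* glibc: expose d_type and DT_* under strict -std= modes */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return 1 if the entry is a directory, 0 if it is
 * not, and -1 on error. Uses d_type when the filesystem supplies it and
 * only falls back to fstatat() when the type is DT_UNKNOWN.
 */
static int entry_is_dir(DIR *dirp, const struct dirent *de)
{
    struct stat st;

    if (de->d_type != DT_UNKNOWN)
        return de->d_type == DT_DIR;
    if (fstatat(dirfd(dirp), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/*
 * Hypothetical helper: count the subdirectories of path, skipping "."
 * and "..". Returns -1 if the directory cannot be opened.
 */
static int count_subdirs(const char *path)
{
    DIR *dirp;
    struct dirent *de;
    int count = 0;

    assert(path != NULL);
    dirp = opendir(path);
    if (dirp == NULL)
        return -1;
    while ((de = readdir(dirp)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (entry_is_dir(dirp, de) == 1)
            count++;
    }
    closedir(dirp);
    return count;
}
```

On a filesystem that fills in d_type, the fstatat() branch is never taken, so a walk like this avoids one stat call per entry. For directory entries old enough to have no stored type, the fallback would presumably be what runs instead.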

solaris/ZFSDTypeAndMacOS written at 21:24:29

How ZFS maintains file type information in directories

As an aside in yesterday's history of file type information being available in Unix directories, I mentioned that it was possible for a filesystem to support this even though its Unix didn't. By supporting it, I mean that the filesystem maintains this information in its on disk format for directories, even though the rest of the kernel will never ask for it. This is what ZFS does.

(One reason to do this in a filesystem is future-proofing it against a day when your Unix might decide to support this in general; another is if you ever might want the filesystem to be a first class filesystem in another Unix that does support this stuff. In ZFS's case, I suspect that the first motivation was larger than the second one.)

The easiest way to see that ZFS does this is to use zdb to dump a directory. I'm going to do this on an OmniOS machine, to make it more convincing, and it turns out that this has some interesting results. Since this is OmniOS, we don't have the convenience of just naming a directory in zdb, so let's find the root directory of a filesystem, starting from dnode 1 (as seen before).

# zdb -dddd fs3-corestaff-01/h/281 1
Dataset [....]
    microzap: 512 bytes, 4 entries
         ROOT = 3 

# zdb -dddd fs3-corestaff-01/h/281 3
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        3    1    16K     1K     8K     1K  100.00  ZFS directory
    microzap: 1024 bytes, 8 entries

         RESTORED = 4396504 (type: Directory)
         ckstst = 12017 (type: not specified)
         ckstst3 = 25069 (type: Directory)
         .demo-file = 5832188 (type: Regular File)
         .peergroup = 12590 (type: not specified)
         cks = 5 (type: not specified)
         cksimap1 = 5247832 (type: Directory)
         .diskuse = 12016 (type: not specified)
         ckstst2 = 12535 (type: not specified)

This is actually an old filesystem (it dates from Solaris 10 and has been transferred around with 'zfs send | zfs recv' since then), but various home directories for real and test users have been created in it over time (you can probably guess which one is the oldest one). Sufficiently old directories and files have no file type information, but more recent ones have this information, including .demo-file, which I made just now so this would have an entry that was a regular file with type information.

Once I dug into it, this turned out to be a change introduced (or activated) in ZFS filesystem version 2, which is described in 'zfs upgrade -v' as 'enhanced directory entries'. As an actual change in (Open)Solaris, it dates from mid 2007, although I'm not sure what Solaris release it made it into. The upshot is that if you made your ZFS filesystem any time in the last decade, you'll have this file type information in your directories.

How ZFS stores this file type information is interesting and clever, especially when it comes to backwards compatibility. I'll start by quoting the comment from zfs_znode.h:

 * The directory entry has the type (currently unused on
 * Solaris) in the top 4 bits, and the object number in
 * the low 48 bits.  The "middle" 12 bits are unused.

In yesterday's entry I said that Unix directory entries need to store at least the filename and the inode number of the file. What ZFS is doing here is reusing the 64 bit field used for the 'inode' (the ZFS dnode number) to also store the file type, because it knows that object numbers have only a limited range. This also makes old directory entries compatible, by making type 0 (all 4 bits 0) mean 'not specified'. Since old directory entries only stored the object number and the object number is 48 bits or less, the higher bits are guaranteed to be all zero.

(It seems common to define DT_UNKNOWN to be 0; both FreeBSD and Linux do it.)
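As a rough sketch of this layout, the packing and unpacking can be written in a few lines of C. The ZFS source has its own macros for this (ZFS_DIRENT_TYPE and ZFS_DIRENT_OBJ); the de_* names and constants below are hypothetical stand-ins written from the quoted comment, not the real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the layout follows the quoted zfs_znode.h comment. */
#define DE_TYPE_SHIFT 60                         /* type is in the top 4 bits */
#define DE_OBJ_MASK   ((UINT64_C(1) << 48) - 1)  /* object number: low 48 bits */

/* Pack a 4-bit file type and a 48-bit object number into one 64-bit value. */
static uint64_t de_pack(uint8_t type, uint64_t objnum)
{
    assert(type <= 0xF);
    assert(objnum <= DE_OBJ_MASK);
    return ((uint64_t)type << DE_TYPE_SHIFT) | objnum;
}

/* Extract the file type; 0 means 'not specified'. */
static uint8_t de_type(uint64_t de)
{
    return (uint8_t)(de >> DE_TYPE_SHIFT);
}

/* Extract the object number, ignoring the type and the unused middle bits. */
static uint64_t de_objnum(uint64_t de)
{
    return de & DE_OBJ_MASK;
}
```

With this layout, an old entry such as 'ckstst = 12017' decodes to type 0 and object number 12017 without any conversion, while a newer entry carries a nonzero type in the top bits alongside the same kind of object number.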

The reason this needed a new ZFS filesystem version is now clear. If you tried to read directory entries with file type information on a version of ZFS that didn't know about them, the old version would likely see crazy (and non-existent) object numbers and nothing would work. In order to even read a 'file type in directory entries' filesystem, you need to know to only look at the low 48 bits of the object number field in directory entries.
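To make the failure mode concrete, here is a small sketch in C contrasting a reader that treats the whole 64-bit field as the object number with one that masks it to the low 48 bits. Both function names are hypothetical, and the example in the note below assumes the stored type is a d_type-style value, with 8 (DT_REG on Linux and FreeBSD) standing for a regular file.

```c
#include <stdint.h>

/*
 * Hypothetical reader that predates file types in directory entries:
 * it takes the entire 64-bit field as the object number.
 */
static uint64_t old_reader_objnum(uint64_t de)
{
    return de;
}

/*
 * Hypothetical reader that knows about file types: only the low 48 bits
 * of the field are the object number.
 */
static uint64_t new_reader_objnum(uint64_t de)
{
    return de & ((UINT64_C(1) << 48) - 1);
}
```

For an entry like '.demo-file = 5832188' stored with an assumed type of 8 in the top 4 bits, the masking reader gets 5832188 back, while the unmasked reader gets 0x800000000058FDFC, an object number that does not exist. For an old entry with no type bits set, both readers agree.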

(As before, I consider this a neat hack that cleverly uses some properties of ZFS and the filesystem to its advantage.)

solaris/ZFSAndDirectoryDType written at 00:43:13
