How ZFS maintains file type information in directories
As an aside in yesterday's history of file type information being available in Unix directories, I mentioned that it was possible for a filesystem to support this even though its Unix didn't. By supporting it, I mean that the filesystem maintains this information in its on disk format for directories, even though the rest of the kernel will never ask for it. This is what ZFS does.
(One reason to do this in a filesystem is future-proofing it against a day when your Unix might decide to support this in general; another is if you ever might want the filesystem to be a first class filesystem in another Unix that does support this stuff. In ZFS's case, I suspect that the first motivation was larger than the second one.)
The easiest way to see that ZFS does this is to use zdb to dump a
directory. I'm going to do this on an OmniOS machine, to make it
more convincing, and it turns out that this has some interesting
results. Since this is OmniOS, we don't have the convenience of
just naming a directory in zdb, so let's find the root directory
of a filesystem, starting from dnode 1 (as seen before).
# zdb -dddd fs3-corestaff-01/h/281 1
Dataset [....]
[...]
    microzap: 512 bytes, 4 entries
[...]
         ROOT = 3

# zdb -dddd fs3-corestaff-01/h/281 3
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         3    1    16K     1K     8K     1K  100.00  ZFS directory
[...]
    microzap: 1024 bytes, 8 entries

         RESTORED = 4396504 (type: Directory)
         ckstst = 12017 (type: not specified)
         ckstst3 = 25069 (type: Directory)
         .demo-file = 5832188 (type: Regular File)
         .peergroup = 12590 (type: not specified)
         cks = 5 (type: not specified)
         cksimap1 = 5247832 (type: Directory)
         .diskuse = 12016 (type: not specified)
         ckstst2 = 12535 (type: not specified)
This is actually an old filesystem (it dates from Solaris 10 and
has been transferred around with 'zfs send | zfs recv' since then),
but various home directories for real and test users have been
created in it over time (you can probably guess which one is the
oldest one). Sufficiently old directories and files have no file
type information, but more recent ones have this information,
including .demo-file, which I made just now so this would have
an entry that was a regular file with type information.
Once I dug into it, this turned out to be a change introduced (or
activated) in ZFS filesystem version 2, which is described in 'zfs
upgrade -v' as 'enhanced directory entries'. As an actual change
in (Open)Solaris, it dates from mid 2007, although I'm not sure
what Solaris release it made it into. The upshot is that if you
made your ZFS filesystem any time in the last decade, you'll have
this file type information in your directories.
How ZFS stores this file type information is interesting and clever,
especially when it comes to backwards compatibility. I'll start by
quoting the comment from zfs_znode.h:
/*
 * The directory entry has the type (currently unused on
 * Solaris) in the top 4 bits, and the object number in
 * the low 48 bits.  The "middle" 12 bits are unused.
 */
In yesterday's entry I said that Unix directory entries need to store at least the filename and the inode number of the file. What ZFS is doing here is reusing the 64 bit field used for the 'inode' (the ZFS dnode number) to also store the file type, because it knows that object numbers have only a limited range. This also makes old directory entries compatible, by making type 0 (all 4 bits 0) mean 'not specified'. Since old directory entries only stored the object number and the object number is 48 bits or less, the higher bits are guaranteed to be all zero.
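To make the bit layout concrete, here is a minimal Python sketch of packing and unpacking such a 64-bit directory entry value. This is not ZFS code; the function and constant names are made up for illustration. It also assumes that the type values stored are the usual DT_* numbers (4 for a directory and 8 for a regular file, as on Linux and FreeBSD), which the comment above doesn't actually tell us.

```python
# Illustrative sketch only; these names are hypothetical, not from the ZFS source.
TYPE_SHIFT = 60             # the type lives in the top 4 bits
TYPE_MASK = 0xF
OBJ_MASK = (1 << 48) - 1    # the object number lives in the low 48 bits

# Assumption: type values follow the usual DT_* constants.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8


def pack_dirent(obj, dtype=DT_UNKNOWN):
    """Combine an object number and a file type into one 64-bit value."""
    if not 0 <= obj <= OBJ_MASK:
        raise ValueError("object number does not fit in 48 bits")
    if not 0 <= dtype <= TYPE_MASK:
        raise ValueError("type does not fit in 4 bits")
    return (dtype << TYPE_SHIFT) | obj


def dirent_type(value):
    """Extract the file type from the top 4 bits."""
    return (value >> TYPE_SHIFT) & TYPE_MASK


def dirent_obj(value):
    """Extract the object number from the low 48 bits."""
    return value & OBJ_MASK
```

An old-style entry such as 'ckstst = 12017' is just the bare object number, so dirent_type(12017) comes out as 0, which is 'not specified'. No conversion of old directories is needed for them to be readable this way.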
(It seems common to define DT_UNKNOWN to be 0; both FreeBSD
and Linux do it.)
The reason this needed a new ZFS filesystem version is now clear. If you tried to read directory entries with file type information on a version of ZFS that didn't know about them, the old version would likely see crazy (and non-existent) object numbers and nothing would work. In order to even read a 'file type in directory entries' filesystem, you need to know to only look at the low 48 bits of the object number field in directory entries.
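As a rough illustration of the failure mode, the following Python sketch contrasts a hypothetical old reader, which treats the whole 64-bit field as the object number, with a reader that masks off everything above the low 48 bits. The function names are invented for this example, and the type value of 4 for a directory is an assumption based on the usual DT_DIR value.

```python
# Illustrative sketch only; these readers are hypothetical, not real ZFS code.
OBJ_MASK = (1 << 48) - 1    # the low 48 bits hold the object number


def old_reader_obj(value):
    """A pre-version-2 style reader: the whole field is the object number."""
    return value


def new_reader_obj(value):
    """A reader that knows about the type bits and ignores them."""
    return value & OBJ_MASK


# 'RESTORED = 4396504 (type: Directory)', assuming Directory is stored as 4.
typed_entry = (4 << 60) | 4396504

# An old entry with no type information, such as 'ckstst = 12017'.
untyped_entry = 12017

# The old reader sees an object number larger than 2**60 for the typed entry,
# while both readers agree on the untyped one.
bogus = old_reader_obj(typed_entry)
correct = new_reader_obj(typed_entry)
```

Here 'bogus' is far outside the 48-bit range that real object numbers live in, while 'correct' is the expected 4396504. Both readers return 12017 for the untyped entry, which is why new code can read old directories but not the other way around.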
(As before, I consider this a neat hack that cleverly uses some properties of ZFS and the filesystem to its advantage.)