Wandering Thoughts archives


Some things on how ZFS dnode object IDs are allocated (which is not sequentially)

One of the core elements of ZFS is the dnode, which defines a DMU object. Within a single filesystem or other object set, dnodes have an object number (aka object ID). For dnodes that are files or directories in a filesystem, this is visible as their Unix inode number, but other internal things get dnodes and thus object numbers (for example, the dnode of the filesystem's delete queue). Object IDs are 64-bit numbers, and many of them can be relatively small (especially if they are object IDs for internal structures, again such as the delete queue). Very large dnode numbers are uncommon, and some files and directories from early in a filesystem's life can have very small object IDs.

(For instance, the object ID of my home directory on our ZFS fileservers is '5'. I'm the only user in this filesystem.)

You might reasonably wonder how ZFS object IDs are allocated. Inspection of a ZFS filesystem will show that they are clearly not allocated sequentially, but they're also not allocated randomly. Based on an inspection of the dnode allocation source code in dmu_object.c, there seem to be two things going on to spread dnode object ids around some (but not too much).

The first thing is that dnode allocation is done from per-CPU chunks of the dnode space. The size of each chunk is set by dmu_object_alloc_chunk_shift, which by default creates 128-dnode chunks. The motivation for this is straightforward; if all of the CPUs in the system were allocating dnodes from the same area, they would all have to contend over locks on this area. Spreading out into separate chunks reduces locking contention, which means that parallel or highly parallel workloads that frequently create files on a single filesystem don't bottleneck on a shared lock.
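To make the per-CPU chunk idea concrete, here is a minimal toy model of it. This is a sketch, not the real logic in dmu_object.c: the class and method names are made up for illustration, it assumes a shift of 7 (which is what gives the 128-dnode chunks mentioned above), and it ignores reserved object numbers, locking, and dnode sizes.

```python
class ChunkedAllocator:
    """Toy model: each CPU hands out IDs from its own private chunk."""

    def __init__(self, chunk_shift=7):
        # Assumption: a shift of 7 means 1 << 7 == 128 IDs per chunk.
        self.chunk_size = 1 << chunk_shift
        # Shared state; in a real allocator this is what needs protecting.
        self.next_chunk_start = 0
        # Hypothetical per-CPU cursors: cpu -> (next ID, end of chunk).
        self.cursors = {}

    def alloc(self, cpu):
        next_id, end = self.cursors.get(cpu, (0, 0))
        if next_id >= end:
            # This CPU's chunk is used up (or it never had one), so it
            # takes a fresh chunk from the shared area. This is the only
            # step where CPUs would have to coordinate with each other.
            next_id = self.next_chunk_start
            end = next_id + self.chunk_size
            self.next_chunk_start = end
        self.cursors[cpu] = (next_id + 1, end)
        return next_id
```

In this model, two CPUs that create files at the same time get IDs that are 128 apart instead of adjacent, and each only touches the shared state once per 128 allocations.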

(One reason that you might create files a lot in a parallel workload is if you're using files on the filesystem as part of a locking strategy. This is still common in things like mail servers, mail clients, and IMAP servers.)

The second thing is, well, I'm going to quote the comment in the source code to start with:

Each time we polish off a L1 bp worth of dnodes (2^12 objects), move to another L1 bp that's still reasonably sparse (at most 1/4 full). Look from the beginning at most once per txg. If we still can't allocate from that L1 block, search for an empty L0 block, which will quickly skip to the end of the metadnode if no nearby L0 blocks are empty. This fallback avoids a pathology where full dnode blocks containing large dnodes appear sparse because they have a low blk_fill, leading to many failed allocation attempts. [...]

(In reading the code a bit, I think this comment means 'L2 block' instead of 'L0 block'.)

To understand a bit more about this, we need to know about two things. First, we need to know that dnodes themselves are stored in another DMU object, and this DMU object stores data in the same way as all others do, using various levels of indirect blocks. Then we need to know about indirect blocks themselves. L0 blocks directly hold data (in this case the actual dnodes), while L1 blocks hold pointers to L0 blocks and L2 blocks hold pointers to L1 blocks.

(You can see examples of this structure for regular files in the zdb output in this entry and this entry. If I'm doing the math right, for dnodes a L0 block normally holds 32 dnodes and a L<N> block can address up to 128 L<N-1> blocks, through block pointers.)
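The arithmetic here is simple enough to write down. This is a small sketch that takes the two numbers above (32 dnodes per L0 block and 128 block pointers per indirect block) as given rather than deriving them from block sizes; the function names are my own.

```python
DNODES_PER_L0 = 32           # from the text: dnodes held by one L0 block
POINTERS_PER_INDIRECT = 128  # from the text: pointers in one L<N> block


def dnodes_covered(level):
    """How many dnodes a single block at the given level can address."""
    return DNODES_PER_L0 * POINTERS_PER_INDIRECT ** level


def locate(object_id):
    """Split an object ID into (L1 index, L0 index within L1, slot in L0)."""
    l0_index, slot = divmod(object_id, DNODES_PER_L0)
    l1_index, l0_in_l1 = divmod(l0_index, POINTERS_PER_INDIRECT)
    return l1_index, l0_in_l1, slot


print(dnodes_covered(1))  # 4096, matching the 2^12 in the source comment
print(locate(5))          # (0, 0, 5)
```

Under these assumptions a single L1 block covers 32 * 128 = 4096 dnodes, which is where the 2^12 in the quoted comment comes from.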

So, what appears to happen is that at first, the per-CPU allocator gets its chunks sequentially (for different CPUs, or the same CPU) from the same L1 indirect block, which covers 4096 dnodes. When we exhaust all of the 128-dnode chunks in a single group of 4096, we don't move to the sequentially next group of 4096; instead we search around for a sufficiently empty group, and switch to it (where a 'sufficiently empty' group is one with at most 1024 dnodes already allocated). If there is no such group, I think that we may wind up skipping to the end of the currently allocated dnodes and getting a completely fresh empty block of 4096.
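My reading of this can be sketched as a small function. This is only a model of the behaviour as I understand it from the comment, not the real code: the function name and the list-of-counts representation are invented, and it simplifies the 'look from the beginning at most once per txg' rule into a flag.

```python
GROUP_SIZE = 4096               # dnodes covered by one L1 block
SPARSE_LIMIT = GROUP_SIZE // 4  # 'at most 1/4 full' means at most 1024


def pick_next_group(allocated_per_group, current, restart_from_beginning=False):
    """Return the index of the 4096-dnode group to allocate from next.

    allocated_per_group[i] is how many dnodes are in use in group i.
    A result equal to len(allocated_per_group) means that no existing
    group was sparse enough, so we skip to a fresh group past the end.
    """
    start = 0 if restart_from_beginning else current + 1
    for idx in range(start, len(allocated_per_group)):
        if idx != current and allocated_per_group[idx] <= SPARSE_LIMIT:
            return idx
    return len(allocated_per_group)


groups = [3000, 4096, 1500, 900, 4096]
print(pick_next_group(groups, current=1))  # 3
print(pick_next_group(groups, current=3))  # 5, a fresh group
```

Notice that in this example group 0 has over a thousand free dnodes and group 2 has over two thousand, but neither is eligible because both are more than a quarter full.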

If I'm right, the net effect of this is to smear out dnode allocations and especially reallocations over an increasingly large portion of the lower dnode object number space. As your filesystem gets used and files get deleted, many of the lower 4096-dnode groups will have some or even many free dnodes, but not the 3072 that they need to be eligible to be selected for further assignment. This can eventually push dnode allocations to relatively high object numbers even though you may not have anywhere near that many dnodes in use on the filesystem. This is not guaranteed, though, and you may still reuse dnode numbers.

(For example, I just created a new file in my home directory. My home directory's filesystem has 1983310 dnodes used right now, but the inode number (and thus dnode object number) that my new test file got was 1804696.)
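If you want to look at this on your own filesystems, the object ID of a file or directory is just its inode number, which any stat() interface will give you. A minimal sketch in Python (the function name is my own):

```python
import os
import sys


def object_id(path):
    """Return the inode number of path, without following symlinks.

    On a ZFS filesystem this is the dnode object number discussed above.
    """
    return os.lstat(path).st_ino


if __name__ == "__main__":
    # Print the object ID of every path given on the command line.
    for path in sys.argv[1:]:
        print(object_id(path), path)
```

Running this over a freshly created file and over some old files in the same filesystem is a quick way to see how far apart their object IDs are.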

solaris/ZFSDnodeIdsAllocation written at 23:25:17

Some things about where icons for modern X applications come from

If you have a traditional window manager like fvwm, one of the things it can do is iconify X windows so that they turn into icons on the root window (which would often be called the 'desktop'). Even modern desktop environments that don't iconify programs to the root window (or their desktop) may have per-program icons for running programs in their dock or taskbar. If your window manager or desktop environment can do this, you might reasonably wonder where those icons come from by default.

Although I don't know how it was done in the early days of X, the modern standard for this is part of the Extended Window Manager Hints. In EWMH, applications give the window manager a number of possible icons, generally in different sizes, as ARGB bitmaps (instead of, say, SVG format). The window manager or desktop environment can then pick whichever icon size it likes best, taking into account things like the display resolution and so on, and display it however it wants to (in its original size or scaled up or down).

Specifically, this is communicated through the only good interprocess communication method that X supplies, namely X properties. In the specific case of icons, the _NET_WM_ICON property is what is used, and xprop can display the size information and an ASCII art summary of what each icon looks like. It's also possible to use some additional magic to read out the raw data from _NET_WM_ICON in a useful format; see, for example, this Stack Overflow question and its answers.
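The raw format of _NET_WM_ICON is simple. In EWMH it is an array of 32-bit values where each icon is a width, a height, and then width times height ARGB pixels, with the icons packed one after another. Here is a minimal sketch of splitting that up, assuming you have already got the property's values as a list of Python integers by some means (how you fetch them is not shown here, and the function names are my own):

```python
def split_net_wm_icon(cardinals):
    """Split raw _NET_WM_ICON values into (width, height, pixels) tuples."""
    icons = []
    pos = 0
    while pos + 2 <= len(cardinals):
        width, height = cardinals[pos], cardinals[pos + 1]
        count = width * height
        pixels = cardinals[pos + 2:pos + 2 + count]
        if len(pixels) != count:
            raise ValueError("truncated icon data")
        icons.append((width, height, pixels))
        pos += 2 + count
    return icons


def argb(pixel):
    """Unpack one 32-bit pixel into (alpha, red, green, blue)."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF, pixel & 0xFF)


# A made-up property value with a 2x1 icon followed by a 1x1 icon.
data = [2, 1, 0xFF000000, 0x80FF0000, 1, 1, 0xFFFFFFFF]
for width, height, pixels in split_net_wm_icon(data):
    print(width, height, len(pixels))
```

Once the icons are split apart like this, listing the sizes an application offers is just a matter of looking at each width and height.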

(One reason to extract all of the different icon sizes for a program is if you want to force your window manager to use a different size of icon than it defaults to. Another is if you want to reuse the icon for another program, again often through window manager settings.)

X programs themselves have to get the data that they put into _NET_WM_ICON from somewhere. Some programs may have explicit PNGs (or whatever) on the filesystem that they read when they start (and thus that you can too), but others often build this into their program binary or compiled data files, which means that you have to go to the source code to pull the files out (and they may not be in a bitmap format like PNG; there are probably programs that start with a SVG and then render it to various sized PNGs).

(As a concrete example, as far as I know Firefox's official icons are in the 'defaultNN.png' files in browser/branding/official. Actual builds may not use all of the sizes available, or at least not put them into _NET_WM_ICON; on Fedora 29, for example, the official Fedora Firefox 66 only offers up to 32x32, which is tragically small on my HiDPI display.)

None of this is necessarily how a modern integrated desktop like Gnome or KDE handles icons for their own programs. There are probably toolkit-specific protocols involved, and I suspect that there is more support and encouragement for SVG icons than there is in EWMH (where there is none).

PS: All of this is going to change drastically in Wayland, since we obviously won't have X properties any more.

(This whole exploration was prompted by a recent question on the FVWM mailing list.)

unix/ModernXAppIcons written at 00:50:28


This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.