2018-02-04
A surprise in how ZFS grows a file's record size (at least for me)
As I wound up experimentally verifying, in ZFS all files are stored as a single block of varying size up to the filesystem's recordsize, or using multiple recordsize blocks. If a file has more than one block, all blocks are recordsize, no more and no less. If a file is a single block, the size of this block is based on how much data has been written to the file (or technically the maximum offset that's been written to the file). However, how the block size grows as you write data to the file turns out to be somewhat surprising (which makes me very glad that I actually did some experiments to verify what I thought I knew before I wrote this entry, because I was very wrong).
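To make that rule concrete, here is a small Python sketch of it. This is purely my own model of the behaviour described above (with a hypothetical file_blocks() helper and the default 128 Kb recordsize), not anything taken from the ZFS code:

# Sketch of the block layout rule described above; not actual ZFS code.
# A file is either one block of up to recordsize bytes, or several
# blocks that are all exactly recordsize bytes.

def file_blocks(file_size, recordsize=128 * 1024):
    if file_size <= recordsize:
        # A single block; how its size is set is covered below.
        return (1, file_size)
    # More than one block: every block is recordsize, no more and no
    # less, so the file occupies a whole number of recordsize blocks.
    nblocks = -(-file_size // recordsize)    # ceiling division
    return (nblocks, recordsize)

print(file_blocks(16896))         # (1, 16896): one sub-recordsize block
print(file_blocks(300 * 1024))    # (3, 131072): three full 128 Kb blocks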
Rather than involving the ashift or growing in powers of two, ZFS always grows the (logical) block size in 512-byte chunks until it reaches the filesystem recordsize. The actual physical space allocated on disk is in ashift-sized units, as you'd expect, but this is not directly related to the (logical) block size used at the file level. For example, here is a 16896 byte file (of incompressible data) on an ashift=12 pool:
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   4780566    1   128K  16.5K    20K     512  16.5K  100.00  ZFS plain file
[...]
         0  L0 DVA[0]=<0:444bbc000:5000> [L0 ZFS plain file] [...] size=4200L/4200P [...]
The DVA records an 0x5000 byte allocation (20 Kb), but the logical and physical sizes are both only 0x4200 bytes (16.5 Kb).
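A bit of arithmetic reproduces those numbers. The following Python sketch is again just my model of the behaviour described in this entry (assuming incompressible data and ignoring any mirror or raidz overhead), not real ZFS code:

def roundup(n, unit):
    # Round n up to the next multiple of unit.
    return -(-n // unit) * unit

def logical_block_size(file_size, recordsize=128 * 1024):
    # The logical block size grows in 512-byte chunks up to recordsize.
    return min(roundup(file_size, 512), recordsize)

def allocated_size(logical_size, ashift):
    # Physical space is allocated in 2**ashift byte units on each vdev;
    # with incompressible data the physical size equals the logical size.
    return roundup(logical_size, 2 ** ashift)

lsize = logical_block_size(16896)
print(hex(lsize))                       # 0x4200, ie 16.5 Kb logical
print(hex(allocated_size(lsize, 12)))   # 0x5000, ie 20 Kb on an ashift=12 vdev
print(hex(allocated_size(lsize, 9)))    # 0x4200, ie 16.5 Kb on an ashift=9 vdev

The last line is the same block as it would be allocated on an ashift=9 vdev, which is relevant to the next point.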
In thinking about it, this makes a certain amount of sense because the ashift is really a vdev property, not a pool property, and can vary from vdev to vdev within a single pool. As a result, the actual allocated size of a given block may vary from vdev to vdev (and a block may be written to multiple vdevs if you have copies set to more than 1 or it's metadata). The file's current block size thus can't be based on the ashift, because ZFS doesn't necessarily have a single ashift to base it on; instead ZFS bases it on 512-byte sectors, even if this has to be materialized differently on different vdevs.
Looking back, I've already sort of seen this with ZFS compression. As you'd expect, a file's (logical) block size is based on its uncompressed size, or more exactly on the highest byte offset in the file. You can write something to disk that compresses extremely well, and it will still have a large logical block size. Here's an extreme case:
; dd if=/dev/zero of=testfile bs=128k count=1
[...]
# zdb -vv -bbbb -O ssddata/homes cks/tmp/testfile
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    956361    1   128K   128K      0     512   128K    0.00  ZFS plain file
[...]
This turns out to have no data blocks allocated at all, because the 128 Kb of zeros can be recorded entirely in magic flags in the dnode. But it still has a 128 Kb logical block size. 128 Kb of the character 'a' does wind up requiring a DVA allocation, but the size difference is drastic:
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
    956029    1   128K   128K     1K     512   128K  100.00  ZFS plain file
[...]
         0  L0 DVA[0]=<0:3bbd1c00:400> [L0 ZFS plain file] [...] size=20000L/400P [...]
We have a compressed size of 1 Kb (and a 1 Kb allocation on disk,
as this is an ashift=9
vdev), but once again the file block size
is 128 Kb.
(If we wrote 127.5 Kb of 'a' instead, we'd wind up with a file block size of 127.5 Kb. I'll let interested parties do that experiment themselves.)
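Put in the same terms as before: the logical block size comes only from the highest byte offset written (rounded up to 512 bytes and capped at the recordsize), while the on-disk allocation comes from the compressed size. Here is the arithmetic for both cases in another Python sketch of my model; the 0x400 compressed size is simply taken from the zdb output above rather than computed:

def roundup(n, unit):
    return -(-n // unit) * unit

def logical_block_size(highest_offset, recordsize=128 * 1024):
    # Based only on how much of the file has been written, not on how
    # well it compresses.
    return min(roundup(highest_offset, 512), recordsize)

def allocated_size(compressed_size, ashift):
    # The on-disk allocation is based on the compressed size.
    return roundup(compressed_size, 2 ** ashift)

# 128 Kb of 'a': the logical block size is the full 128 Kb even though
# the data compressed down to 0x400 bytes (per the zdb output above).
print(hex(logical_block_size(128 * 1024)))    # 0x20000
print(hex(allocated_size(0x400, 9)))          # 0x400 on an ashift=9 vdev

# 127.5 Kb of 'a': a 127.5 Kb (0x1fe00) logical block size.
print(hex(logical_block_size(127 * 1024 + 512)))   # 0x1fe00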
What this means is that ZFS has much less wasted space than I thought it did for files that are under the recordsize. Since such files grow their logical block size in 512-byte chunks, even with no compression they waste at most slightly less than one physical block of space on disk (if you have a file that is, say, 32 Kb plus one byte, you'll have a physical block on disk with only one byte used). This has some implications for other areas of ZFS, but those are for another entry.
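As a sanity check on that claim, here's one more small Python sketch of my model (uncompressed data, a single sub-recordsize block, mirror and raidz overhead ignored); the worst case waste works out to one physical block minus a byte:

def roundup(n, unit):
    return -(-n // unit) * unit

def wasted_space(file_size, ashift=12, recordsize=128 * 1024):
    # The logical block size rounds the file size up to 512-byte units,
    # the allocation rounds that up to 2**ashift bytes, and the waste is
    # whatever is allocated beyond the actual file data.
    assert file_size <= recordsize
    logical = min(roundup(file_size, 512), recordsize)
    return roundup(logical, 2 ** ashift) - file_size

print(wasted_space(32 * 1024 + 1))    # 4095: a 4 Kb block with one byte used
print(max(wasted_space(n) for n in range(1, 128 * 1024 + 1)))   # at most 4095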
(This is one of those entries that I'm really glad that I decided to write. I set out to write it as a prequel to another entry just to have how ZFS grew the block size of files written down explicitly, but wound up upending my understanding of the whole area. The other lesson for me is that verifying my understanding with experiments is a really good idea, because every so often my folk understanding is drastically wrong.)
More notes on using uMatrix in Firefox 56 (in place of NoScript)
I wrote my first set of notes very early on in my usage of uMatrix, before things had really settled down and before I mostly knew what I was doing. Since then I've been refining my configuration and learning more about what works and how, and I've accumulated more stuff I want to record.
The first thing, the big thing, is that changing from NoScript to uMatrix definitely seems to have mostly solved my Firefox memory issues. My Firefox still slowly grows its memory usage over time, even with a stable set of windows, but it's doing so far less than it used to and as a result it's now basically what I consider stable. I certainly no longer have to restart it once every day or two. By itself this is a huge practical win and I'm far less low-key irritated with my Firefox setup.
(I'm not going to say that this memory growth was NoScript's fault, because it may well have been caused by some interaction between NS and my other extensions. It's also possible that my cookie blocker had something to do with it, since uMatrix also replaced it.)
It turns out that one hazard of using a browser for a long time is that you can actually forget how you have it configured. I had initial problems getting my uMatrix setup to accept cookies from some new sites I wanted to do this for (such as bugzilla.kernel.org). It turned out that I used to have Firefox's privacy settings set to refuse all cookies except ones from sites I'd specifically allowed. Naturally uMatrix itself letting cookies through wasn't doing anything when I'd told Firefox to refuse them in the first place. In the uMatrix world, I want to accept cookies in general and then let it manage them.
Well, more or less. uMatrix's approach is to accept all cookies but only let them be sent when you allow it. I decided I didn't entirely like having cookies hang around, so I've also added Self-Destructing Cookies to clean those cookies up later. SDC will also remove LocalStorage data, which I consider a positive since I definitely don't want random websites storing random amounts of things there.
(I initially felt grumpy about uMatrix's approach but have since come around to feeling that it's probably right for uMatrix, partly because of site-scoped rules. You may well have a situation where the same cookies are 'accepted' and sent out on some sites but blocked on others. uMatrix's approach isn't perfect here but it more or less allows this to happen.)
Another obvious in retrospect thing was YouTube videos embedded in other sites. Although you wouldn't know it without digging under the surface, these are in embedded iframes, so it's not enough to just allow YT's JavaScript on a site where you want them; you also need to give YT 'frame' permissions. I've chosen not to do this globally, because I kind of like just opening YT videos in another window using the link that uMatrix gives me.
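For the record, what I mean is a site-scoped rule of roughly this form in uMatrix's 'My rules' pane (example.org here stands in for whatever site is embedding the video, and the exact YouTube hostnames involved may vary; uMatrix's matrix display will show you what is actually being requested):

example.org www.youtube.com script allow
example.org www.youtube.com frame allow

(The embedded player will generally also pull in resources from other YouTube and Google hostnames, which you may or may not want to allow.)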
I have had one annoying glitch with uMatrix in my home Firefox, but once I dug deep enough it appears that something unusual is going on in my home Firefox 56 itself. At first I thought it was weird network issues with Google (which I've seen before in this situation), but now I'm not sure; in any case I get a consistent NS_ERROR_FAILURE JavaScript failure deep in Google Groups' 'loaded on the fly' JS code. This is un-debuggable and un-fixable by me, but at least I have my usual option to fall back on.
('Things break mysteriously if you have an unusual configuration and even sometimes if you don't' is basically the modern web experience anyway.)
PS: A subtle benefit of using uMatrix is that it also exists for Chrome, so I can have the same interface and even use almost the same ruleset in my regular mode Chrome.
PPS: I'll have to replace Self-Destructing Cookies with something else when I someday move to Firefox Quantum, but as covered, I already have a candidate.