2020-02-02
What we do to enable us to grow our ZFS pools over time
In my entry on why ZFS isn't good at growing and reshaping pools, I mentioned that we go to quite some lengths in our ZFS environment to be able to incrementally expand our pools. Today I want to put all of the pieces of that together in one place and discuss what those lengths actually are.
Our big constraint is that not only do we need to add space to pools over time, but we also have a fairly large number of pools and it's unpredictable which of them will need more space. We need a solution to pool expansion that leaves us with as much flexibility as possible for as long as possible. This pretty much requires being able to expand pools in relatively small increments of space.
The first thing we do, or rather don't do, is that we don't use raidz. Raidz is potentially attractive on SSDs (where the raidz read issue has much less impact), but since you can't expand a raidz vdev, the minimum expansion for a pool using raidz vdevs is three or four separate 'disks' to make a new raidz vdev. In practice you'd normally want more disks than that to reduce the raidz overhead, because a four disk raidz2 vdev is basically a pair of mirrors with slightly more redundancy, more awkward management, and some overheads. All of this means adding relatively large blocks of space at once, which isn't feasible for us, so we have to use ZFS mirroring instead of the more space-efficient raidz.
(A raidz2 vdev is also potentially more resilient than a bunch of mirror vdevs, because you can lose any two disks without losing the pool.)
However, plain mirroring of whole disks would still not work for us, because that would mean growing pools by relatively large amounts of space at a time (and it would strongly limit how many pools we can put on a single fileserver). To let us grow pools by smaller increments of space than a whole disk, we partition all of our disks into smaller chunks, currently four chunks on a 2 TB disk, and then build ZFS mirror vdevs out of chunks instead of whole disks. This is not how you're normally supposed to set up ZFS pools, and on our older fileservers, which used HDs over iSCSI, it caused visible performance problems if a pool ever used two chunks from the same physical disk. Fortunately those problems seem to be gone on our new SSD-based fileservers.
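To put rough numbers on this, here is a back-of-the-envelope sketch of the difference in minimum expansion increments. The figures are nominal (they ignore ZFS overheads and the TB versus TiB difference) and the Python itself is purely for illustration, not anything we actually run:

    # Nominal numbers only; this ignores ZFS overheads and TB vs TiB.
    DISK_TB = 2          # our current disk size
    CHUNKS_PER_DISK = 4  # how we currently partition each disk
    CHUNK_TB = DISK_TB / CHUNKS_PER_DISK

    # Smallest expansion with whole-disk raidz vdevs: a new three-disk
    # raidz1 vdev, which gives two disks' worth of usable space.
    raidz_min_tb = (3 - 1) * DISK_TB

    # Smallest expansion with our chunk-based mirrors: one new mirror
    # vdev made from two chunks on two different disks.
    mirror_min_tb = CHUNK_TB

    print(f"minimum raidz expansion:  {raidz_min_tb:.1f} TB usable")
    print(f"minimum mirror expansion: {mirror_min_tb:.1f} TB usable")

With these nominal numbers, the smallest raidz step is 4 TB of usable space against 0.5 TB for a chunk mirror, and that difference in step size is essentially the whole argument.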
Even with all of this we can't necessarily let people expand existing pools by a lot of space, because the fileserver their pool is on may not have enough free space left (especially if we want other pools on that fileserver to still be able to expand). When people buy a large enough amount of space at once, we generally wind up starting another ZFS pool for them on a different fileserver, which somewhat cuts against the space flexibility that ZFS offers. People may not have to decide up front how much space they want their filesystems to have, but they may have to figure out which pool a new filesystem should go into and then balance usage across all of their pools (or have us move filesystems around).
(Another thing we do is that we sell pool space to people in 1 GB increments, although usually they buy more at once. This is implemented using a pool quota, and of course that means that we don't even necessarily have to grow the pool's space when people buy space; we can just increase the quota.)
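To pull the overall decision process together in one place, here is a highly simplified sketch of it. This is not our actual tooling; every name and number in it is made up for illustration:

    CHUNK_GB = 500  # one chunk of a 2 TB disk split four ways (nominal)

    def plan_space_purchase(bought_gb, quota_gb, capacity_gb, spare_chunks):
        """Sketch of what happens when someone buys bought_gb more space
        for a pool with quota_gb of quota, capacity_gb of vdev capacity,
        and spare_chunks unused disk chunks left on its fileserver."""
        new_quota = quota_gb + bought_gb

        # Often the pool's existing vdevs already have the room, and the
        # quota increase is all that's needed.
        if new_quota <= capacity_gb:
            return f"raise the pool quota to {new_quota} GB"

        # Otherwise we add mirror vdevs (two chunks each) until the pool
        # is big enough, provided the fileserver has chunks to spare.
        pairs_needed = -(-(new_quota - capacity_gb) // CHUNK_GB)
        if pairs_needed * 2 <= spare_chunks:
            return f"raise the quota and add {pairs_needed} mirror vdev(s)"

        # If it doesn't, the new space has to become a separate pool on
        # another fileserver, which is less convenient for everyone.
        return "start a new pool on a different fileserver"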
Although we can grow pools relatively readily (when we need to), we still have the issue that adding a new vdev to a ZFS pool doesn't rebalance space usage across all of the pool's vdevs; ZFS mostly just writes new data to the new vdev. In an SSD world where seeks are essentially free and we're unlikely to saturate the SSDs' data transfer rates on any regular basis, this imbalance probably doesn't matter too much. It does make me wonder whether nearly full pool vdevs interact badly with ZFS's issues with coming near quota limits (and a followup).
Some unusual and puzzling bad requests for my CSS stylesheet
Anyone who has looked at their web server's error logs knows that there's some weird stuff out there (as well as the straightforward bad stuff). Looking at the error logs for Wandering Thoughts recently turned up some people who apparently have unusual ideas of how to parse HTML to determine the URL for my CSS stylesheet.
Wandering Thoughts has what I think of as a standard <link> element in its <head> for my CSS stylesheet:
<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css">
Pretty much every browser in existence will parse this and request my CSS. What I saw in the error logs was this:
File does not exist: <path>/dwiki.css" rel="stylesheet" type="text
This certainly looks like the clients making these requests took everything in the <link> from the first double quote to the very last one and decided that was the actual URL.
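I don't know what these clients actually are, but it's easy to get exactly this sort of URL out of a naive 'parser' that pulls the href out of the <link> with a greedy regular expression. This is pure speculation on my part, not something I know these clients do:

    import re

    link = '<link href="/~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css">'

    # A greedy .* runs to the last double quote on the line, not the
    # next one, so it swallows the rest of the attributes too.
    bad = re.search(r'href="(.*)"', link).group(1)
    good = re.search(r'href="(.*?)"', link).group(1)

    print(bad)   # /~cks/dwiki/dwiki.css" rel="stylesheet" type="text/css
    print(good)  # /~cks/dwiki/dwiki.css

Requesting the 'bad' version as a URL would produce pretty much the error log line above.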
There's nothing particularly bad about what the sources of these badly parsed requests appear to have been doing; they seem to be reading entries here at reasonable volumes (sometimes only a single entry). Some but not all of them request the web server's favicon.
I'm puzzled about what the underlying source of these requests could be. I'm pretty sure that it's common for CSS <link>s to not have the href to the CSS stylesheet as the last (quoted) attribute, so any browser or browser-like thing that mis-parsed <link>s this way would get the CSS wrong on a wide variety of sites, not just mine. The obvious suspicion is that whatever is making the requests doesn't actually care about the CSS and doesn't use it, making the bad parsing and the subsequent request failure unimportant, but as mentioned, the IPs making these requests don't show any signs of being up to anything bad.
The good news (if these are real people with real browsers) is that Wandering Thoughts mostly doesn't depend on its CSS stylesheet. A completely unstyled version looks almost the same as the usual one (which is also good for people reading entries in a syndication feed reader).
Sidebar: A little more detail on the sources
I saw these requests from several IPs (although at different activity levels); at least one of the IPs was a residential cablemodem IP. They had several different user-agents, including at least:
Mozilla/5.0 (Linux; Android 8.1.0; Mi A2 Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Mobile Safari/537.36
Mozilla/5.0 (Linux; Android 9; SM-N960F Build/PPR1.180610.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/76.0.3809.132 Mobile Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/61.0.3163.98 Safari/537.36
(Of course all of this could be coming from a real browser that just cloaks its user-agent string to foil fingerprinting and tracking.)
I haven't tried to trawl my server logs to see if these particular user agents show up somewhere else.