Wandering Thoughts archives

2019-12-20

My new Linux office workstation disk partitioning for the end of 2019

I've just had the rare opportunity to replace all of my office machine's disks at once, without having to carry over any of the previous generation the way I've usually had to. As part of replacing everything I got the chance to redo the partitioning and setup of all of my disks, again all at once without the need to integrate a mix of the future and the past. For various reasons, I want to write down the partitioning and filesystem setup I decided on.

My office machine's new set of disks is a pair of 500 GB NVMe drives and a pair of 2 TB SATA SSDs. I'm using GPT partitioning on all four drives for various reasons. All four drives start with my standard two little partitions, a 256 MB EFI System Partition (ESP, gdisk code EF00) and a 1 MB BIOS boot partition (gdisk code EF02). I don't currently use either of them (my past attempt to switch from MBR booting to UEFI was a failure), but they're cheap insurance for the future. Similarly, putting these partitions on all four drives instead of just my 'system' drives is more cheap insurance.
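
For reference, something like the following sgdisk commands would produce those two little partitions (the device name here is only an example):

sgdisk -n 1:0:+256M -t 1:EF00 /dev/nvme0n1
sgdisk -n 2:0:+1M -t 2:EF02 /dev/nvme0n1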

(Writing this down has made me realize that I didn't format the ESPs. Although I don't use UEFI for booting, I have in the past put updated BIOS firmware images there in order to update the BIOS.)
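
Formatting an ESP is a one-liner whenever I get around to it; something like this, with the partition name again only an example:

mkfs.vfat -F 32 /dev/nvme0n1p1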

The two NVMe drives are my 'system' drives. They have three additional partitions: a 70 GB partition used for a Linux software RAID mirror of the root filesystem (including /usr and /var, since I put all of the system into one filesystem), a 1 GB partition used for a Linux software RAID mirrored swap partition, and the remaining 394.5 GB as a mirrored ZFS pool that holds filesystems that I want to be as fast as possible and that I can be confident won't grow to be too large. Right now that's my home directory filesystem and the filesystem that holds source code (where I build Firefox, Go, and ZFS on Linux, for example).
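
As covered below, the actual root array and ZFS pool were migrated from my old drives instead of being created from scratch, but if I were setting this layout up from nothing it would look roughly like the following (the device names, md numbers, and pool name are all made up for illustration):

mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/nvme0n1p4 /dev/nvme1n1p4
mkswap /dev/md1
zpool create fastpool mirror /dev/disk/by-id/nvme-DISKA-part5 /dev/disk/by-id/nvme-DISKB-part5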

The two SATA SSDs are my 'data' drives, holding various larger but less important things. They have two 70 GB partitions that are Linux software RAID mirrors, and the remaining space is in a single partition for another mirrored ZFS pool. One of the two 70 GB mirrors is there so that I can make backup copies of my root filesystem before upgrading Fedora (if I bother to do so); the other is essentially an 'overflow' filesystem for some data that I want on an ext4 filesystem instead of in a ZFS pool (including a backup copy of all recent versions of ZFS on Linux that I've installed on my machine, so that if I update and the very latest version has a problem, I can immediately reinstall a previous one). The ZFS pool on the SSDs contains larger and generally less important things like my VMWare virtual machine images and the ISOs I use to install them, and archived data.
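
The two 70 GB software RAID arrays just get ordinary ext4 filesystems, along the lines of the following (again with invented md device names and labels):

mkfs.ext4 -L rootcopy /dev/md2
mkfs.ext4 -L overflow /dev/md3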

Both ZFS pools are set up following my historical ZFS on Linux practice, where they use the /dev/disk/by-id names for my disks instead of the sdX and nvme... names. Both pools are actually relatively old; I didn't create new pools for this and migrate my data, but instead just attached new mirrors to the old pools and then detached the old drives (more or less). The root filesystem was similarly migrated from my old SSDs by attaching and removing software RAID mirrors; the other Linux software RAID filesystems are newly made and copied through ext4 dump and restore (and the new software RAID arrays were added to /etc/mdadm.conf more or less by hand).
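
The general shape of that migration is simple even if it was spread out in time; roughly speaking it was something like this, with placeholder disk and filesystem names and a fair amount of hand-waving:

zpool attach somepool /dev/disk/by-id/ata-OLDDISK-part5 /dev/disk/by-id/ata-NEWDISK-part5
# wait for the resilver to finish (watching 'zpool status'), then:
zpool detach somepool /dev/disk/by-id/ata-OLDDISK-part5
# for the new ext4 filesystems, once made and mounted:
dump -0 -f - /oldfs | (cd /newfs && restore -rf -)
# and to get ARRAY lines to merge into /etc/mdadm.conf by hand:
mdadm --detail --scan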

(Since I just looked it up, the ZFS pool on the SATA SSDs was created in August of 2014, originally on HDs, and the pool on the NVMe drives was created in January of 2016, originally on my first pair of (smaller) SSDs.)

Following my old guide to RAID superblock formats, I continued to use the version 1.0 format for everything except the new swap partition, where I used the version 1.2 format. By this point using 1.0 is probably superstition; if I have serious problems (for example), I'm likely to just boot from a Fedora USB live image instead of trying anything more complicated.
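
If I ever need to check which superblock format an existing array is using, 'mdadm --detail' reports it as the array's 'Version'; for example, with a made up md device name:

mdadm --detail /dev/md0 | grep -i 'version'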

All of this feels very straightforward and predictable by now. I've moved away from complex partitioning schemes over time and almost all of the complexity left is simply that I have two different sets of disks with different characteristics, and I want some filesystems to be fast more than others. I would like all of my filesystems to be on NVMe drives, but I'm not likely to have NVMe drives that big for years to come.

(The most tangled bit is the 70 GB software RAID array reserved for a backup copy of my root filesystem during major upgrades, but in practice it's been quite a while since I bothered to use it. Still, having it available is cheap insurance in case I decide I want to do that someday during an especially risky Fedora upgrade.)

linux/WorkMachinePartitioning2019 written at 23:52:22

Splitting a mirrored ZFS pool in ZFS on Linux

Suppose, not hypothetically, that you're replacing a pair of old disks with a pair of new disks in a ZFS pool that uses mirrors. If you're a cautious person and you worry about issues like infant mortality in your new drives, you don't necessarily want to immediately switch from the old disks to the new ones; you want to run them in parallel for at least a bit of time. ZFS makes this very easy, since mirror vdevs aren't limited to two devices (a four-way mirror is no problem) and you can just attach devices to add extra mirrors (and then detach devices later). Eventually it will come time to stop using the old disks, and at this point you have a choice of what to do.
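
Using the same sort of device names as the walkthrough below, running the old and new disks in parallel is just a couple of attaches (each new device is attached alongside a device that's already in the mirror vdev):

zpool attach maindata oldA newC
zpool attach maindata oldA newD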

The straightforward thing is to drop the old disks out of the ZFS mirror vdev with 'zpool detach', which cleanly removes them (and they won't come back later, unlike with Linux software RAID). However this is a little bit wasteful, in a sense. Those old disks have a perfectly good backup copy of your ZFS pool on them, but when you detach them you lose any real possibility of using that copy. Perhaps you would like to keep that data as an actual backup copy, just in case. Modern versions of ZFS can do this through splitting the pool with 'zpool split'.

To quote the manpage here:

Splits devices off pool creating newpool. All vdevs in pool must be mirrors and the pool must not be in the process of resilvering. At the time of the split, newpool will be a replica of pool. [...]

In theory the manpage's description suggests that you can split a four-way mirror vdev in half, pulling off two devices at once in a 'zpool split' operation. In practice it appears that the current 0.8.x version of ZFS on Linux can only split off a single device from each mirror vdev. This meant that I needed to split my pool in a multi-step operation.

Let's start with a pool, maindata, with four disks in a single mirrored vdev, oldA, oldB, newC, and newD. We want to split maindata so that there is a new pool with oldA and oldB. First, we split one old device out of the pool:

zpool split -R /mnt maindata maindata-hds oldA

Normally the newly split-off pool is not imported (as far as I know), and you certainly don't want it imported if your filesystems have explicit 'mountpoint' settings (because then filesystems from the original pool and the split-off pool will fight over who gets to be mounted there). However, you can't add devices to exported pools and we need to add oldB, so we have the split import the new pool at an altroot; that's the '-R /mnt'. I use /mnt here out of tradition, but you can use any convenient empty directory.
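
If you want to double check where the just-imported new pool's filesystems wound up, the usual 'zfs list' will show you:

zfs list -r -o name,mountpoint,mounted maindata-hds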

With the pool split off, we need to detach oldB from the regular pool and attach it to oldA in the new pool to make the new pool actually be mirrored:

zpool detach maindata oldB
zpool attach maindata-hds oldA oldB

This will then resilver the new maindata-hds pool onto oldB (even though oldB already has an almost exact copy). Once the resilver is done, you can export the pool:

zpool export maindata-hds

You now have your mirrored backup copy sitting around with relatively little work on your part.

All of this appears to have worked completely fine for me. I scrubbed my maindata pool before splitting it, just in case, but I don't think I bothered to scrub the maindata-hds new pool after the resilver. It's only an emergency backup pool anyway (and it gets less and less useful over time, since there are more divergences between it and the live pool).

PS: I don't know if you can make snapshots, split a pool, and then do incremental ZFS sends from filesystems in one copy of the pool to the other to keep your backup copy more or less up to date. I wouldn't be surprised if it worked, but I also wouldn't be surprised if it didn't.
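
If I were going to experiment with this, the sketch would look something like the following (completely untested, with invented snapshot names); the old-disks pool would have to be imported again to receive anything, and the '-u' is there to keep its filesystems from being mounted over the live ones:

zfs snapshot -r maindata@presplit
zpool split -R /mnt maindata maindata-hds oldA
# ... time passes, then:
zfs snapshot -r maindata@update1
zfs send -R -i @presplit maindata@update1 | zfs receive -u -F maindata-hds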

linux/ZFSSplitPoolExperience written at 00:33:45

