How I have partitioning et al set up for ZFS On Linux

December 29, 2014

This is a deeper look into how I have my office workstation configured with ZFS On Linux for all of my user data, because I figure that this may be of interest for people.

My office workstation's primary disks are a pair of 1 TB SATA drives. Each drive is partitioned identically into five partitions. The first four of those partitions (well, pairs of those partitions) are used for software RAID mirrors for swap, /, /boot, and a backup copy of / that I use when I do yum upgrades from one version of Fedora to another. If I were redoing this partitioning today I would not use a separate /boot partition, but this partitioning predates my enlightenment on that.
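As a sketch, the layout looks something like this (the device names, partition numbers, and sizes here are illustrative, not my actual ones, and the mdadm invocation is just an example of how one of the mirrors could be created):

```shell
# Hypothetical layout sketch; sdaN/sdbN numbering is illustrative.
#
#   sdaX + sdbX -> md0  swap
#   sdaX + sdbX -> md1  /
#   sdaX + sdbX -> md2  /boot
#   sdaX + sdbX -> md3  alternate / (used during Fedora upgrades)
#   sdaX + sdbX -> ZFS 'maindata' pool (mirror vdev)
#
# One of the software RAID mirrors might be created along these lines:
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdaX /dev/sdbX
```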

(Actually, because I'm using GPT partitioning there are a few more partitions sitting around for UEFI stuff; I have extra 'EFI System' and 'BIOS boot partition' partitions. I've ignored them for as long as this system has been set up.)

Altogether these partitions use up about 170 GB of the disks (mostly in the two root filesystem partitions). The rest of the disk is in the final large partition, and this partition (on both disks) is what ZFS uses for the maindata pool that holds all my filesystems. The pool is of course set up with a single mirror vdev that uses both partitions. Following more or less the ZoL recommendations that I found, I set it up using the /dev/disk/by-id/ 'wwn-....-part7' names for the two partitions in question (and I set it up with an explicit 'ashift=12' option as future-proofing, although these disks are not 4K disks themselves).
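The pool creation would have looked something like this; the WWN values here are placeholders, not my disks' real World Wide Names:

```shell
# Sketch of the pool creation with placeholder wwn-* names and an
# explicit ashift=12 (4K sectors) as future-proofing.
zpool create -o ashift=12 maindata \
    mirror \
    /dev/disk/by-id/wwn-0x5000000000000001-part7 \
    /dev/disk/by-id/wwn-0x5000000000000002-part7
```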

I later added an SSD as a L2ARC because we had a spare SSD lying around and I had the spare chassis space. Because I had nothing else on the SSD at all, I added it with the bare /dev/disk/by-id wwn-* name and let ZoL partition the disk itself (and I didn't attempt to force an ashift for the L2ARC). As I believe is standard ZoL behavior, ZoL partitioned it as a GPT disk with an 8 MB spacer partition at the end. ZoL set the GPT partition type to 'zfs' (BF01).
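Adding the SSD as a cache device is a one-liner; again the WWN here is a placeholder:

```shell
# Add a whole-disk L2ARC cache device by its by-id name (placeholder
# WWN); given a whole disk, ZoL partitions it itself.
zpool add maindata cache /dev/disk/by-id/wwn-0x5000000000000003
```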

(ZoL doesn't seem to require specific GPT partition types if you give it explicit partitions; my maindata partitions are still labeled as 'Linux LVM'.)

Basically all of the filesystems from my maindata pool are set up with explicit mountpoint= settings that put them where the past LVM versions went; for most of them this is various names in / (eg /homes, /data, /vmware, and /archive). I have ZoL set up to mount these in the normal ZFS way, ie as the ZFS pools and services are brought up (instead of attempting to do something through /etc/fstab). I also have a collection of bind mounts that materialize bits of these filesystems in other places, mostly because I'm a bit crazy. Since all of the mount points and bind targets are in the root filesystem, I don't have to worry about mount order dependencies; if the system is up enough to be bringing up ZFS, the root filesystem is there to be mounted on.
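A sketch of this setup, using the mount point names from above (the dataset names and the bind mount target are assumptions for illustration):

```shell
# Create filesystems with explicit mountpoints in / (dataset names
# are illustrative; the mount points are the ones mentioned above).
zfs create -o mountpoint=/homes   maindata/homes
zfs create -o mountpoint=/data    maindata/data
zfs create -o mountpoint=/vmware  maindata/vmware
zfs create -o mountpoint=/archive maindata/archive

# A bind mount re-exposing part of a filesystem elsewhere; both
# paths here are hypothetical examples.
mount --bind /homes/example /u/example
```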

Sidebar: on using wwn-* names here

I normally prefer physical-location-based names like /dev/sda and so on. However, ZoL people recommend using stable /dev/disk/by-* names in general, and they prefer the by-id names that are tied to the physical disk instead of what's plugged in where. When I was setting up ZFS I decided that this was okay by me because, after all, the ZFS pool itself is tied to these specific disks unless I go crazy and do something like dd the ZoL data from one disk to another. Really, it's no different from Linux's software RAID automatically finding its component disks regardless of what they're called today, and I'm perfectly fine with that.

(And tying ZoL to specific disks will save me if someday I'm shuffling SATA cables around and what was /dev/sdb accidentally winds up as /dev/sdd.)

Of course I'd be happier if ZoL would just go look at the disks and find the ZFS metadata and assemble things automatically the way that Linux software RAID does. But ZoL has apparently kept most of the irritating Solaris bits around zpool.cache and how the system does relatively crazy things while bringing pools up on boot. I can't really blame them for not wanting to rewrite all of that code and make things more or less gratuitously different from the Illumos ZFS codebase.
