2019-03-31
Our likely ZFS fileserver upgrade plans (as of March 2019)
Our third generation of ZFS fileservers is now in full production, although we're less than halfway through migrating all of our filesystems over from our second generation fileservers. As peculiar as it sounds, this has me thinking ahead to what our likely upgrade plans are.
Our current generation of ZFS fileservers is running Ubuntu 18.04 LTS with the Ubuntu version of ZFS (and a frozen kernel version). Given our past habits, it's unlikely that we'll want to upgrade them to Ubuntu 20.04 LTS when that comes out in a year or so, unless there's some important ZFS bugfix or feature that's present in 20.04 (which is possible, cf, although serious bugs will hopefully be fixed in the 18.04 version of ZFS). Instead, we'll only start looking at upgrades when 18.04 goes on its end of life countdown, which happens when Ubuntu 22.04 LTS comes out; historically that will be in April of 2022, three years from now.
In 2022, our current server hardware and 2TB data SSDs will be about four years old; based on our past habits, that won't be old enough for us to consider them in urgent need of replacement. I hope that we'll turn over the SSDs for new ones with larger capacity (and without four years of write wear), but we might not do it in 2022 at the same time as we upgrade to 22.04. If we have money, we might refresh the servers with new hardware, but if so I think we'd mostly be doing it to have hardware that hadn't been used for four years, not to get more powerful hardware; in general our SuperMicro servers have been very reliable, and our OmniOS generation is now somewhere around five years old and shows no signs of problems anywhere. The one exception is that maybe RAM prices will finally have gone down substantially by 2022, so we could afford to put a lot more memory in a new generation of servers.
(We will definitely be upgrading from Ubuntu 18.04 when it starts going out of support, and it's probable that it will be to the current Ubuntu LTS instead of to, say, CentOS. Hardware upgrades are much more uncertain.)
Frankly, next time around I would like us not to have to move our ZFS pools and filesystems over to new fileservers; it takes a lot of work and a lot of time. An 'in place' upgrade for the ZFS pools is now at least possible and I hope that we do it, either by reusing the current servers and swapping in new system disks set up with Ubuntu 22.04, or by moving the data SSDs from one physical server to another and then re-importing the pools and so on.
(We did a 'swap the system disks' upgrade on our OmniOS fileservers when we moved from r151010 to r151014 and it went okay. It turns out that we also did this for a Solaris 10 upgrade many years ago.)
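If we do go the route of moving the data SSDs to different hardware, the ZFS side of it should amount to no more than an export and an import. A minimal sketch, using 'tank' as a stand-in pool name rather than one of our real pools:
# zpool export tank
(move the data SSDs to the new server)
# zpool import
# zpool import tank
The bare 'zpool import' just lists the pools ZFS can find on the attached disks; importing by name then brings the pool and its filesystems back up, with '-f' needed if the pool wasn't cleanly exported first.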
Erasing SSDs with blkdiscard (on Linux)
Our approach to upgrading servers by reinstalling them from scratch on new hardware means that we have a slow flow of previously used servers that we're going to reuse, and thus that need their disks cleaned up from their previous life. Some places would do this for data security reasons, but here we mostly care that lingering partitioning, software RAID superblocks, and so on don't cause us problems on new OS installs.
In the old days of HDs, we generally did this by zeroing out the old drives with dd (on a machine dedicated to the purpose which was just left running in the corner, since this takes some time with HDs), or sometimes with a full badblocks scan. When we started using SSDs in our servers, this didn't seem like such a good idea any more. We didn't really want to use up some of the SSD write endurance just to blank them out or, worse, to write over them repeatedly with badblocks.
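For concreteness, the HD-era blanking was roughly the following, with sdX standing in for whatever disk was being wiped (and badblocks run in its destructive read-write test mode, which writes several patterns over the whole drive):
# dd if=/dev/zero of=/dev/sdX bs=1024k
# badblocks -wsv /dev/sdX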
Our current solution to this is blkdiscard, which basically sends a TRIM command to the SSD.
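(As an aside, lsblk can report whether the kernel will actually pass discards through to a given disk; nonzero DISC-GRAN and DISC-MAX columns in its output mean that it will. With sdX as a placeholder:
# lsblk --discard /dev/sdX
I haven't needed this for our SSDs, but it's a cheap sanity check on unfamiliar hardware.)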
Conveniently, the Ubuntu 18.04 server CD image that we use as the base for our install images contains blkdiscard, so we can boot a decommissioned server from install media, wait for the Ubuntu installer to initialize and find all the disks, and then switch over to a text console to blkdiscard its SSDs. In the process of doing this a few times, I have developed a process and learned some useful lessons.
First, just to be sure and in an excess of caution, I usually explicitly zero the very start of each disk with 'dd if=/dev/zero of=/dev/sdX bs=1024k count=128; sync' (the count can vary). This at least zeroes out the MBR partition table no matter what. Then when I use blkdiscard, I generally background it because I've found that it can take a while to finish and I may have more than one disk to blank out:
# blkdiscard /dev/sda &
# blkdiscard /dev/sdb &
# wait
I could do them one at a time, but precisely because it can take a while I usually wander away from the server to do other things. This gets everything done all at once, so I don't have to wait twice.
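With more than a couple of disks, the same pattern is less typing as a shell loop; a sketch, with the device names as stand-ins:
# for d in sda sdb sdc sdd; do blkdiscard /dev/$d & done; wait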
Finally, after I've run blkdiscard and it's finished, I usually let the server sit there running for a while. This is probably superstition, but I feel like giving the SSDs time to process the TRIM operation before either resetting them with a system reboot or powering the server off (with a 'poweroff', which is theoretically orderly). If I had a bunch of SSDs to work through this would be annoying, but usually we're only recycling one server at a time.
I don't know if SSDs commonly implement TRIM so that the TRIM'd space reads back as zeroes, but for our purposes it's sufficient if what comes back is random garbage that won't be recognized as anything meaningful. And I think that SSDs do do that, at least so far, and that we can probably count on them to keep doing it.
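For what it's worth, SATA SSDs advertise in their identify data whether they promise deterministic zeroes after TRIM, and hdparm can show this; on a drive that makes the promise, the output includes a line like 'Deterministic read ZEROs after TRIM'. With sdX as a placeholder:
# hdparm -I /dev/sdX | grep -i trim
(I haven't gone around checking our SSD models this way, so this is a pointer, not a report on what our drives actually do.)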
(SSDs might be smart enough to recognize blocks of zeros and turn them into TRIM, but why take chances, and if nothing else, blkdiscard is easier and faster, even with the waiting afterward.)