2019-04-01
Our plan for handling TRIM'ing our ZFS fileserver SSDs
The versions of ZFS that we're running on our fileservers (both the old and the new) don't support using TRIM commands on drives in ZFS pools. Support for TRIM has been in FreeBSD ZFS for a while, but it only just landed in the ZFS on Linux development version and it's not in Illumos. Given our general upgrade plans, we're also not likely to get TRIM support over the likely production lifetime of our current ZFS SSDs through upgrading the OS and ZFS versions later. So you might wonder what our plans are to deal with how SSD performance can decrease when they think they're all filled up, if you don't TRIM them or otherwise deallocate blocks every so often.
Honestly, the first part of our plan is to ignore the issue unless we see signs of performance problems. This is not ideal, but it is the simplest approach. It's reasonably likely that our ZFS fileservers will be more limited by NFS and networking than by SSD performance, and as far as I understand things, nominally full SSDs mostly suffer from write performance issues, not read performance ones. Our current view (only somewhat informed by actual data) is that our read volume is significantly higher than our write volume. We certainly aren't currently planning any sort of routine preventative work here, and we wouldn't unless we saw problem signs.
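(If you want a rough sense of a pool's read versus write balance, one convenient view is 'zpool iostat'. As an illustrative sketch, with 'tank' standing in for an actual pool name:

	# sample per-pool read/write operations and bandwidth every 60 seconds
	zpool iostat tank 60
	# the same, broken out per vdev and per device
	zpool iostat -v tank 60

This only sees traffic that reaches the pool, of course; reads served from the ARC never show up here.)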
If we do see problem signs and do need to clear SSDs, our plan is to do the obvious brute force thing in a ZFS setup with redundancy. Rather than try to TRIM SSDs in place, we'll entirely spare out a given SSD so that it has no live data on it, and then completely clear it, probably using Linux's blkdiscard. We might do this in place on a production fileserver, or we might go to the extra precaution of pulling the SSD out entirely, swapping in a freshly cleared one, and clearing the old SSD on a separate machine. Doing this swap has the twin advantages that we're not risking accidentally clearing the wrong SSD on the fileserver and we don't have to worry about the effects of an extra-long, extra-slow SATA command on the rest of the system and the other drives.
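As a concrete sketch of the brute force approach (the names here are made up: 'tank' is the pool, the SSD to be cleared shows up as /dev/sdf, and /dev/sdg is its replacement):

	# replace the target SSD with a fresh device; ZFS resilvers onto
	# the new device and detaches the old one when the resilver is done
	zpool replace tank sdf sdg
	# watch the resilver until it finishes
	zpool status tank

	# once the old SSD holds no live data, discard every block on it
	# (destructive; triple-check the device name before running this)
	blkdiscard /dev/sdf

If we clear the SSD on a separate machine instead, only the blkdiscard step happens there, after the drive has been physically swapped out.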
(This plan, such as it is, is not really new with our current generation of Linux fileservers. We've had one OmniOS fileserver that used SSDs for a few special pools, and this was always our plan for dealing with any clear problems from those SSDs slowing down because they were full up. We haven't had to use it, but then we haven't really gone looking for performance problems with its SSDs. They seem to still run fast enough after four or more years, and so far that's good enough for us.)