Our plan for handling TRIM'ing our ZFS fileserver SSDs

April 1, 2019

The versions of ZFS that we're running on our fileservers (both the old and the new) don't support using TRIM commands on drives in ZFS pools. Support for TRIM has been in FreeBSD ZFS for a while, but it only just landed in the ZFS on Linux development version and it's not in Illumos. Given our general upgrade plans, we're also unlikely to get TRIM support through later OS and ZFS upgrades over the probable production lifetime of our current ZFS SSDs. So you might wonder what our plans are for dealing with how SSD performance can decrease when the drives think they're all filled up, if you don't TRIM them or otherwise deallocate blocks every so often.

Honestly, the first part of our plan is to ignore the issue unless we see signs of performance problems. This is not ideal but it is the simplest approach. It's reasonably likely that our ZFS fileservers will be more limited by NFS and networking than by SSD performance, and as far as I understand things, nominally full SSDs mostly suffer from write performance issues, not read performance. Our current view (only somewhat informed by actual data) is that our read volume is significantly higher than our write volume. We certainly aren't currently planning any sort of routine preventative work here, and we wouldn't unless we saw problem signs.

If we do see problem signs and need to clear SSDs, our plan is the obvious brute-force approach for a ZFS setup with redundancy. Rather than trying to TRIM SSDs in place, we'll entirely spare out a given SSD so that it has no live data on it, and then completely clear it, probably using Linux's blkdiscard. We might do this in place on a production fileserver, or we might take the extra precaution of pulling the SSD out entirely, swapping in a freshly cleared one, and clearing the old SSD on a separate machine. Doing the swap has twin advantages: we're not risking accidentally clearing the wrong SSD on the fileserver, and we don't have to worry about the effects of an extra-long, extra-slow SATA command on the rest of the system and the other drives.
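The spare-out-and-clear procedure could look something like the following sketch. The pool and device names here are hypothetical, and this is an illustration of the general sequence rather than a runbook we've actually exercised; it assumes a redundant pool with a spare drive on hand.

```shell
# Hypothetical names: pool 'tank', SSD to clear /dev/sdb, spare /dev/sdc.

# Resilver the data from the target SSD onto the spare.  Once the
# resilver completes, ZFS drops the old device from the pool, so it
# then holds no live data.
zpool replace tank /dev/sdb /dev/sdc

# Wait until 'zpool status' reports the resilver as finished before
# touching the old drive.
zpool status tank

# Discard every block on the now-unused SSD.  This destroys all data
# on it, which is why doing it on a separate machine (with the drive
# physically pulled) removes the risk of mistyping the device name on
# a production fileserver.
blkdiscard /dev/sdb
```

Running the blkdiscard on a separate machine also keeps the long-running discard operation's SATA traffic away from the production pool's other drives, per the reasoning above.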

(This plan, such as it is, is not really new with our current generation of Linux fileservers. We've had one OmniOS fileserver that used SSDs for a few special pools, and this was always our plan for dealing with any clear problems caused by those SSDs slowing down once they considered themselves full. We haven't had to use it, but then we haven't really gone looking for performance problems with its SSDs. They still seem to run fast enough after four or more years, and so far that's good enough for us.)

