Some thoughts on whether and when TRIM'ing ZFS pools is useful

January 27, 2023

Now that I've worked out how to safely discard (TRIM) unused disk blocks in ZFS pools, I can think about if and when it's useful or important to actually do this. In theory, explicitly discarding disk blocks on SSDs speeds up their write performance because it gives the SSD more unused flash storage space it can pre-erase so the space is ready to be written into. So the first observation is that how much TRIM'ing a pool matters depends on how much you're writing to it (well, to filesystems and perhaps zvols in it). If you're writing almost nothing to the pool, you have almost no need of fresh chunks of flash storage.

(As far as I know, TRIM'ing SSDs isn't normally expected to speed up their read performance.)

Next, the amount of help you can get from TRIM'ing SSDs depends on how much space is unused in your ZFS pools, because ZFS can only TRIM unused space. If your pool is 90% full, only 10% of the disk space can be TRIM'd at all. This implies that there's little point in TRIM'ing an almost completely full pool (if you let your pools get that full). On the positive side, triggering a ZFS TRIM of devices in that pool will go quite fast.

(On the negative side, if you scrub after the TRIM, it may take a while because you have lots of data.)
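To put rough numbers on the "90% full means only 10% can be TRIM'd" point, here is a minimal sketch of the arithmetic. The function name and the example pool size are made up for illustration, and it ignores any space ZFS reserves internally, so treat the result as an upper bound rather than what a real TRIM would discard.

```python
def trimmable_bytes(pool_size_bytes, used_percent):
    """Rough upper bound on the space ZFS could TRIM in a pool.

    Only unused space can be TRIM'd, so this is just the free space.
    Integer percentages are used to avoid floating point rounding.
    """
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    return pool_size_bytes * (100 - used_percent) // 100


if __name__ == "__main__":
    # Hypothetical 2 TB pool that is 90% full.
    size = 2 * 10**12
    free = trimmable_bytes(size, 90)
    print(f"at most {free / 10**9:.0f} GB can be TRIM'd")
```

The fuller the pool, the smaller this number gets, which is also why a TRIM of a nearly full pool should finish quickly.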

A pretty full pool can still see a significant write volume if people are overwriting existing data, or churning through creating, removing, and recreating files. If you trust ZFS's TRIM support, you might TRIM regularly in order to try to give your SSDs as much explicitly unused space to work with as possible (or even set autotrim on in the pool). On the other hand, if write performance is important to you, you probably should buy bigger SSDs; in general they'll have more headroom for writes.

(I believe that you can preserve this headroom by partitioning the SSDs and only using part of them. Our ZFS fileservers effectively get this some of the time for some SSDs, because we divide our SSDs into standard sized partitions and then use the partitions in ZFS pool vdevs. If a partition isn't assigned to a pool, it will only be written to if it's activated as a spare.)
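For the regular TRIMs and the autotrim setting mentioned above, the relevant OpenZFS commands are 'zpool trim', 'zpool set autotrim=on', and 'zpool status -t' to see TRIM state. What follows is only a sketch of how you might wrap them for something like a cron job; the pool name 'tank' and the helper names are hypothetical, and by default it only prints the commands instead of running them.

```python
import subprocess


def trim_command(pool):
    """argv for starting a manual TRIM of every device in the pool."""
    return ["zpool", "trim", pool]


def autotrim_command(pool, enabled=True):
    """argv for turning the pool's autotrim property on or off."""
    return ["zpool", "set", "autotrim=" + ("on" if enabled else "off"), pool]


def trim_status_command(pool):
    """argv for showing per-device TRIM status."""
    return ["zpool", "status", "-t", pool]


def run(argv, dry_run=True):
    """Print the command, or actually run it if dry_run is False."""
    if dry_run:
        print(" ".join(argv))
        return None
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    pool = "tank"  # hypothetical pool name
    run(trim_command(pool))
    run(trim_status_command(pool))
```

Keeping the dry run as the default is deliberate here, since whether to TRIM at all depends on how much you trust ZFS's TRIM support.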

A relatively ideal case for using TRIM would be a ZFS pool that's not too full but that sees a significant amount of writes through churn in its data, either through overwriting existing data or through creating and deleting files. You would get the former from things like hosting active virtual machines (which overwrite their virtual disks a lot) and the latter from frequently compiling things in a source tree.

(Because ZFS never overwrites data in place, even repeatedly updating the same blocks in a file (such as a virtual disk image) will eventually write all over the (logical) disk blocks on the SSD and force the SSD to consider them as having real data. Filesystems that will overwrite data in place don't have this behavior, so the SSD may get to keep a lot more logical blocks marked as 'has never been written to'.)
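The difference between the two behaviors can be illustrated with a toy model. This is a sketch under a loud assumption: the copy-on-write side hands out free blocks with a simple round-robin cursor, which is not how ZFS's real allocator works. It only shows the general effect, which is that repeatedly rewriting a small file eventually touches every block on the disk.

```python
def touched_blocks_in_place(file_blocks, updates):
    """Distinct disk blocks ever written when updates overwrite in place."""
    touched = set(range(file_blocks))  # the initial write of the file
    for i in range(updates):
        touched.add(i % file_blocks)   # same physical block every time
    return len(touched)


def touched_blocks_cow(disk_blocks, file_blocks, updates):
    """Distinct disk blocks ever written under toy copy-on-write.

    Assumes a round-robin allocator (an illustration only, not ZFS's
    actual allocation policy) and file_blocks < disk_blocks.
    """
    if file_blocks >= disk_blocks:
        raise ValueError("need at least one free block")
    location = list(range(file_blocks))  # physical block per file block
    in_use = set(location)
    touched = set(location)
    cursor = file_blocks
    for i in range(updates):
        # Find the next free block; the old copy is still in use
        # until the new copy has been written.
        while cursor % disk_blocks in in_use:
            cursor += 1
        new = cursor % disk_blocks
        cursor += 1
        logical = i % file_blocks
        in_use.discard(location[logical])  # old copy becomes free
        in_use.add(new)
        location[logical] = new
        touched.add(new)
    return len(touched)


if __name__ == "__main__":
    print(touched_blocks_in_place(10, 1000))   # stays at the file's size
    print(touched_blocks_cow(100, 10, 1000))   # ends up covering the disk
```

In this model the SSD sees only the file's own blocks as written in the in-place case, while in the copy-on-write case it eventually considers the whole disk to hold data unless the freed blocks are TRIM'd.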

Given ZFS's copy on write behavior, I suspect that it's useful to periodically TRIM even low write volume, relatively empty ZFS pools. How useful depends a fair bit on how much ZFS reuses disk space over a long time period. Repeatedly TRIM'ing pools that have low write levels and plenty of free space is probably harmless, but it's also probably not really necessary; with low write volume, mostly what you'll be doing is telling the SSDs things they already know (that the blocks you TRIM'd before and haven't written to since then are still unused).

For our ZFS fileservers, we're in the process of migrating from 2 TB SSDs to new 4 TB SSDs, which effectively resets the 'TRIM clock' for everything and gives us much more headroom in the form of completely unused partitions. Given this I don't think we're likely to try to TRIM our pools any time soon. Perhaps someday we'll use our metrics system to compare write performance from a year or three ago to write performance today, notice that it's clearly down for some things, and TRIM them.

PS: Much of this logic applies to any filesystem on SSDs, not just ZFS, although ZFS's copy-on-write makes it worse in that it's more likely to touch more of the SSD's logical blocks than other filesystems.


Comments on this page:

By jonys at 2023-01-28 12:20:42:

Another reason for trimming is to extend the lifetime of the flash chips. Explicitly marking blocks as free means that the SSD won't have to copy their contents somewhere else when it decides to shuffle data around as part of balancing block erase count.

Written on 27 January 2023.
Last modified: Fri Jan 27 21:10:57 2023