SSD block discard in practice on Linux systems

March 23, 2023

I'll put the summary up front. If you have SSD-based systems installed with a reasonably modern Linux, it's pretty likely that they are quietly and automatically discarding blocks from your SSDs on a regular basis. This is probably true even if you use software RAID mirrors (despite the potential problem RAID has with discarding blocks).

To start with, you can see if your SSDs are capable of discarding blocks with 'lsblk -dD'. Devices that support block discard show non-zero DISC-GRAN and DISC-MAX values (here, everything except sr0):

; lsblk -dD
NAME    DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda            0      512B       2G         0
sdb            0      512B       2G         0
sr0            0        0B       0B         0
zram0          0        4K       2T         0
nvme0n1        0      512B       2T         0
nvme1n1        0      512B       2T         0
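If you'd rather check this from a script, here is a minimal Python sketch. It assumes the usual sysfs layout, where each block device has 'queue/discard_granularity' and 'queue/discard_max_bytes' files under /sys/block; the function names are my own invention for illustration, and treating a non-zero discard_max_bytes as 'supports discard' is a simplification.

```python
from pathlib import Path


def discard_support(sys_block="/sys/block"):
    """Return {device: (granularity_bytes, max_bytes)} read from sysfs.

    Assumes each device directory has queue/discard_granularity and
    queue/discard_max_bytes; devices without them are skipped.
    """
    result = {}
    for dev in sorted(Path(sys_block).iterdir()):
        queue = dev / "queue"
        try:
            gran = int((queue / "discard_granularity").read_text())
            max_bytes = int((queue / "discard_max_bytes").read_text())
        except (OSError, ValueError):
            continue  # missing or unreadable attributes
        result[dev.name] = (gran, max_bytes)
    return result


def can_discard(max_bytes):
    """Simplified test: a device that reports 0 here can't discard."""
    return max_bytes > 0


if __name__ == "__main__":
    for name, (gran, max_bytes) in discard_support().items():
        state = "yes" if can_discard(max_bytes) else "no"
        print(f"{name}: discard={state} granularity={gran} max={max_bytes}")
```

The sysfs root is a parameter only so that the function can be pointed at a scratch directory for testing.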

But what about your software RAID arrays? You can check those too:

; lsblk -dD /dev/md*
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
md20        0      512B       2T         0
md25        0      512B       2G         0
md26        0      512B       2G         0
md31        0      512B       2T         0

If you guessed that md20 and md31 are on the NVMe disks and md25 and md26 are on the SATA SSDs, you're correct. All of these are mirrors.

On typical modern Linux systems, the actual ongoing trimming is done by fstrim, which is run from 'fstrim.service', which is in turn triggered by 'fstrim.timer' on a regular basis; use 'systemctl list-timers' to see if the timer is enabled on your system. Typical setups have fstrim log what it did to the systemd journal, so you can see the results with 'journalctl -u fstrim.service' (possibly with -r to see the most recent runs first). Both Fedora and Ubuntu seem to enable fstrim by default; my Fedora desktops and our 20.04 and 22.04 Ubuntu servers all have it on.

Modern Linux kernels expose IO statistics about discards that have happened on each device (since the system was last rebooted). These are visible in /proc/diskstats, and are covered in Documentation/admin-guide/iostats.rst. Because these IO stats have been in diskstats for a while, things that parse and extract information from diskstats may also report them. In particular, this information is reported by the Prometheus host agent and can be used in a suitable Prometheus setup to see how much discarding your various devices are doing and have been doing (including for software RAID devices).
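As an illustration of pulling these numbers out yourself, here is a small Python sketch. It relies on the field layout described in iostats.rst, where after the major number, minor number, and device name, fields 12 through 15 are discards completed, discards merged, sectors discarded, and milliseconds spent discarding. The function names and the dictionary keys are made up for this example, and lines from kernels too old to have the discard fields are simply skipped.

```python
SECTOR_BYTES = 512  # diskstats sector counts are in 512-byte units


def parse_discard_stats(text):
    """Extract per-device discard counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # 3 leading columns (major, minor, name) + at least 15 stat fields
        if len(fields) < 18:
            continue  # old kernel format without discard fields
        stats[fields[2]] = {
            "discards_completed": int(fields[14]),  # field 12
            "discards_merged": int(fields[15]),     # field 13
            "bytes_discarded": int(fields[16]) * SECTOR_BYTES,  # field 14
            "ms_discarding": int(fields[17]),       # field 15
        }
    return stats


def read_discard_stats(path="/proc/diskstats"):
    """Read and parse the live diskstats file."""
    with open(path) as f:
        return parse_discard_stats(f.read())


if __name__ == "__main__":
    for name, s in read_discard_stats().items():
        if s["discards_completed"]:
            gib = s["bytes_discarded"] / 2**30
            print(f"{name}: {s['discards_completed']} discards, {gib:.1f} GiB")
```

Since diskstats includes partitions and software RAID devices as well as whole disks, the same loop reports on all of them. The counters reset at boot, so they only cover the time since the system was last rebooted.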

Not all filesystems support the block discarding related features that fstrim needs, although ext4 and btrfs both do (for btrfs, see their Trim/discard page). In particular, ZFS on Linux doesn't support them, and so the regular fstrim.timer won't TRIM your ZFS pools. Instead, there are various options for doing this and if you want you can do so more cautiously than fstrim normally lets you. Looking at IO statistics for discarding can confirm what filesystems do and don't support this, especially since discard information is available for partitions.

Knowing that our SSDs have been TRIM'd for some time (probably years) without any visible explosions makes me somewhat more confident about using some sort of ZFS TRIM'ing on my desktops (our servers don't need it right now for reasons outside the scope of this entry). I'm still not fully confident for ZFS because while the SSDs and regular filesystems may be well tested for TRIM, I'm not sure how much production use ZFS TRIM has had.

(I discovered this quiet, problem-free TRIM'ing yesterday and then did some further investigation, which led to discovering metrics and so on.)

Written on 23 March 2023.

Last modified: Thu Mar 23 23:06:18 2023