The problem RAID faces with discarding blocks on SSDs

March 22, 2023

One of the things that's good for the performance of modern SSDs is explicitly discarding unused blocks so the SSD can erase flash space in advance. My impression is that modern SSDs support this fairly well these days and people consider it relatively trustworthy, and modern filesystems can discard unused blocks periodically (Linux has fstrim, which is sometimes enabled by default). However, in some environments there's a little fly in the ointment, and that's RAID (whether software or 'hardware').

The issue facing RAID is that in a RAID environment (other than RAID-0), by default there's some relationship between the contents of sector X on one disk and sector X on another disk. In RAID-1 the two sectors are supposed to be identical; in other RAID levels the sectors (along with sectors on other disks) are supposed to be consistent with one or more parity blocks (checksums, in effect). If you TRIM the same sector on two or more SSDs, the basic version of block discard support doesn't promise that you'll read back any particular data afterward, which means that the relationship between the data on different disks is now potentially gone.
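As a minimal sketch of how this goes wrong, here is a toy model in Python. Everything in it is hypothetical: `ToyDisk` stands in for an SSD whose TRIM'd sectors read back as some unspecified per-drive junk, and `xor_parity` assumes plain single-parity XOR, which is only one of the schemes a real RAID might use.

```python
SECTOR = 16  # bytes per toy sector (real sectors are much larger)

def xor_parity(blocks):
    """XOR equal-sized blocks together, as single-parity RAID does."""
    parity = bytearray(SECTOR)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

class ToyDisk:
    """Hypothetical SSD: a TRIM'd sector reads back as per-drive junk."""
    def __init__(self, junk_byte):
        self.sectors = {}
        self.junk = bytes([junk_byte]) * SECTOR  # stand-in for 'unspecified'

    def write(self, n, data):
        self.sectors[n] = bytes(data)

    def trim(self, n):
        self.sectors.pop(n, None)

    def read(self, n):
        return self.sectors.get(n, self.junk)

# RAID-1: both mirrors hold the same data until the sector is TRIM'd.
left, right = ToyDisk(0xAA), ToyDisk(0x55)
for disk in (left, right):
    disk.write(7, b"mirrored payload")
mirror_ok_before = left.read(7) == right.read(7)
for disk in (left, right):
    disk.trim(7)
mirror_ok_after = left.read(7) == right.read(7)

# Single parity: two data disks plus one parity disk.
d0, d1, par = ToyDisk(0xAA), ToyDisk(0x55), ToyDisk(0x0F)
d0.write(7, b"A" * SECTOR)
d1.write(7, b"B" * SECTOR)
par.write(7, xor_parity([d0.read(7), d1.read(7)]))
parity_ok_before = xor_parity([d0.read(7), d1.read(7)]) == par.read(7)
for disk in (d0, d1, par):
    disk.trim(7)
parity_ok_after = xor_parity([d0.read(7), d1.read(7)]) == par.read(7)

print(mirror_ok_before, mirror_ok_after)   # consistent, then not
print(parity_ok_before, parity_ok_after)   # consistent, then not
```

A RAID scrub or a degraded-mode reconstruction that trusted sector 7 after the TRIM would see a mismatch or rebuild garbage, which is the heart of the problem.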

(Modern SSDs support 'Deterministic Read After TRIM' (DRAT), but this doesn't promise to return the same data on two different drives. You might also get read errors instead of data, and it doesn't deal with RAID-N checksums.)

Some or perhaps many modern SSDs support 'Deterministic read ZEROs after TRIM' (variously called DZAT, RZAT, or DRZAT). A RAID-1 mirror on SSDs with reliable DZAT can TRIM sector X on all mirrors and be confident that the expected relationship between sectors on the disks still holds. A RAID-N parity system might have more trouble here, but at least it would only have to (re)write the parity blocks for an all-zero set of data blocks; the data blocks themselves could be left TRIM'd.
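To make the DZAT case concrete, here is a small sketch under two assumptions of mine: that every TRIM'd sector really does read back as zeros, and that the parity in question is plain XOR (real RAID-N implementations may use other codes for additional parity blocks).

```python
SECTOR = 16          # bytes per toy sector
ZERO = bytes(SECTOR)

def xor_parity(blocks):
    """XOR equal-sized blocks together, as single-parity RAID does."""
    parity = bytearray(SECTOR)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# With DZAT, each TRIM'd data sector reads back as zeros on every drive.
trimmed_reads = [ZERO, ZERO, ZERO]

# RAID-1: all mirrors agree, so the mirror relationship still holds.
mirror_consistent = len(set(trimmed_reads)) == 1

# XOR parity: the parity block that goes with an all-zero stripe.
parity_to_write = xor_parity(trimmed_reads)
parity_is_zero = parity_to_write == ZERO

# A partially TRIM'd stripe: the parity only depends on the live block.
live = b"still live data!"
partial_parity = xor_parity([live, ZERO, ZERO])

print(mirror_consistent, parity_is_zero, partial_parity == live)
```

In this toy model the parity for a fully TRIM'd stripe is itself all zeros, so working out what to write is trivial. Whether a given RAID implementation takes advantage of that is a separate question.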

(Probably a RAID-N system could also do this for SSDs supporting DRAT; it would TRIM the data and parity blocks, then re-read the data blocks, calculate the parity for whatever deterministic values it reads, and write the parity out.)
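The DRAT procedure described above can be sketched as follows. This is a toy model, not how any particular RAID implementation works: `ToyDratDisk` is a hypothetical drive whose TRIM'd sectors always read back as the same per-drive pattern, and `trim_stripe` assumes plain XOR parity.

```python
SECTOR = 16  # bytes per toy sector

def xor_parity(blocks):
    """XOR equal-sized blocks together, as single-parity RAID does."""
    parity = bytearray(SECTOR)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

class ToyDratDisk:
    """Hypothetical DRAT drive: TRIM'd sectors read as a fixed pattern."""
    def __init__(self, pattern_byte):
        self.sectors = {}
        self.pattern = bytes([pattern_byte]) * SECTOR

    def write(self, n, data):
        self.sectors[n] = bytes(data)

    def trim(self, n):
        self.sectors.pop(n, None)

    def read(self, n):
        return self.sectors.get(n, self.pattern)

def trim_stripe(data_disks, parity_disk, n):
    """TRIM sector n everywhere, then rebuild parity from what reads back."""
    for disk in data_disks + [parity_disk]:
        disk.trim(n)
    reread = [disk.read(n) for disk in data_disks]  # deterministic under DRAT
    parity_disk.write(n, xor_parity(reread))

data_disks = [ToyDratDisk(0xAA), ToyDratDisk(0x55)]
parity_disk = ToyDratDisk(0x0F)
data_disks[0].write(3, b"X" * SECTOR)
data_disks[1].write(3, b"Y" * SECTOR)

trim_stripe(data_disks, parity_disk, 3)
stripe_consistent = (
    xor_parity([d.read(3) for d in data_disks]) == parity_disk.read(3)
)
print(stripe_consistent)
```

Note that the parity sector ends up written rather than TRIM'd, so only the data blocks stay discarded. The sketch also ignores the possibility of read errors on TRIM'd sectors, which a real system would have to handle.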

The other option I can think of is for the RAID system to keep track of what block ranges have been TRIM'd and so don't have consistent contents on the actual disks. Some higher end storage systems already support thin provisioning, which requires them to keep track of what user-visible blocks are valid; it's straightforward to use this for SSD block discarding as well. Otherwise the RAID system will require some sort of data structure to keep track of this, which will probably be new.
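As an illustration of what such a tracking structure might look like, here is a minimal in-memory sketch. `TrimMap` and its method names are made up for this example, and a real RAID system would also need to persist this state and keep it crash-consistent, which is not attempted here.

```python
class TrimMap:
    """Tracks half-open block ranges [start, end) that have been TRIM'd."""

    def __init__(self):
        self.ranges = []  # sorted, disjoint (start, end) tuples

    def mark_trimmed(self, start, length):
        """Record a discard, merging with overlapping or adjacent ranges."""
        end = start + length
        kept = []
        for s, e in self.ranges:
            if e < start or s > end:
                kept.append((s, e))          # untouched by this discard
            else:
                start, end = min(start, s), max(end, e)
        kept.append((start, end))
        self.ranges = sorted(kept)

    def mark_written(self, start, length):
        """A write makes these blocks valid again; cut them out."""
        end = start + length
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))          # no overlap with the write
                continue
            if s < start:
                kept.append((s, start))      # piece before the write
            if e > end:
                kept.append((end, e))        # piece after the write
        self.ranges = kept

    def is_trimmed(self, block):
        return any(s <= block < e for s, e in self.ranges)

tm = TrimMap()
tm.mark_trimmed(100, 50)   # blocks 100..149
tm.mark_trimmed(150, 10)   # adjacent, merges into 100..159
tm.mark_written(120, 5)    # blocks 120..124 are valid again
print(tm.ranges)
```

With something like this, the RAID layer could skip TRIM'd ranges during scrubs and resyncs, and answer reads of those ranges with zeros instead of consulting the disks.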

(Perhaps RAID systems have come up with other clever solutions to this problem.)

Last modified: Wed Mar 22 22:27:23 2023