Unbalanced reads from SSDs in software RAID mirrors in Linux
When I was looking at the write volume figures for yesterday's entry, one additional thing that jumped out at me is that on our central mail server, reads were very unbalanced between its two system SSDs. This machine, like many of our important servers, has a pair of SSDs set up as mirrors with Linux software RAID. In theory I'd expect reads to be about evenly distributed across both sides of the mirror; in practice, well:
242 Total_LBAs_Read [...] 16838224623
242 Total_LBAs_Read [...] 1698394290
That's almost a factor of ten difference. Over 90% of the reads have gone to the first SSD, and it's not an anomaly or a one-time thing; I could watch live IO rates and see that much of the time only the first disk experienced any read traffic.
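As a quick check of that arithmetic, here is a minimal sketch that works out the ratio and the first disk's share from the two Total_LBAs_Read values quoted above. The variable names are mine, purely for illustration.

```python
# Total_LBAs_Read values for the two mirrored SSDs, as quoted above.
first_ssd_lbas = 16838224623
second_ssd_lbas = 1698394290

# How many times more the first SSD has read than the second.
ratio = first_ssd_lbas / second_ssd_lbas

# Fraction of all reads that were served by the first SSD.
first_share = first_ssd_lbas / (first_ssd_lbas + second_ssd_lbas)

print(f"first/second ratio: {ratio:.2f}")
print(f"share of reads on first SSD: {first_share:.1%}")
```

This works out to a ratio of roughly 9.9 and a share a bit under 91% for the first SSD.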
It turns out that this is more or less expected behavior in Linux software RAID, especially on SSDs, and has been for a while. It appears that the core change for this was made to the software RAID code in 2012, and then an important related change was made in late 2016 (and may not be in long-term distribution kernels). The current state of RAID1 read balancing is kind of complex, but the important thing here is that in all kernels since 2012, if you have SSDs and at least one disk is idle, the first idle disk will be chosen. In general the read balancing code will use the (first) disk with the least pending IO, so the case of idle disks is just the limiting case.
(In kernels with the late 2016 change, this widens: if at least one disk is idle, the first idle disk will be chosen, even if all mirrors are HDs.)
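To make the rule concrete, here is a minimal Python sketch of "use the first disk with the least pending IO". It is a deliberate simplification and not the kernel's actual code (which is in C and considers more than this); the function name choose_mirror and the list-of-counts representation are assumptions of mine.

```python
def choose_mirror(pending_io):
    """Return the index of the first disk with the least pending IO.

    pending_io is a list with one entry per mirror: the number of
    requests currently outstanding on that disk. This is a simplified
    model of the rule described above, not the kernel's implementation.
    """
    # min() returns the first minimal element, so ties (including the
    # case where several disks are idle) go to the lowest-numbered disk.
    return min(range(len(pending_io)), key=pending_io.__getitem__)


# Both disks idle: the first disk is chosen.
print(choose_mirror([0, 0]))
# First disk busy, second idle: the read spills over to the second disk.
print(choose_mirror([1, 0]))
```

Because ties always go to the lowest-numbered disk, a mirror that is idle most of the time sends almost everything to its first disk.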
SSDs are very fast in general and they have no seek delays for non-sequential IO. The result is that under casual read loads, most of the time both SSDs in a mirror are idle and so the RAID1 read balancing code will always choose to read from the first SSD. Reads spill over to the second SSD only if the first SSD is already handling a read at the time that an unrelated second read comes in. As we can see here, that doesn't happen all that frequently.
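This effect is easy to reproduce in a toy simulation. The sketch below is only an illustration under assumptions of mine: reads arrive at random (exponentially distributed gaps), every read takes the same fixed service time, a disk can have several reads in flight at once, and each read goes to the first disk with the least pending IO. The function name, the parameters and the numbers used are all hypothetical and are not measurements from our servers.

```python
import random
from collections import deque


def simulate_mirror_reads(n_reads, mean_gap, service_time, seed=0):
    """Toy model of read balancing on a two-disk mirror.

    Returns [reads_on_first_disk, reads_on_second_disk].
    """
    rng = random.Random(seed)
    # Completion times of the reads still in flight on each disk.
    in_flight = [deque(), deque()]
    reads = [0, 0]
    now = 0.0
    for _ in range(n_reads):
        # Next read arrives after a random gap with the given mean.
        now += rng.expovariate(1.0 / mean_gap)
        # Drop reads that have finished by the time this one arrives.
        for queue in in_flight:
            while queue and queue[0] <= now:
                queue.popleft()
        pending = [len(queue) for queue in in_flight]
        # First disk with the least pending IO; ties go to disk 0.
        disk = min(range(2), key=pending.__getitem__)
        in_flight[disk].append(now + service_time)
        reads[disk] += 1
    return reads


# Light load: reads arrive on average 20 service times apart.
light = simulate_mirror_reads(20000, mean_gap=20.0, service_time=1.0)
# Heavy load: about ten reads arrive per service time.
heavy = simulate_mirror_reads(20000, mean_gap=0.1, service_time=1.0)
print("light load:", light)
print("heavy load:", heavy)
```

In this model the light load sends the large majority of reads to the first disk, while the heavy load splits them close to evenly, since only then is the first disk usually busy when the next read arrives.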
(Although our central mail server is an outlier as far as how unbalanced it is, other servers with mirrored SSDs also have unbalanced reads with the first disk in the mirror seeing far more than the second disk.)