An unconventional reason for large RAID stripe sizes
Here's a surprising reason to have a really large RAID stripe size: to reduce per-spindle IO loads for random IO.
How and why this works is going to take some explanation. First, modern disks read and write very fast; what costs time is seeks. Random IO means lots of seeks. On a striped RAID, every additional disk that an IO has to touch costs an additional seek, and an IO moves to another disk every time it crosses a stripe boundary.
It's easy to see how large IOs benefit from larger stripe sizes. But small IOs also get hit by stripe crossings, because their start offset within stripes is random (sometimes aligned to block boundaries), and if they start too close to the end of the stripe they spill over to the next stripe. The larger the stripe size, the lower the chance that the IOs hit the unlucky jackpot.
For example, if you average 8K per IO, always aligned on 4K boundaries, and have a stripe size of 64K, there are 16 possible starting offsets within a stripe and only one of them (60K into the stripe) spills into the next stripe, so a random IO has a 1 in 16 chance of touching two disks. If you jump to a 256K stripe size, this drops to 1 in 64.
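The arithmetic above can be checked with a short Python sketch. The function name `spill_fraction` and its parameters are made up for illustration, and it assumes that IOs are a fixed size, that every aligned start offset within a stripe is equally likely, and that the alignment divides the stripe size evenly.

```python
from fractions import Fraction

def spill_fraction(stripe_kb, io_kb, align_kb):
    """Fraction of aligned start offsets where the IO crosses into the next stripe.

    Hypothetical helper: assumes fixed-size IOs, uniformly random aligned
    starts, and align_kb dividing stripe_kb evenly.
    """
    starts = range(0, stripe_kb, align_kb)
    # An IO spills when it ends past the end of the stripe it started in.
    spills = sum(1 for s in starts if s + io_kb > stripe_kb)
    return Fraction(spills, len(starts))

print(spill_fraction(64, 8, 4))    # 1/16
print(spill_fraction(256, 8, 4))   # 1/64
```

Real IO sizes vary, so this is only a rough model, but it shows how the spillover chance shrinks in proportion as the stripe size grows.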
Worse, your filesystem may not start on an aligned block in a stripe (e.g., your filesystem might start 30K into the above RAID), because of partitioning overhead and so on. This raises the spillover chances and means that even the smallest filesystem IO can hit two stripes.
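To see the effect of a misaligned filesystem start, here is a rough Python sketch. The function `spill_fraction_at_offset` and its parameters are hypothetical. It assumes fixed-size IOs that start on filesystem block boundaries with every block equally likely, a block size that divides the stripe size evenly, and a filesystem that begins `fs_offset_kb` into the RAID.

```python
from fractions import Fraction

def spill_fraction_at_offset(stripe_kb, io_kb, block_kb, fs_offset_kb):
    """Fraction of filesystem-block-aligned IOs that cross a stripe boundary.

    Hypothetical helper: assumes block_kb divides stripe_kb evenly, so the
    pattern of block positions repeats once per stripe.
    """
    starts = range(0, stripe_kb, block_kb)
    spills = sum(
        1 for s in starts
        # Position of the IO's first byte within its stripe, after shifting
        # by the filesystem's starting offset on the RAID.
        if (s + fs_offset_kb) % stripe_kb + io_kb > stripe_kb
    )
    return Fraction(spills, len(starts))

print(spill_fraction_at_offset(64, 4, 4, 0))    # 0: aligned 4K IOs never spill
print(spill_fraction_at_offset(64, 4, 4, 30))   # 1/16: the block at 62K spills
print(spill_fraction_at_offset(64, 8, 4, 30))   # 1/8: twice the aligned 8K case
```

Under these assumptions, the 30K offset turns a single 4K block IO from something that never crosses a stripe into something that does so 1 time in 16, and doubles the spillover chance for 8K IOs.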
(Per-spindle IO loads matter because a disk can only do so many random IO operations per second. Thus, the more IO operations that only hit a single disk, the more total IOPS you can do, either increasing the load you can handle or reducing how many disks you need for a given OS-level load.)
(Possibly this is well known in the industry, but it certainly surprised us when we hit it last year.)