The issue of IOPS versus latency on SSDs and NVMe drives

February 10, 2021

Famously, SSDs and especially NVMe drives are very good at handling random IO, unlike spinning rust. If you look at performance information for drives and Wikipedia information on IOPS, you can find very large and very impressive numbers. You'll also usually find footnotes or side notes to the effect that these numbers are achieved with high queue depths and concurrency, in order to keep these voraciously fast storage systems fed at all times with the IO requests they need to deliver maximum performance.

In the process of writing another entry, I was about to confidently turn these IOPS numbers into typical access latencies for SSDs and NVMe drives. Then it occurred to me that this conversion is not necessarily valid, because we're in the old realm of bandwidth versus latency (which I originally encountered in networking). Flooding a drive with all the IO requests it could possibly consume maximizes the 'bandwidth' of IO operations but it doesn't necessarily predict the latency that we would experience if we submitted an isolated request.
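(As a rough illustration of the distinction, with entirely made-up numbers: Little's law relates throughput, concurrency, and average time in the system, so a flooded-queue IOPS figure implies an average latency under load, but that figure tells you nothing directly about an isolated request.)

```python
# Made-up illustrative numbers: a drive rated at 500,000 random read
# IOPS, with that figure measured at a queue depth of 32.
iops = 500_000
queue_depth = 32

# By Little's law, average time in system = concurrency / throughput.
# This is the average latency of requests *under full load*, including
# all time spent queued; it is not the latency of an isolated request,
# which could be much lower (or behave quite differently).
avg_loaded_latency_s = queue_depth / iops
print(f"average loaded latency: {avg_loaded_latency_s * 1e6:.0f} microseconds")
# -> average loaded latency: 64 microseconds
```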

(I'm not sure that it lets us predict the average latency experienced by requests either, but I'm on more shaky ground there and I'd want to think hard about this and draw some little diagrams of toy models.)

The latency of isolated requests is probably not a useful number to try to measure for general performance information. One problem is that it's going to depend a lot on the operating system and the overall hardware for a fast SSD or a very fast NVMe drive. Flooding a drive with requests to determine its IOPS 'bandwidth' is relatively system neutral and so relatively reproducible, since all you need to figure out is how to get enough simultaneous requests, but an isolated request latency number is hard both to use and to verify (or reproduce). Even modest changes in operating system internals could affect how fast a single request can flow through, which means even applying software updates could invalidate previous results and make it impossible to cross-compare with older numbers.

At the same time, the latency of isolated requests is often important for practical system performance, especially as drives get faster and faster (and so spend less and less time with queued requests for a given load level). The latency of isolated random reads is especially relevant, since random reads are often synchronous in practice because some piece of software is waiting for the result and can't proceed without it. For instance, walking through many on-disk data structures (including filesystem directory trees for pathname lookups) is random but almost always synchronous for the code doing it.
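(A toy model of why this matters, with assumed numbers: each dependent read in a pathname lookup must finish before the next can start, so isolated read latencies add up directly rather than overlapping.)

```python
# Toy model: looking up a pathname with N directory components needs N
# dependent reads; each read's result tells us where to read next, so
# nothing can be done in parallel. The numbers here are assumptions
# for illustration, not measurements of any real drive.
path_components = 8          # e.g. a pathname eight directories deep
isolated_read_latency_us = 80  # assumed isolated random read latency

# Because the reads are strictly sequential, total time is just the
# sum of the individual latencies.
total_us = path_components * isolated_read_latency_us
print(f"synchronous lookup time: about {total_us} microseconds")
# -> synchronous lookup time: about 640 microseconds
```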


Comments on this page:

Wouldn’t it suffice to measure the variability in latency for any given IOP during the flooded-queue benchmark?

By cks at 2021-02-16 13:46:36:

I've been thinking about it and I can't convince myself that anything from the flooded-queue case reliably applies to single requests (although possibly I'm not seeing something). There's both operating system level queueing and drive internal queueing, both of which add delays, and I can't see how to extract them out to get some sort of delay-free execution time. Even then the delay-free execution time isn't truly the single request latency, since various things are likely being aggregated across multiple requests (queue submission notification to the NVMe drive, batch queue processing, interrupt handling, and so on).



Last modified: Wed Feb 10 00:17:25 2021