The issue of IOPS versus latency on SSDs and NVMe drives
Famously, SSDs and especially NVMe drives are very good at handling random IO, unlike spinning rust. If you look at performance information for drives and Wikipedia information on IOPS, you can find very large and very impressive numbers. You'll also usually find footnotes or side notes to the effect that these numbers are usually achieved with high queue depths and concurrency, in order to keep these voraciously fast storage systems fed at all times with the IO requests they need to deliver maximum performance.
In the process of writing another entry, I was about to confidently turn these IOPS numbers into typical access latencies for SSDs and NVMe drives. Then it occurred to me that this conversion is not necessarily valid, because we're in the old realm of bandwidth versus latency (which I originally encountered in networking). Flooding a drive with all the IO requests it could possibly consume maximizes the 'bandwidth' of IO operations but it doesn't necessarily predict the latency that we would experience if we submitted an isolated request.
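To make the gap concrete, here's a bit of arithmetic in Python (with made-up numbers for an imaginary drive, not figures from any real one). By Little's law, the average time a request spends in the system is the queue depth divided by the throughput, so a drive delivering a million IOPS at a queue depth of 256 is averaging around 256 microseconds per request, not the one microsecond you'd get from naively inverting the IOPS number:

```python
# Illustrative numbers only, not from any specific drive.
iops = 1_000_000      # advertised random read IOPS
queue_depth = 256     # concurrency used to reach that number

# With 256 requests in flight, the average time each request spends
# in the system is QD / IOPS (Little's law), not 1 / IOPS.
avg_time_in_system_us = queue_depth * 1_000_000 / iops
naive_latency_us = 1_000_000 / iops

print(avg_time_in_system_us)  # prints 256.0
print(naive_latency_us)       # prints 1.0, the naive (wrong) conversion
```

Neither number tells you what an isolated request would see, but the first is at least an honest average under that load.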
(I'm not sure that it lets us predict the average latency experienced by requests either, but I'm on more shaky ground there and I'd want to think hard about this and draw some little diagrams of toy models.)
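As one toy model of the sort I'd want to draw diagrams for (my own construction, nothing measured): a drive that services exactly one request every fixed interval, with a constant number of requests kept in flight. Its IOPS number is the same at every queue depth once it's saturated, but the average latency a request experiences grows with the queue depth:

```python
# Toy model: a drive servicing one request every service_us
# microseconds, with queue_depth requests always in flight.
def toy_drive(service_us, queue_depth, n_requests):
    # Request i completes at (i+1)*service_us; it was submitted when a
    # queue slot opened, at max(0, i - queue_depth + 1) * service_us.
    total_latency = 0.0
    for i in range(n_requests):
        submit = max(0, i - queue_depth + 1) * service_us
        complete = (i + 1) * service_us
        total_latency += complete - submit
    duration_us = n_requests * service_us
    iops = n_requests / (duration_us / 1_000_000)
    return iops, total_latency / n_requests

for qd in (1, 4, 32):
    iops, avg_lat = toy_drive(service_us=10, queue_depth=qd,
                              n_requests=10_000)
    print(f"QD={qd:3d}: {iops:,.0f} IOPS, avg latency {avg_lat:.1f} us")
```

In this model the drive reports 100,000 IOPS at every queue depth, while the average latency is roughly 10 microseconds at QD 1 and roughly 320 at QD 32. Real drives are messier (their throughput usually rises with queue depth, which is the whole point of deep queues), but the model shows why an IOPS number alone doesn't pin down the latency requests experience.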
The latency of isolated requests is probably not a useful number to try to measure for general performance information. One problem is that for a fast SSD or a very fast NVMe drive, it's going to depend a lot on the operating system and the overall hardware. Flooding a drive with requests to determine its IOPS 'bandwidth' is relatively system neutral and so relatively reproducible, since all you need to figure out is how to get enough simultaneous requests, but an isolated request latency number is hard both to use and to verify (or reproduce). Even modest changes in operating system internals could affect how fast a single request can flow through, which means that even applying software updates could invalidate previous results and make it impossible to cross-compare with older numbers.
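As a sketch of why the isolated number is so slippery: here's about the simplest possible attempt, timing single 4 KiB reads at random offsets, one at a time. This version uses an ordinary temporary file, so the page cache makes the results wildly optimistic; a real measurement would need O_DIRECT, a real device, and care about things like CPU power states, at which point the OS and hardware details dominate exactly as described above.

```python
import os
import random
import tempfile
import time

# Time isolated 4 KiB random reads, one request at a time. Because
# this reads a freshly written temporary file, it mostly measures the
# page cache and the OS call path, not the drive itself; it's a sketch
# of the measurement's shape, not a real drive benchmark.
BLOCK = 4096
NBLOCKS = 1024

with tempfile.NamedTemporaryFile() as f:
    f.write(os.urandom(BLOCK * NBLOCKS))
    f.flush()
    fd = f.fileno()
    samples = []
    for _ in range(100):
        off = random.randrange(NBLOCKS) * BLOCK
        t0 = time.perf_counter_ns()
        os.pread(fd, BLOCK, off)
        samples.append(time.perf_counter_ns() - t0)
    samples.sort()
    print(f"median isolated read: {samples[len(samples) // 2]} ns")
```

Whatever number this prints says as much about the kernel, the caches, and the CPU as it does about any storage device, which is the reproducibility problem in miniature.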
At the same time, the latency of isolated requests is often important for practical system performance, especially as drives get faster and faster (and so spend less and less time with queued requests for a given load level). The latency of isolated random reads is especially relevant, since random reads are often synchronous in practice because some piece of software is waiting for the result and can't proceed without it. For instance, walking through many on-disk data structures (including filesystem directory trees for pathname lookups) is random but almost always synchronous for the code doing it.