The issue with measuring disk performance through streaming IO
Suppose, not entirely hypothetically, that you are interested in measuring the performance of your disks. Of course you understand that averages are misleading and that latency is important, so you fire up a handy tracing IO performance tester that does streaming reads and dumps timing traces for each IO operation.
This might sound good, but I feel that using streaming IO for this is generally a mistake. It isn't a fatal one, but you are potentially throwing away information on latency and making it harder to be sure of any odd results you see. The problem is what prefetching does to your true timing information.
(Your streaming IO will be prefetched by any number of layers, right down to the disk itself. You may be able to turn off some of them, but probably not all.)
There are two cases, depending on how fast the rest of your program runs. If your program is comparatively slow, perhaps because you wrote it in an interpreted language for convenience, prefetching can completely destroy the real latency information. If a prefetched IO completes before your program gets around to asking for it, that's it; you know nothing more about its latency than that it finished in time (and you may not know how slow your program is unless you think to measure it). The IO could have taken 5 milliseconds or 500; to a sufficiently slow program, both look the same. But you probably care very much about the difference.
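To make the slow-program case concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions that are mine, not measurements of any real system: the prefetcher reads ahead without limit, the disk services IOs one after another, and the program spends a fixed process_time on each block before asking for the next one. The function name and parameters are made up for illustration.

```python
def observed_latencies(true_latencies, process_time):
    """Toy model: what a streaming reader sees when IO is prefetched.

    Assumes (hypothetically) unlimited readahead and a disk that services
    IOs strictly in sequence. All times are in milliseconds.
    """
    done = 0      # time at which the disk finishes the current IO
    now = 0       # time at which the program asks for the current IO
    observed = []
    for latency in true_latencies:
        done += latency            # disk works on this IO right after the last
        wait = max(0, done - now)  # the program only waits if the IO isn't done
        observed.append(wait)
        now += wait + process_time # program processes the block, then asks again
    return observed


# A slow program (1000 ms per block) cannot tell a 500 ms IO from a 5 ms one:
print(observed_latencies([5, 500, 5, 5], 1000))  # [5, 0, 0, 0]
print(observed_latencies([5, 5, 5, 5], 1000))    # [5, 0, 0, 0]
# A program that does no processing sees the real latencies:
print(observed_latencies([5, 500, 5, 5], 0))     # [5, 500, 5, 5]
```

In this model the slow program gets identical traces for two quite different disks, which is exactly the information loss described above.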
If your program is sufficiently fast that it is not the limiting factor, you're going to outrun prefetching. Prefetching is not magic so if you can consume IO results faster than the bandwidth from the disks to your program, your program will wind up waiting for IO completion and so seeing latencies that are probably more or less typical. But I'm not convinced that you'll necessarily see the real details of unusually slow IOs and if there are patterns in IO latencies, prefetching may well blur them together. It's possible that you don't actually lose any information here, but if so it's something that I'd have to think through very carefully. The need to do that makes me cautious, so I think it's undesirable to use even this full-speed streaming IO while measuring latency.
So my conclusion: if you want to measure latency, you need to somehow avoid prefetching.
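One possible way to approach this, sketched in Python, is to read blocks at random offsets and tell the kernel not to expect sequential access. This is an illustration and not a recommendation of a particular tool; the function name and its parameters are my own invention. It assumes a Unix system where os.pread is available, and it uses os.posix_fadvise with POSIX_FADV_RANDOM only where the platform provides it. It also only addresses readahead in the operating system. It does nothing about prefetching inside the disk itself, and it does not bypass the page cache, so reads of already-cached data will look far faster than the disk really is.

```python
import os
import random
import time


def time_random_reads(path, block_size=4096, count=100, seed=0):
    """Return per-read latencies (in seconds) for reads at random block offsets."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        nblocks = os.fstat(fd).st_size // block_size
        if nblocks == 0:
            raise ValueError("file is smaller than one block")
        # Hint that access will be random, which asks the kernel to skip
        # readahead for this file descriptor. Not available on every platform.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        latencies = []
        for _ in range(count):
            offset = rng.randrange(nblocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        os.close(fd)
```

Random offsets matter as much as the hint here: with no sequential pattern to detect, the layers you cannot configure have much less to prefetch.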
(The exception is if what you actually care about is latency during streaming IO, or just long-term bandwidth during streaming IO where latency outliers and brief IO stalls don't matter to you.)
Sidebar: checking to see if your program is fast enough
This one is simple: look at the read bandwidth your program is getting, as compared to your typical simple brute-force bandwidth micro-benchmark. The closer your program comes to the maximum achievable bandwidth, the faster it is. If you hit full bandwidth, your code is not the bottleneck and any detailed latency information you get is as trustworthy as possible under the circumstances.
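The comparison itself is simple arithmetic. As a minimal sketch in Python, assuming you have already measured how many bytes your tracing program read, how long it took, and what your brute-force micro-benchmark achieved on the same disks (the function and parameter names are hypothetical):

```python
def bandwidth_fraction(bytes_read, elapsed_seconds, baseline_bytes_per_sec):
    """Fraction of the baseline bandwidth that the tracing program achieved."""
    if elapsed_seconds <= 0 or baseline_bytes_per_sec <= 0:
        raise ValueError("elapsed time and baseline bandwidth must be positive")
    return (bytes_read / elapsed_seconds) / baseline_bytes_per_sec


# Example: the tracer read 100 MiB in one second on disks where the
# micro-benchmark manages 200 MiB/s, so it reaches half the available bandwidth.
print(bandwidth_fraction(100 * 2**20, 1.0, 200 * 2**20))  # 0.5
```

A result close to 1.0 suggests your code is not the bottleneck. What counts as close enough is a judgment call that this sketch does not try to make.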