The problem of simulating random IO

October 23, 2012

Due to recent events, we're now rather interested in being able to measure, characterize, and track our disk IO performance. This, it turns out, presents some problems in the modern world.

Of course, the gold standard is to measure your actual observed performance in production (and thanks to some work with DTrace, I can now actually do that). However, the problem with production performance is that so many things influence it that it's hard to know what changes in it mean. In particular, if we observe production performance slowing down, we don't know if it's because we've got more load, different IO patterns than before, or a genuine problem that we can do something about. So we need to be able to do controlled tests that measure real disk performance, in detail. This means defeating prefetching, which means what we really want to measure is random IO performance.

In theory this is simple enough, given an existing large file: pick a block size, generate or get some high-quality random numbers, seek to those offsets (in blocks), read or write something, and repeat. In practice there is a significant problem with using genuinely random numbers to drive random IO: repeatability (and its closely related cousin, consistency). If I run my test today, then run it again in a month and get different results, is that difference because our IO system genuinely changed or because I got different random numbers?
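As a rough illustration of that loop, here is a minimal Python sketch that drives reads from a seeded pseudo-random generator, so the same seed gives the same sequence of blocks on every run. The function names (block_sequence, timed_random_reads) are made up for this example, and the sketch does nothing to bypass the OS cache or prefetching; it only shows the seek-and-read pattern and the repeatability idea.

```python
import os
import random
import time


def block_sequence(seed, nblocks, count):
    """Return `count` block numbers in [0, nblocks), fully determined by `seed`."""
    rng = random.Random(seed)
    # random() is the one method Python documents as producing the same
    # sequence for the same seed across versions, so build on it directly.
    return [int(rng.random() * nblocks) for _ in range(count)]


def timed_random_reads(path, block_size, blocks):
    """Read one block at each listed block number; return (bytes_read, seconds)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        total = 0
        for block_no in blocks:
            # pread() reads at an absolute offset without moving the file position.
            total += len(os.pread(fd, block_size, block_no * block_size))
        return total, time.perf_counter() - start
    finally:
        os.close(fd)
```

A run today and a run next month with the same seed, file size, and block size would issue the same reads in the same order, which takes "different random numbers" off the list of explanations for a changed result.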

So what I really want is a fixed sequence of non-sequential IO that defeats operating system prefetching (at least). The problem with this in the modern world is that OSes and filesystems are getting disturbingly superintelligent about detecting patterns in your IO. Sequential forwards and backwards? That's easy (everyone does at least sequential forwards). Forwards and backwards with a stride? That too. Multiple streams of any of the above, interleaved? There are filesystems that detect it (and I happen to be dealing with one of them). Coming up with a sequence of IO that defeats all of this is what they call an interesting problem.

(And one that I don't have a solution for.)

Which brings me to a small request for filesystem designers: please provide a way to turn off your superintelligent prefetching for specific IO. Yes, it's great, but sometimes people want to do real IO right through to the disks. Having this feature also turn off all caching (and turn off placing the data in the cache) is optional but probably appreciated. I suggest that you borrow the Linux O_DIRECT flag to open() rather than invent your own different interface.

(Providing a filesystem-wide or system-wide flag is not good enough. I don't want to turn off all prefetching on a production filesystem or fileserver so that I can accurately measure disk IO performance; that cure is worse than the disease.)
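For reference, this is roughly what using O_DIRECT looks like on Linux today, as a hedged Python sketch. O_DIRECT requires the buffer, the file offset, and the read size to be suitably aligned; the 4096-byte ALIGN value here is an assumption that covers common devices, not something queried from the system, and the helper names are invented for this example. Some filesystems reject O_DIRECT with EINVAL, so the sketch falls back to an ordinary buffered open and reports which path it took.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed alignment; real code should check the device's block size


def open_maybe_direct(path):
    """Open read-only with O_DIRECT where possible; return (fd, used_direct)."""
    direct = getattr(os, "O_DIRECT", 0)  # the flag is not defined on every platform
    if direct:
        try:
            return os.open(path, os.O_RDONLY | direct), True
        except OSError as exc:
            # EINVAL here means the filesystem does not support O_DIRECT.
            if exc.errno != errno.EINVAL:
                raise
    return os.open(path, os.O_RDONLY), False


def read_block(path, block_no, block_size=ALIGN):
    """Read one aligned block; return (data, used_direct)."""
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's buffer rule.
    buf = mmap.mmap(-1, block_size)
    try:
        fd, used_direct = open_maybe_direct(path)
        with os.fdopen(fd, "rb", buffering=0) as f:
            f.seek(block_no * block_size)
            n = f.readinto(buf)
        return buf[:n], used_direct
    finally:
        buf.close()
```

The point of the request stands: this is a per-open choice made by the program doing the measurement, and nothing else on the filesystem is affected.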


Comments on this page:

From 173.164.235.197 at 2012-10-23 01:26:31:

Seems to me like you could just use Blum Blum Shub with a fixed seed. I'd like to see the filesystem that can reverse engineer the seed for a cryptographically strong PRNG.

-- Donald King
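The suggestion above could be sketched roughly as follows. This is a toy Blum Blum Shub generator in Python, not a cryptographically strong one: the modulus is built from two well-known Mersenne primes (both congruent to 3 mod 4, as the algorithm requires) purely so the example is self-contained, and the names bbs_bits and bbs_blocks are made up here. Reducing 32 bits modulo the block count also introduces a slight bias, which probably doesn't matter for driving a benchmark.

```python
P = 2**31 - 1   # Mersenne prime, P % 4 == 3
Q = 2**61 - 1   # Mersenne prime, Q % 4 == 3
M = P * Q       # toy modulus; far too small and too well known for real crypto


def bbs_bits(seed):
    """Yield the low bit of each Blum Blum Shub state, starting from `seed`."""
    x = seed * seed % M
    while True:
        x = x * x % M
        yield x & 1


def bbs_blocks(seed, nblocks, count, bits=32):
    """Return `count` block numbers in [0, nblocks), repeatable for a fixed seed."""
    if seed % P == 0 or seed % Q == 0:
        raise ValueError("seed must be coprime to the modulus")
    gen = bbs_bits(seed)
    blocks = []
    for _ in range(count):
        value = 0
        for _ in range(bits):
            value = (value << 1) | next(gen)
        blocks.append(value % nblocks)
    return blocks
```

Calling bbs_blocks with the same seed, block count, and length gives the same list every time, which is the repeatability property the comment is after.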

From 212.199.107.218 at 2012-10-23 03:20:31:

I can understand why you want to see end-to-end performance by testing the filesystem, but I don't think it is that useful and the extra requests that you add will make the filesystem interfaces more complex.

For the recent problems that you've seen, just testing the block device itself should be sufficient, both on the Linux iSCSI servers, where you can test individual disks, and on your Solaris systems, where you can test the iSCSI transport.

Having the building blocks monitored independently will also make it easier to figure out where the problem is.

If you have the end-to-end check along with the building blocks you can probably discern problems from good/bad caching.

Written on 23 October 2012.

Last modified: Tue Oct 23 00:43:04 2012