How I am doing randomized read IO to avoid ZFS prefetching
If only so that I never have to carefully reinvent this code again, here is how I'm doing randomized read IO to avoid ZFS prefetching. Since ZFS prefetching is the most superintelligent form of prefetching I've ever seen, I expect that this approach would also avoid prefetching on other filesystems and OSes.
The following code is in Python and assumes you have a
function that does the basic read (and also does whatever time
tracking and so on you want):
```python
import random

KB = 1024
FSBLKSIZE = 128 * KB
READSIZE = 4 * KB
CHUNKSIZE = FSBLKSIZE * 2

def readfile(fd, bytesize):
    """Do randomized reads from fd at offsets from 0 to
    more or less bytesize."""
    chunks = bytesize // CHUNKSIZE
    # Constant seed for repeatability.
    # This is a random number.
    random.seed(6538029369423517174)
    # Create a list of every chunk offset.
    chunklist = list(range(chunks))
    # Shuffle the chunks into random order.
    random.shuffle(chunklist)
    # Read from every chunk.
    for chunk in chunklist:
        bytepos = chunk * CHUNKSIZE
        readat(fd, bytepos, READSIZE)
```
(Disclaimer: this is not exactly the code I'm using, which is messier in various ways, but it is the same approach.)
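The readat() function is assumed above; a minimal sketch of one plausible version, using os.pread and simple timing (the return value and the timing details here are my own choices, not anything from the original code), might look like this:

```python
import os
import time

def readat(fd, offset, size):
    """Read size bytes from fd at offset, timing the read.
    Returns the data and the elapsed time in seconds."""
    start = time.monotonic()
    # os.pread reads at an absolute offset without moving the
    # file position, which is what we want for random IO.
    data = os.pread(fd, size, offset)
    elapsed = time.monotonic() - start
    return data, elapsed
```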
FSBLKSIZE is your filesystem's block size, the minimum size that
the filesystem actually reads from the disk (on a file that was
written sequentially). On ZFS this is the
recordsize property and
is usually 128 KB. We double this to create our chunk size; to avoid
ever doing ordinary sequential IO (forward or backwards) we'll only
ever do one read per chunk, ie we'll only ever read from every second
FSBLKSIZE-sized block.
READSIZE is the size of actual reads we'll do.
It should be equal to or less than
FSBLKSIZE for obvious reasons,
and I prefer it to be clearly smaller if possible.
(The one tricky bit about
FSBLKSIZE is that you need to push it up if
your filesystem ever helpfully decides to read several blocks at once
for you. ZFS generally does not, but others may.)
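If you do need to account for this, one way is to fold the readahead into the effective block size before computing the chunk size (the readahead multiplier here is purely hypothetical; you'd have to determine the real one for your filesystem):

```python
KB = 1024
# Hypothetical: suppose the filesystem reads 4 blocks per IO.
READAHEAD_BLOCKS = 4
# Treat the whole readahead window as one effective block.
FSBLKSIZE = 128 * KB * READAHEAD_BLOCKS
# Still only one read per chunk, so reads never land in
# blocks the filesystem has already pulled in for us.
CHUNKSIZE = FSBLKSIZE * 2
```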
Rather than repeatedly read at randomized positions until we feel that
we've done enough reads, we generate a list of all (chunk) positions and
then shuffle them into a random order. If your standard libraries don't
have a routine for this, remember to use a high-quality algorithm. Shuffling
this way ensures that we never re-read the same block and that we
read from every one of the chunks, doing a predictable amount of IO
(depending on the basic size we tell
readfile() to work on).
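(In Python this routine is random.shuffle(), which is a Fisher-Yates shuffle. If you have to write your own, a sketch of the classic algorithm looks like this:

```python
import random

def fisher_yates_shuffle(items):
    """Shuffle items in place so that every permutation is
    equally likely (assuming a good underlying RNG)."""
    for i in range(len(items) - 1, 0, -1):
        # Pick j uniformly from 0..i inclusive, then swap.
        j = random.randint(0, i)
        items[i], items[j] = items[j], items[i]
```

The important property is that each element is swapped with a uniformly chosen position from the not-yet-fixed portion of the list; naive approaches like sorting on random keys can bias the results.)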
Since random numbers are a little bit unpredictable, you should always check the amount of prefetching that your filesystem is actually doing when you run this program. On ZFS this can be done by watching the ARC stats. Note that even unsuccessful prefetching may distort any performance numbers you get by adding extra, potentially unpredictable IO load.
This still leaves you exposed to things like track caching that are done by your hard drive(s), but it's very hard to avoid or even predict that level of caching. You just have to do enough IO and work on a broad enough range of file blocks (and thus disk blocks) that you're usually not doing IOs that are close enough to hit any such caches.