== How I am doing randomized read IO to avoid ZFS prefetching

If only so that I never have to carefully reinvent this code again, here is how I'm doing randomized read IO to avoid [[ZFS prefetching ../solaris/ZFSHowPrefetching]]. Since ZFS prefetching is the most intelligent and aggressive prefetching I've ever seen, I expect that this approach will also avoid prefetching on other filesystems and OSes.

The following code is in Python and assumes you have a _readat()_ function that does the basic read (and also does whatever time tracking and so on you want); a minimal sketch of one possible _readat()_ is at the end of this entry.

    import random

    KB = 1024
    FSBLKSIZE = (128 * KB)
    READSIZE = (4 * KB)
    CHUNKSIZE = (FSBLKSIZE * 2)

    def readfile(fd, bytesize):
        """Do randomized reads from fd at offsets
        from 0 to more or less bytesize."""
        chunks = bytesize // CHUNKSIZE
        # Constant seed for repeatability.
        # This is a random number.
        random.seed(6538029369423517174L)
        # Create a list of every chunk offset.
        chunklist = range(0, chunks)
        # Shuffle the chunks into random order.
        random.shuffle(chunklist)
        # Read once from every chunk.
        for chunk in chunklist:
            bytepos = chunk * CHUNKSIZE
            readat(fd, bytepos, READSIZE)

(Disclaimer: this is not exactly the code I'm using, which is messier in various ways, but it is the same approach.)

_FSBLKSIZE_ is your filesystem's block size, the minimum size that the filesystem actually reads from the disk (on a file that was written sequentially). On ZFS this is the _recordsize_ property and is usually 128 KB. We double this to create our chunk size; to avoid ever doing ordinary sequential IO (forward or backwards), we'll only ever do one read per chunk, ie we'll only ever read from every second filesystem block. _READSIZE_ is the size of the actual reads we'll do. It should be equal to or less than _FSBLKSIZE_ for obvious reasons, and I prefer it to be clearly smaller if possible.

(The one tricky bit about _FSBLKSIZE_ is that you need to push it up if your filesystem ever helpfully decides to read several blocks at once for you. ZFS generally does not, but others may.)

Rather than repeatedly reading at randomized positions until we feel that we've done enough reads, we generate a list of all (chunk) positions and then shuffle it into a random order. If your standard libraries don't have a routine for this, [[remember to use a high-quality algorithm http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle]] (there is a sketch of one at the end of this entry). Shuffling this way ensures that we never re-read the same block and that we read from every one of the chunks, doing a predictable amount of IO (depending on the basic size we tell _readfile()_ to work on).

Since random numbers are a little bit unpredictable, you should always check the amount of prefetching that your filesystem is actually doing when you run this program. On ZFS this can be done by [[watching the ARC stats http://blog.harschsystems.com/2010/09/08/arcstat-pl-updated-for-l2arc-statistics/]]. Note that even unsuccessful prefetching may distort any performance numbers you get by adding extra, potentially unpredictable IO load.

This still leaves you exposed to things like track caching done by your hard drive(s), but it's very hard to avoid or even predict that level of caching. You just have to do enough IO over a broad enough range of file blocks (and thus disk blocks) that you're usually not doing IOs that are close enough together to hit any such caches.
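Since this entry assumes an external _readat()_, here is a minimal sketch of one possible version. It is not the code I'm actually using; it just seeks and reads, and any time tracking you want would wrap the two OS calls:

    import os

    def readat(fd, pos, size):
        # Seek to the absolute byte offset, then do one
        # read of the requested size. Timing or other
        # instrumentation would go around these calls.
        os.lseek(fd, pos, os.SEEK_SET)
        return os.read(fd, size)

You would drive the whole thing with something like this (the filename is just an example):

    fd = os.open("testfile", os.O_RDONLY)
    readfile(fd, os.fstat(fd).st_size)
    os.close(fd)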
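And if you do have to write your own shuffle, the Fisher-Yates algorithm linked above is short. This is only an illustrative sketch: _fyshuffle()_ is a made-up name, and it leans on Python's _random.randint()_ where an environment without a library shuffle would need its own random number source:

    import random

    def fyshuffle(lst):
        # Walk the list backwards, swapping each element
        # with a randomly chosen element at or before it.
        # This gives every permutation equal probability.
        for i in range(len(lst) - 1, 0, -1):
            j = random.randint(0, i)
            lst[i], lst[j] = lst[j], lst[i]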