Thinking about an unusual sequence
October 26, 2012
This is a programming puzzle; unlike a lot of what I write about, I don't have any answers right now.
I would like to find an approach to creating a sequence of IO that defeats superintelligent prefetching. As far as I can tell the prefetching I'm dealing with recognizes forward and backwards read IO both straight and with a stride; further, it attempts to find up to N separate streams of prefetchable IO (where N is probably at least eight). The first puzzle is simple: what sort of sequence do I need to defeat all of that?
(Ideally I want a sequence where prefetching never activates, for reasons that don't fit in the margins of this comment. Less ideal is a sequence where the system prefetches but gets it wrong.)
Avoiding straight sequential IO is easy with a hack, so let's ignore that case. I believe that the combination of strides and streams means that I need a (repeating) sequence where the distance between any two of the past N elements is unique. We need uniqueness because if there was a duplicated distance, the system could 'recognize' it as a strided sequential read and predict the next occurrence of this strided read.
Mathematics has something called Golomb rulers, which have exactly this property. In theory I could just use an appropriate one; in practice, I see at least three potential problems. First, Golomb rulers of decent sizes are fairly big. Second, in practice this isn't so much an approach for generating suitable sequences as a set of them; I'd want to hardcode the rulers to the optimal ones. Third, I'm not completely sure how to turn a Golomb ruler into a repeating sequence that preserves its properties.
(I suspect that the way to go is to use a larger Golomb ruler than required, but I think it requires some care so that the rollover doesn't duplicate distances within the last N. Given how large higher-order Golomb rulers are, I don't want to just pick a ruler twice as large as I think I need.)
My gut irrationally suspects that it's possible to do better than Golomb rulers here. My IOs are ordered in a sequence; I don't have to issue them in a simple forward order the way Golomb rulers are normally represented. This seems like it should matter because I think that the sign of the difference between two IO positions kind of matters; the prefetching code never looks back to re-evaluate past history, it only tries to match the current IO up with some IO before it. (Golomb rulers use pure absolute distance.)
(To give an example, I believe that the prefetcher would not trigger for a sequence of block positions of '2 6 4'. While we have a duplicated distance, the offset of '4' from past IOs is seen as +2 and -2 and these are distinct from each other. This suggests that my constraint is too pessimistic, but I'm not sure how to phrase the right constraint and it may depend on how the innards of the prefetcher work. Maybe just using Golomb rulers is the safest approach because it's the most pessimistic.)
* * *
Atom feeds are available; see the bottom of most pages.