Wandering Thoughts archives

2012-10-30

How I am doing randomized read IO to avoid ZFS prefetching

If only so that I never have to carefully reinvent this code again, here is how I'm doing randomized read IO to avoid ZFS prefetching. Since ZFS prefetching is the most superintelligent form of prefetching I've ever seen, I expect that this approach would also avoid prefetching on other filesystems and OSes.

The following code is in Python and assumes you have a readat() function that does the basic read (and also does whatever time tracking and so on you want):

import random

KB = 1024
FSBLKSIZE = (128 * KB)
READSIZE = (4 * KB)

# Only ever read once per chunk of two
# filesystem blocks.
CHUNKSIZE = (FSBLKSIZE * 2)

def readfile(fd, bytesize):
  """Do randomized reads from fd at
  offsets from 0 to more or less
  bytesize."""
  chunks = bytesize // CHUNKSIZE

  # Constant seed for repeatability.
  # This is a random number.
  random.seed(6538029369423517174L)

  # Create a list of every chunk offset.
  chunklist = range(0, chunks)

  # Shuffle the chunks into random order.
  random.shuffle(chunklist)

  # Read from the start of every chunk;
  # readat() is assumed to be provided
  # elsewhere.
  for chunk in chunklist:
    bytepos = chunk * CHUNKSIZE
    readat(fd, bytepos, READSIZE)

(Disclaimer: this is not exactly the code I'm using, which is messier in various ways, but it is the same approach.)
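
(For concreteness, a minimal readat() could look something like the sketch below, using os.lseek() plus os.read(); this is just an illustration, since a real version also does the timing and accounting mentioned above.)

import os

def readat(fd, bytepos, size):
  """Seek to bytepos and read size bytes.
  A real version would also record how
  long the read took."""
  os.lseek(fd, bytepos, os.SEEK_SET)
  return os.read(fd, size)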

FSBLKSIZE is your filesystem's block size, the minimum size that the filesystem actually reads from the disk (on a file that was written sequentially). On ZFS this is the recordsize property and is usually 128 KB. We double this to create our chunk size; to avoid ever doing ordinary sequential IO (forward or backwards) we'll only ever do one read per chunk, ie we'll only ever read every second filesystem block. READSIZE is the size of actual reads we'll do. It should be equal to or less than FSBLKSIZE for obvious reasons, and I prefer it to be clearly smaller if possible.
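
(To put some concrete numbers on this: with a 128 KB FSBLKSIZE, CHUNKSIZE is 256 KB, so a 1 GB file has 4096 chunks and readfile() winds up doing 4096 scattered 4 KB reads, only 16 MB of actual data spread over the whole gigabyte.)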

(The one tricky bit about FSBLKSIZE is that you need to push it up if your filesystem ever helpfully decides to read several blocks at once for you. ZFS generally does not, but others may.)

Rather than repeatedly read at randomized positions until we feel that we've done enough reads, we generate a list of all (chunk) positions and then shuffle them into a random order. If your standard libraries don't have a shuffle routine, remember to use a high-quality algorithm. Shuffling this way ensures that we never re-read the same block and that we read from every one of the chunks, doing a predictable amount of IO (depending on the basic size we tell readfile() to work on).
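
(If you do wind up implementing the shuffle yourself, the standard high-quality algorithm is the Fisher-Yates shuffle, which is also what Python's random.shuffle() uses. A minimal sketch:)

import random

def fisher_yates_shuffle(lst):
  """Shuffle lst in place by swapping each
  position with a randomly chosen earlier
  (or identical) position."""
  for i in range(len(lst) - 1, 0, -1):
    # Pick a random index in lst[0..i]
    # inclusive and swap it into position i.
    j = random.randint(0, i)
    lst[i], lst[j] = lst[j], lst[i]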

Since random numbers are a little bit unpredictable, you should always check the amount of prefetching that your filesystem is actually doing when you run this program. On ZFS this can be done by watching the ARC stats. Note that even unsuccessful prefetching may distort any performance numbers you get by adding extra, potentially unpredictable IO load.

This still leaves you exposed to things like track caching that are done by your hard drive(s), but it's very hard to avoid or even predict that level of caching. You just have to do enough IO and work on a broad enough range of file blocks (and thus disk blocks) that you're usually not doing IOs that are close enough to hit any such caches.

HowToDoRandomizedIO written at 00:51:34

2012-10-27

The difference between cryptographic and normal random number generators

This is something I was looking into today and I want to write stuff down before it falls out of my head again. I'll start with the premise: suppose that you want to generate unpredictable random numbers to avoid attempts to predict your actions. Do you need a cryptographically secure (pseudo) random number generator, or is your language's normal PRNG good enough?

All (pseudo) random number generators (and hashes) have some amount of internal state that is manipulated and permuted to generate their sequence of random numbers. If an outside observer can fully recreate this internal state (and knows the algorithm), they can work out the future values that the PRNG function will generate (because they can just run their own copy of the PRNG). In some cases they can also run the PRNG backwards, working out all previous random values. There are a number of technical requirements for cryptographically secure PRNGs, but a significant part of the difference is that a CSPRNG is designed not to reveal this internal state, whereas a normal PRNG may well leak it if you watch a long enough sequence of its output (see eg the quick mention of this in the Wikipedia page about the Mersenne twister).

(This resistance to state discovery is not stated explicitly in the Wikipedia page on CSPRNGs, but it is implied by the next-bit test. If you can discover the internal PRNG state from past bits, you have a clear polynomial-time algorithm for 100% accurate predictions of the next bits.)

In a strict sense, a CSPRNG is more random and less predictable than a normal PRNG even in non-cryptographic situations. In a practical sense the difference between a CSPRNG and a good PRNG (one that passes strong tests for statistical randomness and so on) is likely to be very small. If you're generating random numbers for ordinary use and you have a CSPRNG handy (and it runs fast enough), you might as well use it. But if you don't, there's almost certainly no need to implement one just to get 'sufficiently unpredictable' random numbers; the output of a good normal PRNG is good enough for pretty much any use.

(While a normal PRNG can be reverse engineered and predicted, this requires someone (or some code) that is actively and explicitly attempting to do so. This is extremely unlikely in general situations, so good statistical randomness is all you need.)
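
(To make the practical side concrete in Python: the random module's default generator is the Mersenne Twister, while random.SystemRandom gets its randomness from the operating system via os.urandom(), so if you decide you do want a CSPRNG it's nearly a drop-in swap:)

import random

# The default generator is the Mersenne Twister,
# a good but ordinary PRNG.
plain = random.Random()

# SystemRandom draws from the OS's randomness
# source (os.urandom()), so its internal state
# can't be reconstructed from its output.
secure = random.SystemRandom()

# Both expose the same interface.
ordinary = plain.randint(1, 100)
unpredictable = secure.randint(1, 100)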

It would be handy if everyone's language runtime and standard libraries contained a CSPRNG implementation, but I'm not holding my breath on that. In practice I'm happy if everyone has a high-quality PRNG like the Mersenne twister.

CryptographicVsNormalPRNG written at 01:22:36

2012-10-26

Thinking about an unusual sequence

This is a programming puzzle; unlike a lot of what I write about, I don't have any answers right now.

I would like to find an approach to creating a sequence of IO that defeats superintelligent prefetching. As far as I can tell the prefetching I'm dealing with recognizes forward and backwards read IO both straight and with a stride; further, it attempts to find up to N separate streams of prefetchable IO (where N is probably at least eight). The first puzzle is simple: what sort of sequence do I need to defeat all of that?

(Ideally I want a sequence where prefetching never activates, for reasons that don't fit in the margins of this entry. Less ideal is a sequence where the system prefetches but gets it wrong.)

Avoiding straight sequential IO is easy with a hack, so let's ignore that case. I believe that the combination of strides and streams means that I need a (repeating) sequence where the distance between any two of the past N elements is unique. We need uniqueness because if there was a duplicated distance, the system could 'recognize' it as a strided sequential read and predict the next occurrence of this strided read.
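
(To make that constraint concrete, here is a small sketch of how you could check a candidate sequence; the window size N and the use of absolute distances are my assumptions about how the prefetcher matches strides.)

def distances_unique(seq, nstreams):
  """Check that within every window of nstreams
  consecutive positions in seq, all pairwise
  absolute distances are distinct."""
  for start in range(len(seq) - nstreams + 1):
    window = seq[start:start + nstreams]
    seen = set()
    for i in range(len(window)):
      for j in range(i + 1, len(window)):
        dist = abs(window[i] - window[j])
        if dist in seen:
          return False
        seen.add(dist)
  return True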

Mathematics has something called Golomb rulers, which have exactly this property. In theory I could just use an appropriate one; in practice, I see at least three potential problems. First, Golomb rulers of decent sizes are fairly big. Second, Golomb rulers aren't so much an approach for generating suitable sequences as a set of precomputed ones; in practice I'd wind up hardcoding the known optimal rulers. Third, I'm not completely sure how to turn a Golomb ruler into a repeating sequence that preserves its properties.

(I suspect that the way to go is to use a larger Golomb ruler than required, but I think it requires some care so that the rollover doesn't duplicate distances within the last N. Given how large higher-order Golomb rulers are, I don't want to just pick a ruler twice as large as I think I need.)

My gut irrationally suspects that it's possible to do better than Golomb rulers here. My IOs are ordered in a sequence; I don't have to issue them in a simple forward order the way Golomb rulers are normally represented. This seems like it should matter because I think that the sign of the difference between two IO positions kind of matters; the prefetching code never looks back to re-evaluate past history, it only tries to match the current IO up with some IO before it. (Golomb rulers use pure absolute distance.)

(To give an example, I believe that the prefetcher would not trigger for a sequence of block positions of '2 6 4'. While we have a duplicated distance, the offset of '4' from past IOs is seen as +2 and -2 and these are distinct from each other. This suggests that my constraint is too pessimistic, but I'm not sure how to phrase the right constraint and it may depend on how the innards of the prefetcher work. Maybe just using Golomb rulers is the safest approach because it's the most pessimistic.)

UnpredictableSequenceNeed written at 01:43:32

2012-10-24

Why you should support 'reload' as well as 'restart'

Suppose that you're writing a network server with persistent connections (these days that can even describe web servers, depending on what sort of web app you're running). Your program should really be able to reload its configuration and related things on the fly, without having to be stopped and restarted. This should cover not just the configuration itself but anything you normally load once at startup, and especially anything that can time out or expire such as, oh, SSL certificates.

The problem with only supporting configuration and other changes through a full restart is that a full restart usually breaks all of those persistent connections, and breaking persistent connections often has undesirable consequences, the kind of consequences that require system administrators to arrange scheduled downtimes. Reloads don't, not if you do them right.

(Yes, yes, of course your protocol specification says that clients should handle a server disconnect by retrying and resuming things in a way that's transparent to higher layers and the user. If you think that all actual clients do this right I have a bridge for sale that you might find attractive. If you also wrote the only client but haven't carefully and extensively tested it in the face of random disconnects, well, I still wouldn't suggest buying any bridges that you get offered.)

There are two ways to do reloads, more or less: you can wait for an explicit signal or you can simply notice changed things and automatically pick them up. Automatically picking things up is sexy, but speaking as a sysadmin I prefer explicit signals because that avoids issues with half-complete changes. No matter how fast I'm making the changes, with automatic reloads there is always a timing window with half-written files or where only part of the necessary files have been updated.

(As an example, consider updating SSL certificates. These come as two separate objects, a certificate and the private key that goes with it; for correct operation, you need either both new ones present or neither. If your program reloads its configuration partway through you get a mismatched key and certificate.)
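
(The explicit signal approach is also easy to implement. A minimal Python sketch, with reload_configuration() standing in as a hypothetical placeholder for whatever work your server actually has to do on a reload:)

import signal

reload_pending = False

def handle_sighup(signum, frame):
  # Only note that a reload was asked for; do
  # the real work from the main loop, since very
  # little is safe to do inside a signal handler.
  global reload_pending
  reload_pending = True

signal.signal(signal.SIGHUP, handle_sighup)

# ... then in the server's main loop:
#   if reload_pending:
#     reload_pending = False
#     reload_configuration()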

Supporting reloading is also appreciated in servers with non-persistent connections. No matter how fast a stop and restart sequence is, there's always a time window where the server is not actually running, and sometimes this matters.

Supporting reloading is unquestionably more work; the great appeal of 'restart the server to make configuration changes' is that it needs no additional code (you already needed code to shut down cleanly and load the configuration on startup). But it's an important part of creating a system that's manageable and resilient. Real systems have configurations that change over time and they should stay up and available through it.

(This entry has been brought to you by the process of updating SSL certificates across our various systems.)

Sidebar: configuration changes in the face of on the fly reloads

It's possible to make configuration changes reliable in the face of servers that do on the fly reloads. What you have to do is treat everything except the very top level configuration file as immutable once created and activated. If you need to migrate to a new SSL certificate, for example, you don't replace the current certificate file with a new certificate; instead you put the new certificate in a new file and prepare a new version of the top level configuration that refers to that new file instead of the old one. The new configuration is activated by moving the new top level configuration into place (which can be done reliably as a single operation).
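
(On Unix, 'moving the new top level configuration into place' is just a rename(), which atomically replaces the old file with the new one; the new file has to be on the same filesystem, and the names here are only illustrative. A minimal sketch of the activation step:)

import os

def activate_config(newconf, conffile):
  """Atomically replace conffile with newconf.
  Since rename() either fully succeeds or leaves
  the old file untouched, a reloading server
  never sees a half-written configuration."""
  os.rename(newconf, conffile)

# eg: activate_config("server.conf.new", "server.conf")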

(This ought to look familiar. It's the same general approach used by other no-overwrite things such as filesystems and also as one way to both enable and deal with various layers of caching on the web.)

If you have more than one top level configuration file and they aren't totally independent and unrelated, you're out of luck and up the creek. As a corollary, if you're writing a program and are tempted to make it do on the fly reloads, please make sure it has only one top level configuration file.

AlsoHaveReload written at 01:41:16

2012-10-22

Why parsers matter

One reaction to my problem with parsing wikitext is to ask why it matters. So what if I don't know how to properly parse wikitext; clearly lots of people (myself included) have written code to process wikitext dialects and the code works, even if it may not be the most elegant code and we may not have a nice formal model for it. If regexp based bashing works, just use regexp based bashing.

My answer is that parsing is ultimately here to tell us how to easily write code that handles some (computer) language. Things like lexers, recursive descent parsers, and so on reduce potentially hairy problems to known and readily implemented patterns; once you know the general tricks, the details for your particular language are usually just vaguely tedious. This has been a huge boon for our practical ability to process all sorts of computer 'languages'.

(In the process, parser theory has often delivered parsers that are efficient as well as easily written.)
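
(As a tiny illustration of what 'readily implemented patterns' means, here is a sketch of a recursive descent parser for simple arithmetic over a list of tokens; each grammar rule turns into one small function, and that's the whole trick.)

def parse_expr(tokens, pos):
  """expr ::= term ('+' term)*"""
  value, pos = parse_term(tokens, pos)
  while pos < len(tokens) and tokens[pos] == '+':
    right, pos = parse_term(tokens, pos + 1)
    value = value + right
  return value, pos

def parse_term(tokens, pos):
  """term ::= number ('*' number)*"""
  value, pos = parse_number(tokens, pos)
  while pos < len(tokens) and tokens[pos] == '*':
    right, pos = parse_number(tokens, pos + 1)
    value = value * right
  return value, pos

def parse_number(tokens, pos):
  """number ::= a decimal integer token"""
  return int(tokens[pos]), pos + 1

# eg: parse_expr(['1', '+', '2', '*', '3'], 0)
# handles the precedence correctly and
# returns (7, 5).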

This isn't just an issue of getting something that works, because you can get there with bashing things together repeatedly. There are two additional issues: getting your parser reliable and evolving it over time. By reliability, I don't just mean not crashing; for parsers, 'reliability' means that it correctly parses everything it should and doesn't accept anything it shouldn't. Reliability feeds into evolution over time, because ad hoc parsers are often fragile (as well as hard to work with); changes in one place (to change one aspect of what the parser handles) can have unintended consequences for what the parser accepts in general.

(Consider a really complicated regular expression, such as some of the monster ones that people use to recognize all or almost all valid forms of email addresses. These are so complex that it's very difficult to see how they work, what exactly they recognize, and how to change them to add another form of address without everything exploding.)

In short and as I put it succinctly in ParsingWikitext, parsers matter because they're much easier to write, to maintain, and to reason about than ad-hoc code; ad-hoc processing code is ultimately just a big pile of complex, fragile mud.

(Note that the difficulty of fully reasoning about an ad hoc parser is a large part of why it's hard to get them reliable and to change them over time. When you can't really fully understand the parser and what it will do, you have problems. And complex ad hoc parsing is almost always quite hard to understand.)

Every ad hoc parser that I have to write is a weak point in my code. Not only was it hard work to create, but it's something that's hard to reason about, hard to completely understand, and hard to change. All of those make me nervous and that's ultimately why I want to find how to parse these things properly.

WhyParsersMatter written at 00:38:37



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.