Wandering Thoughts archives

2011-09-04

Why true asynchronous file IO is hard, at least for reads

Although I believe that this may now have changed, Linux for a long time didn't fully support asynchronous file read IO, although it supported some other sorts of async IO. One reason for this is that doing genuinely asynchronous file IO is a lot more complicated than it looks.

In theory you might think that this is an easy feature to add to a filesystem. After all, the kernel can already do asynchronous reads of a data block; it's just that the normal path for read() issues the asynchronous read and then immediately waits for it to complete. So it looks like you should basically be able to add a flag that propagates down through that code to say 'don't wait, signal here when the read completes' and be mostly done (you also need to translate the notification to user space and so on).
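For concreteness, here is roughly what the user space side of such an interface looks like; this is a minimal sketch using the portable POSIX AIO calls, with error handling abbreviated:

    /* A minimal sketch of asynchronous file reading from user space,
       using the POSIX AIO interface; error handling is abbreviated. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[4096];
        int fd = open("/etc/hosts", O_RDONLY);
        if (fd < 0)
            return 1;

        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        /* Issue the read; this returns immediately instead of
           waiting for the disk. */
        if (aio_read(&cb) != 0)
            return 1;

        /* We are free to do other work here while the read happens. */

        /* Eventually, wait for completion and collect the result. */
        const struct aiocb *list[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        printf("read %zd bytes\n", aio_return(&cb));
        close(fd);
        return 0;
    }

(On Linux with glibc this needs -lrt, and glibc has historically implemented these calls with user-level threads rather than true kernel async IO.)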

In practice the problem is not the data block read itself, but getting to the point where you can issue the data block read. In order to know where to read the data block from, you need to consult various bits of file metadata, like indirect blocks. Which may mean reading them off disk. In traditional filesystem implementations (including ext2 and its descendants), this was implemented with synchronous IO because that's by far the simplest way to write the code. Revising all of this code to work asynchronously is a lot of work and so didn't get done for a while.
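As an entirely schematic illustration, here is roughly what the traditional block-mapping step looks like; all names and numbers are invented, with read_block_sync() standing in for a disk read that puts the caller to sleep:

    /* Schematic sketch of ext2-style block mapping; all names and
       numbers here are invented for illustration. */
    #include <stdint.h>
    #include <string.h>

    #define NDIRECT        12    /* direct block pointers in the inode */
    #define PTRS_PER_BLOCK 1024  /* pointers held by one indirect block */

    struct inode {
        uint32_t direct[NDIRECT]; /* direct data block numbers */
        uint32_t indirect;        /* single indirect block number */
    };

    /* Stand-in for a blocking disk read: in a real filesystem the
       caller sleeps here until the disk answers. */
    static void read_block_sync(uint32_t blockno, uint32_t *buf)
    {
        (void)blockno;
        memset(buf, 0, PTRS_PER_BLOCK * sizeof(uint32_t));
    }

    /* Map logical block 'n' of a file to its on-disk block number. */
    static uint32_t bmap(struct inode *ip, uint32_t n)
    {
        if (n < NDIRECT)
            return ip->direct[n];  /* no metadata IO needed */

        /* The pointer we want lives in an indirect block that may
           not be cached, so we must read it first -- and this read
           is synchronous, stalling the whole read() path even though
           the eventual data block read could have been issued async. */
        uint32_t ind[PTRS_PER_BLOCK];
        read_block_sync(ip->indirect, ind);
        return ind[n - NDIRECT];
    }

    int main(void)
    {
        struct inode ino = { .direct = { 100, 101 }, .indirect = 200 };
        return bmap(&ino, 14) == 0 ? 0 : 1;
    }

Making this walk asynchronous means turning it into something like a state machine that can be suspended and resumed at each metadata read, which is where all the work is.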

(Async IO implemented with threads (well, faked, really) doesn't have this problem, because the thread wraps all of that up behind your back. Thread-based async IO has the problem that you now need a lot of threads and supporting infrastructure.)
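To show what I mean, here is a bare-bones sketch of thread-faked async IO, along the same general lines as glibc's POSIX AIO implementation; all of the names here are invented:

    /* Bare-bones sketch of thread-faked async IO: hand the blocking
       pread() to a worker thread and have it signal completion. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct fake_aio {
        int fd;
        void *buf;
        size_t count;
        off_t offset;
        ssize_t result;
        int done;             /* guarded by lock */
        pthread_mutex_t lock;
        pthread_cond_t cond;
    };

    static void *aio_worker(void *arg)
    {
        struct fake_aio *req = arg;
        /* The ordinary synchronous read path, metadata walks and
           all, runs here; the submitting thread never sees it block. */
        ssize_t n = pread(req->fd, req->buf, req->count, req->offset);
        pthread_mutex_lock(&req->lock);
        req->result = n;
        req->done = 1;
        pthread_cond_signal(&req->cond);
        pthread_mutex_unlock(&req->lock);
        return NULL;
    }

    static void fake_aio_read(struct fake_aio *req)
    {
        pthread_t tid;
        pthread_mutex_init(&req->lock, NULL);
        pthread_cond_init(&req->cond, NULL);
        req->done = 0;
        pthread_create(&tid, NULL, aio_worker, req);
        pthread_detach(tid);
    }

    static ssize_t fake_aio_wait(struct fake_aio *req)
    {
        pthread_mutex_lock(&req->lock);
        while (!req->done)
            pthread_cond_wait(&req->cond, &req->lock);
        pthread_mutex_unlock(&req->lock);
        return req->result;
    }

    int main(void)
    {
        char buf[512];
        struct fake_aio req = { .fd = open("/etc/hosts", O_RDONLY),
                                .buf = buf, .count = sizeof(buf) };
        fake_aio_read(&req);
        /* ... do other work here ... */
        printf("read %zd bytes\n", fake_aio_wait(&req));
        return 0;
    }

Even this toy version (compile with -pthread) shows the cost: every outstanding read needs its own thread, mutex, and condition variable, and a real implementation would at least need to pool the threads.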

By the way, I believe that similar issues came up with write IO. In a sense write IO is easier because the kernel already does a lot of it asynchronously. However, in order to figure out where to place a new data block the kernel may need to do some metadata IO (as I've vividly seen before), and this was historically often done synchronously. Changing filesystems to what I believe is called delayed write allocation also turned up all sorts of interesting corner cases; for example, you need to track filesystem free space synchronously because otherwise you can accept a write that you'll later discover you have no space for.
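To illustrate that last corner case, here is a toy sketch of the bookkeeping involved (all names and numbers are invented): space gets reserved synchronously at write() time, even though the actual on-disk placement only happens later at writeback.

    #include <errno.h>
    #include <pthread.h>
    #include <stdint.h>

    static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
    static uint64_t free_blocks = 1000000; /* hypothetical free count */
    static uint64_t reserved_blocks;       /* promised, not yet placed */

    /* Called synchronously from write(): claim space now, place later. */
    static int reserve_blocks(uint64_t nblocks)
    {
        int ret = 0;
        pthread_mutex_lock(&fs_lock);
        if (free_blocks - reserved_blocks >= nblocks)
            reserved_blocks += nblocks;
        else
            ret = -ENOSPC; /* fail the write up front, not at flush time */
        pthread_mutex_unlock(&fs_lock);
        return ret;
    }

    /* Called later at writeback, when the delayed allocation happens. */
    static void commit_blocks(uint64_t nblocks)
    {
        pthread_mutex_lock(&fs_lock);
        reserved_blocks -= nblocks;
        free_blocks -= nblocks;
        pthread_mutex_unlock(&fs_lock);
    }

    int main(void)
    {
        if (reserve_blocks(128) == 0) /* at write() time */
            commit_blocks(128);       /* later, at writeback time */
        return 0;
    }

Without the synchronous reservation step, two writes racing each other could both be accepted against the same last few free blocks.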

(Let's not talk about per-user disk space quotas.)

linux/HardAsyncFileIO written at 23:49:16

How some Unixes did shared libraries in the old days

Yesterday I wrote about how mmap() is the core of modern shared libraries. As it happens, some Unixes had shared libraries even before mmap() was created, which raises the question of how they did it.

As I mentioned yesterday, the real challenge with shared libraries is the relocation issue: how you deal with the same shared library having to be mapped at different addresses in different processes. The trick answer is not to do that. You may have heard of prelinking; the extreme version of prelinking is to 'prelink' all shared libraries by assigning each of them a static load address (and relocating them for that address) and then always loading them at that address in every process. This completely eliminates the need to do any run-time relocation.
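mmap() postdates these systems, of course, but the effect of a preassigned load address is easy to demonstrate with it: every process maps a given library at the same fixed virtual address, so code relocated for that address is valid everywhere. The address and library path in this sketch are made up:

    /* Demonstrating a preassigned library load address via mmap();
       the address and library path are invented. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define ASSIGNED_ADDR ((void *)0x40000000UL) /* hypothetical slot */

    int main(void)
    {
        int fd = open("/usr/lib/libexample.so", O_RDONLY); /* hypothetical */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0)
            return 1;

        /* MAP_FIXED: put the mapping at exactly this address, not
           wherever the kernel likes. Since every process does the
           same, code in the library can use absolute addresses and
           needs no run-time relocation. */
        void *lib = mmap(ASSIGNED_ADDR, st.st_size, PROT_READ | PROT_EXEC,
                         MAP_PRIVATE | MAP_FIXED, fd, 0);
        if (lib == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("library at %p in this and every other process\n", lib);
        return 0;
    }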

Figuring out the load address of each shared library is where it gets interesting. If you're only doing this on a handful of libraries, you can give each of them their own dedicated chunk of address space. If you have more than that, you have to start looking at executables to identify libraries that are never loaded together and so can have address space assignments that conflict with each other.

(If an executable later comes along that has a load address conflict because it uses two libraries that were previously never seen together, it loses; the kernel will refuse to run it or it will probably crash shortly after it starts running. This is one reason that this approach is somewhat of a hack.)
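As a toy illustration of the bookkeeping (everything here is invented), the constraint is simply interval overlap; two libraries can appear in the same executable only if their assigned address ranges don't collide:

    /* Toy illustration of static load address assignment; all names,
       addresses, and sizes are invented. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct lib_slot {
        const char *name;
        uintptr_t base;   /* assigned load address */
        size_t size;
    };

    static int ranges_overlap(const struct lib_slot *a,
                              const struct lib_slot *b)
    {
        return a->base < b->base + b->size && b->base < a->base + a->size;
    }

    int main(void)
    {
        /* Hypothetical: curses and libm were judged never to be used
           together, so they were given overlapping address slots. */
        struct lib_slot libm   = { "libm",   0x400000, 0x20000 };
        struct lib_slot curses = { "curses", 0x410000, 0x30000 };

        if (ranges_overlap(&libm, &curses))
            printf("no executable can use both %s and %s\n",
                   libm.name, curses.name);
        return 0;
    }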

This is clearly not a very general or scalable solution. Typically it was adopted in a desperate attempt to reduce disk space and memory usage on small Unix systems, and so only really had to work for a small number of libraries (libc, libm, perhaps termcap and/or curses, and perhaps the X libraries on systems with X). If you could get it to work for your package's shared library, that was great; if not, well, you got to statically link your library into your programs and use up more disk space and memory.

(The Unix machine I remember seeing this on was the AT&T 3B1. I believe that similar hacks were done on other obscure early attempts to fit a full Unix setup on small personal computers.)

unix/OldSharedLibraries written at 00:37:30

