Wandering Thoughts archives

2011-09-15

How your Linux installer should help you set up filesystems

I've said before that Linux installers handle setting up software RAID exactly backwards, forcing you to go from low level mechanics to high level results. In fact this is a symptom of a general failure in how most distributions handle partitioning and setting up filesystems.

The common approach I've seen is to offer you two basic choices: you can accept a few common variants of a canned filesystem configuration or you can specify everything from the ground up. If you're setting up software RAID, the ground-up way is tedious and annoying; you create a bunch of partitions for RAID, associate them into RAID arrays, and then assign the RAID arrays to filesystems or swap space or whatever. Once upon a time I thought that the right solution to the software RAID issue was basically to add mirrored software RAID to the canned options if the installer detected a suitable hardware setup. In retrospect I was only solving half of the real problem.

The real problem and the real solution is that the installer needs a 'goals-driven' intermediate method of setting up filesystems, one that starts from you wanting a given filesystem and works down instead of starting from the bottom and working up. You would start by telling the installer that you wanted to create a particular filesystem (or a swap area or whatever), then the installer would present you with options about where it'd go; on suitable hardware this would include 'mirrored software RAID' and the installer would take care of the details.
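
As a rough illustration of what 'working down' from a goal could look like, here is a minimal Python sketch. Everything in it (the `Goal` class, `placement_options()`, `plan()`, the step wording) is hypothetical and invented for this example; it is not how any real installer works. You state the filesystem you want, the sketch offers placements that fit the hardware, and it expands your choice into the low-level steps you would otherwise do by hand.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    """What the user wants (hypothetical structure for this sketch)."""
    mountpoint: str        # e.g. "/", "/home", or "swap"
    size_gb: int
    mirrored: bool = False


def placement_options(disks):
    """Choices to offer for a goal; mirroring only appears with 2+ disks."""
    options = [f"plain partition on {d}" for d in disks]
    if len(disks) >= 2:
        options.append("mirrored software RAID")
    return options


def plan(goal, disks):
    """Expand one goal into the low-level steps, in the order they must run."""
    if goal.mirrored:
        if len(disks) < 2:
            raise ValueError("mirroring needs at least two disks")
        members = disks[:2]
        steps = [f"create {goal.size_gb}G RAID partition on {d}" for d in members]
        steps.append("assemble RAID-1 array from " + " + ".join(members))
        target = "the RAID-1 array"
    else:
        steps = [f"create {goal.size_gb}G partition on {disks[0]}"]
        target = f"the partition on {disks[0]}"
    if goal.mountpoint == "swap":
        label = "swap area"
    else:
        label = f"filesystem for {goal.mountpoint}"
    steps.append(f"make {label} on {target}")
    return steps


if __name__ == "__main__":
    for step in plan(Goal("/", 20, mirrored=True), ["sda", "sdb"]):
        print(step)
```

The point of the sketch is only the direction of travel: the user answers one question about the goal, and the partition and RAID mechanics fall out of it.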

Among other advantages, I think that this would feel less tedious (even if you effectively did all of the same things) because you would always be answering a question that was relevant and necessary for your goal. You would always be moving forward and making progress. (Also, hopefully you would get asked fewer questions because the installer would make sensible choices for you.)

Such a filesystem-down approach to setup wouldn't cover all cases, so it couldn't replace bottom-up partitioning. But if it were added to installers as an option, I think that many people would happily switch to using it instead of the current tedious approach to partitioning.

(Assuming that it covered mirrored software RAID, we certainly would and it would definitely save us time.)

PS: although I haven't looked at auto-installers like Kickstart and FAI recently, my impression is that they already have at least some of this top-down specification of what you want.

InstallerPartitioning written at 01:28:37

2011-09-04

Why true asynchronous file IO is hard, at least for reads

Although I believe that this may now have changed, for a long time Linux didn't fully support asynchronous file read IO, although it supported some other sorts of async IO. One reason for this is that doing genuine fully asynchronous file IO is a lot more complicated than it looks.

In theory you might think that this is an easy feature to add to a filesystem. After all, the kernel can already do asynchronous reads of a data block; it's just that the normal path for read() issues the asynchronous read and then immediately waits for it to complete. So it looks like you should basically be able to add a flag that propagates down through that code to say 'don't wait, signal here when the read completes' and be mostly done (you also need to translate the notification to user space and so on).

In practice the problem is not the data block read itself, but getting to the point where you can issue the data block read. In order to know where to read the data block from, you need to consult various bits of file metadata, like indirect blocks, which may mean reading them off disk. In traditional filesystem implementations (including ext2 and its descendants), this was implemented with synchronous IO because that's by far the simplest way to write the code. Revising all of this code to work asynchronously is a lot of work and so didn't get done for a while.
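
To make the shape of the problem concrete, here is a toy Python model; it is not kernel code, and `FakeDisk`, `async_file_read()`, and the single-indirect-block layout are all assumptions made up for the illustration. The data block read is issued asynchronously, but the caller has already been stalled by a synchronous read of the indirect block before that point.

```python
from concurrent.futures import Future


class FakeDisk:
    """Toy disk: block number -> contents (hypothetical, for illustration)."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.sync_reads = 0    # how many times a caller had to stall
        self.pending = []      # async reads issued but not yet completed

    def read_sync(self, blockno):
        # Stands in for 'issue the read, then wait for it to complete'.
        self.sync_reads += 1
        return self.blocks[blockno]

    def read_async(self, blockno):
        # Stands in for 'issue the read and return immediately'.
        fut = Future()
        self.pending.append((fut, blockno))
        return fut

    def complete_all(self):
        # Stands in for the disk finishing its queued reads.
        for fut, blockno in self.pending:
            fut.set_result(self.blocks[blockno])
        self.pending.clear()


def async_file_read(disk, indirect_blockno, index):
    """'Async' read of one file block through a single indirect block."""
    # The metadata lookup is synchronous, so the caller blocks here...
    indirect = disk.read_sync(indirect_blockno)
    data_blockno = indirect[index]
    # ...and only the final data block read is actually asynchronous.
    return disk.read_async(data_blockno)


if __name__ == "__main__":
    disk = FakeDisk({10: [42, 43], 42: b"hello", 43: b"world"})
    fut = async_file_read(disk, 10, 1)
    print("stalled reads before returning:", disk.sync_reads)
    disk.complete_all()
    print(fut.result())
```

Making this fully asynchronous would mean turning the `read_sync()` step into a continuation as well, which is the code revision the paragraph above describes.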

(Async IO implemented with threads (well, faked really) doesn't have this problem because the thread wraps all of that up behind your back. Thread-based async IO has its own problem, which is that now you need a lot of threads and supporting infrastructure.)
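
For comparison, here is a minimal user-level sketch of the thread-based approach, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `read_at()` and `demo()` helpers are names invented for this example. The worker thread does an ordinary blocking read, metadata lookups included, and the caller just gets a future; the cost is that every read in flight ties up a thread.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_at(path, offset, length):
    """Ordinary blocking read; any metadata IO happens inside this call."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo():
    # Create a small scratch file to read from.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    try:
        # Each outstanding read occupies one worker thread in the pool.
        with ThreadPoolExecutor(max_workers=4) as pool:
            fut = pool.submit(read_at, path, 3, 4)
            # The caller is free to do other work here before collecting.
            return fut.result()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```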

By the way, I believe that similar issues came up with write IO. In a sense write IO is easier because the kernel already does a lot of it asynchronously. However, in order to figure out where to place a new data block the kernel may need to do some metadata IO (as I've vividly seen before), and this was historically often done synchronously. Changing filesystems to what I believe is called delayed write allocation also turned up all sorts of interesting corner cases; for example, you need to track filesystem free space synchronously because otherwise you can accept a write that you'll later discover you have no space for.

(Let's not talk about per-user disk space quotas.)
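
The free space corner case can be sketched with a toy model. The `DelallocFS` class below is hypothetical and ignores nearly everything a real filesystem has to deal with, quotas included. Deciding where blocks go is deferred to `flush()`, but space is reserved synchronously in `write()`, so a write that can never fit is refused up front instead of failing later.

```python
import errno
import os


class DelallocFS:
    """Toy delayed-allocation model (hypothetical, for illustration only)."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks  # blocks not yet allocated on disk
        self.reserved = 0               # blocks promised to buffered writes
        self.dirty = []                 # buffered data, not yet placed

    def write(self, data_blocks):
        n = len(data_blocks)
        # Synchronous accounting: refuse now rather than discover at
        # flush time that there is nowhere to put the data.
        if n > self.free_blocks - self.reserved:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.reserved += n
        self.dirty.extend(data_blocks)

    def flush(self):
        # Delayed step: this is where block placement would be decided.
        n = len(self.dirty)
        self.free_blocks -= n
        self.reserved -= n
        placed, self.dirty = self.dirty, []
        return placed


if __name__ == "__main__":
    fs = DelallocFS(free_blocks=3)
    fs.write(["a", "b"])
    try:
        fs.write(["c", "d"])
    except OSError as e:
        print("rejected:", e.strerror)
    print(fs.flush())
```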

HardAsyncFileIO written at 23:49:16


