Microkernels and device drivers

September 27, 2012

A commenter on my entry on microkernel modularity asked a good question:

One common argument pro microkernel is that they are more robust, because individual drivers can crash without taking the whole system down. What's your take on this?

I don't know if anyone has done practical studies on this and I certainly don't have any personal experience myself, but I'm dubious about this claim for several reasons.

To start with, drivers control hardware and hardware itself is extraordinarily powerful. On most machines, hardware can do DMA to (and from) any memory that the driver asks it to, so a driver bug can already smash memory regardless of what access rights the driver code theoretically has or doesn't have. The counter-argument here is that bugs in DMA targets are relatively rare; addressing bugs in the driver code are much more common, and microkernels protect against those.

However, the big issue is, well, let me repeat something that Dan Astoorian quoted in a comment here:

"Never test for an error condition you don't know how to handle." -- Steinbach's Guideline for Systems Programmers.

So, one of your microkernel's driver processes has crashed. What do you do next?

Active drivers are highly likely to be crucial to the operation of the system. If you lose the disk controller, the network hardware, or any number of other drivers, your system is very close to being a paperweight even if other bits keep going. In a technical sense the whole system may not have crashed, but the effects are basically the same.

The obvious fix for this is to restart and re-run the driver process somehow, just as if it was a user-level process that you were restarting after a crash. Unfortunately, hardware drivers deal with hardware. This means that a crashed driver leaves its hardware in some unknown state, possibly one that's dangerous to touch. In general, to recover the hardware itself and thus make the driver actually useful you need to return the hardware to a known state somehow. And you need to do this with the hardware in an arbitrary state, without you knowing anything about that state.

I'm sure that there's hardware where you can do this (for example, hardware where you can tell the bus to turn off power to the device and then turn it back on). I'm also sure that there's plenty of hardware where you can't and a certain amount of hardware where mis-programming it (or sometimes partially programming it) will lock up your entire machine. This is not something that any amount of microkernel driver isolation can help you with.

The counter-argument is that a microkernel's isolation gives you more options. The system can choose to leave the driver un-restarted until this is initiated by some user-level action, for example. And if the system decides that the best way out is to reboot, you're likely to have more of a chance to save things and terminate processes in an orderly way since it's probable that any damage the crashing driver did has been confined to its own memory.
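To make the choices here concrete, this is a rough sketch of the decision a driver supervisor might make when a driver process dies. It's purely illustrative and not taken from any real microkernel; the `Driver` fields, the `Action` names, and `on_driver_crash()` are all things I've made up, and a real system would have to somehow know whether a given device can actually be forced back to a known state.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    RESET_AND_RESTART = auto()  # force the device to a known state, respawn the driver
    LEAVE_DOWN = auto()         # leave it dead until some user-level action restarts it
    ORDERLY_REBOOT = auto()     # save what we can, stop processes cleanly, reboot


@dataclass
class Driver:
    # Hypothetical description of a driver; all fields are assumptions.
    name: str
    can_power_cycle: bool  # can the bus turn the device off and back on?
    critical: bool         # is the system a paperweight without it?


def on_driver_crash(driver: Driver) -> Action:
    """Pick a recovery action for a crashed driver process."""
    if driver.can_power_cycle:
        # The hardware can be returned to a known state without
        # knowing anything about the state the crash left it in.
        return Action.RESET_AND_RESTART
    if driver.critical:
        # We can't safely touch the device and can't run without it,
        # but isolation at least lets us shut down in an orderly way.
        return Action.ORDERLY_REBOOT
    # Not resettable, but the rest of the system can limp along.
    return Action.LEAVE_DOWN
```

For example, `on_driver_crash(Driver("usb-webcam", can_power_cycle=True, critical=False))` picks a reset and restart, while a disk controller that can't be power cycled ends up at the orderly reboot. Note that none of these branches helps with hardware that locks up the entire machine when mis-programmed.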

Comments on this page:

From at 2012-09-27 05:12:50:

Many drivers are for software-only things (eg. filesystem drivers) or things that are inherently resettable (eg. USB devices).

It'd be perfectly OK to have a driver for a USB webcam crash and reload - the worst case scenario is that you need to unplug & replug it to work. (I had a webcam that could reliably crash Linux at one point, if the image it recorded was too bright...).

It'd be likewise understandable if the NTFS driver crashed, to respawn it - it can then fix up the filesystem based on the journal, or whatever. Same goes for the TCP stack, or iptables, or NFS, or the compression & CRC routines in the kernel, the non-hardware parts of the sound subsystems, etc.

If DMA gets involved, it's game over no matter what you do. That much is just a given :)

--mibus <mibus@mibus.org>

From at 2012-09-27 08:39:12:

Modern hardware has IOMMU and VT-d, which could be used to limit the damage of DMA drivers: http://en.wikipedia.org/wiki/DMA_attack#Mitigations

Written on 27 September 2012.

Last modified: Thu Sep 27 02:19:24 2012