Wandering Thoughts archives

2014-03-05

A bit more about the various levels of IPC: whether or not they're necessary

A question you could ask about the levels of IPC that I outlined is whether anything past basic process-to-process IPC is actually necessary (this is the level of TCP connections or Unix domain sockets). One answer to this is basically 'of course not'. All you really need is for programs to be able to talk to each other, and then you can build whatever each particular system needs from there, possibly using common patterns like simple ASCII request/response protocols. A lot of software has gotten very far on this basis.
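As a rough illustration of how little the basic level needs, here is a minimal sketch of a line-oriented ASCII request/response exchange over a connected socket pair. The verbs (ECHO, UPPER), the 'OK'/'ERR' reply prefixes and all of the function names are invented for this example; they are assumptions, not any real program's protocol.

```python
import socket
import threading


def handle_request(line: str) -> str:
    """Hypothetical protocol: 'VERB argument' in, 'OK ...' or 'ERR ...' out."""
    verb, _, arg = line.strip().partition(" ")
    if verb == "ECHO":
        return "OK " + arg
    if verb == "UPPER":
        return "OK " + arg.upper()
    return "ERR unknown command " + verb


def serve(conn: socket.socket) -> None:
    # One newline-terminated request per line, one reply line per request.
    # Separate reader and writer objects keep buffered input intact.
    with conn, \
            conn.makefile("r", encoding="ascii", newline="\n") as rf, \
            conn.makefile("w", encoding="ascii", newline="\n") as wf:
        for line in rf:
            wf.write(handle_request(line) + "\n")
            wf.flush()


def ask(rf, wf, request: str) -> str:
    # Send one request line and wait for the single reply line.
    wf.write(request + "\n")
    wf.flush()
    return rf.readline().rstrip("\n")


def demo() -> list:
    # socketpair() stands in for a TCP or Unix domain socket connection.
    server_end, client_end = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server_end,))
    worker.start()
    with client_end, \
            client_end.makefile("r", encoding="ascii", newline="\n") as rf, \
            client_end.makefile("w", encoding="ascii", newline="\n") as wf:
        replies = [ask(rf, wf, r) for r in ("ECHO hello", "UPPER abc", "FROB x")]
    worker.join()  # the server loop ends when the client side closes
    return replies


if __name__ == "__main__":
    print(demo())
```

Everything here beyond 'bytes go back and forth' (the framing by newlines, the verbs, the error replies) is something this one program had to decide for itself.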

The other answer is that yes, in the end you do need all of those additional levels. To put it one way, very few people design introspectable message bus systems with standardized protocol encodings for fun. These systems get designed and built and adopted because they solve real problems; in the jargon, they are a design pattern. If you don't create them what you really get is a whole collection of ad-hoc and often partial versions that all of the various systems have reinvented on their own. For example, not having a standard protocol encoding does not free programs from needing to define a wire protocol; it just means that every program does it separately and differently and some number of them will do it badly.

(And in the modern world some number of them will make security mistakes or have buffer overruns and other flaws. A single standard system that everyone uses has the potential advantage of being carefully designed and built based on a lot of research and maybe even experience. Of course it can also be badly done, in which case everyone gets a badly done version.)
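To make the 'every program defines a wire protocol anyway' point concrete, here is a sketch of the kind of small framing helper that tends to get reinvented: a 4-byte big-endian length prefix followed by a JSON body, with an explicit size limit so a hostile or buggy peer cannot demand an arbitrarily large buffer. The format, the 64 KB limit and the function names are all assumptions made up for this example; this is not the encoding of any particular message bus.

```python
import json
import struct

MAX_FRAME = 64 * 1024  # assumed limit; oversized frames are rejected up front


def encode_frame(message: dict) -> bytes:
    """Encode one message as <4-byte big-endian length><UTF-8 JSON body>."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    if len(body) > MAX_FRAME:
        raise ValueError("message too large")
    return struct.pack("!I", len(body)) + body


def decode_frames(buffer: bytes):
    """Return (messages, leftover) for the complete frames found in buffer."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if length > MAX_FRAME:
            # Check the claimed length before trusting it.
            raise ValueError("frame length exceeds limit")
        if len(buffer) - offset - 4 < length:
            break  # incomplete frame; wait for more bytes
        start = offset + 4
        messages.append(json.loads(buffer[start:start + length].decode("utf-8")))
        offset = start + length
    return messages, buffer[offset:]
```

A shared encoding amounts to writing something like this once, carefully, instead of once per program.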

In this sense the additional levels of IPC really do wind up being necessary. It's just not the mathematical minimization sense of 'necessary' that people sometimes like to judge systems on.

programming/IPCLevelsII written at 22:58:38

ZFS's problem with boot time magic

One of the problems with ZFS (on Solaris et al) is that in practice it involves quite a bit of magic. This magic is great when it works but is terrible when something goes wrong, because it leaves you with very little to work with to diagnose and fix your problems. Most of this magic revolves around the most problematic times in the life of ZFS, that being system shutdown and startup.

I've written before about boot time ZFS pool activation, so let's talk about how it would work in a non-magical environment. There are essentially two boot time jobs: activating pools and then possibly mounting filesystems from the pools. Clearly these should be driven by distinct commands, one command to activate all non-active pools listed in /etc/zfs/zpool.cache (if possible) and then maybe one command to mount all unmounted ZFS filesystems. You don't really need the second command if pool activation also mounts filesystems the same way 'zpool import' does, but maybe you don't want all of that happening during (early) boot and would rather defer both mounting and sharing until later.
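The two-step design described above could be sketched roughly as follows. This is the hypothetical non-magical arrangement, not how Solaris actually boots, and the function names are invented for the example. It assumes that 'zpool import -c <cachefile> -a' is an acceptable stand-in for 'activate every pool listed in the cache file'; note that a plain 'zpool import' may itself mount and share filesystems, so on such a system the separation below is only notional.

```python
import subprocess

CACHE_FILE = "/etc/zfs/zpool.cache"


def boot_steps(mount_now: bool = True) -> list:
    """Build the explicit, ordered command list for a hypothetical boot."""
    # Step 1: activate every pool recorded in the cache file.
    steps = [["zpool", "import", "-c", CACHE_FILE, "-a"]]
    # Step 2 (optional): mount filesystems now, or leave that for later.
    if mount_now:
        steps.append(["zfs", "mount", "-a"])
    return steps


def run_steps(steps, runner=subprocess.run) -> None:
    # Each step is visible and fails loudly, so there is something to
    # look at when boot goes wrong. 'runner' is replaceable for dry runs.
    for argv in steps:
        print("running:", " ".join(argv))
        runner(argv, check=True)


if __name__ == "__main__":
    # Dry run: show the plan without touching any pools.
    for argv in boot_steps():
        print(" ".join(argv))
```

The point of the shape is that each job is a separate, inspectable command that an administrator can rerun or skip by hand.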

ZFS on Solaris doesn't work this way. There is no pool activation command; pools just magically activate. And as I've found out, pools also apparently magically mount all of their filesystems during activation. While there is a 'zfs mount -a' command that is run during early boot (via /lib/svc/method/fs-local), it doesn't actually do what most people innocently think it does.

(What it seems to do in practice is mount additional ZFS filesystems from the root pool, if there is a root pool. Possibly it also mounts other ZFS filesystems that depend on additional root pool ZFS filesystems.)

I don't know where the magic for all of this lives. Perhaps it lives in the kernel. Perhaps it lives in some user level component that's run asynchronously on boot (much like how Linux's udev handles devices appearing). What I do know is that there is magic and this magic is currently causing me a major amount of heartburn.

Magic is a bad idea. Magic makes systems less manageable (and kernel magic is especially bad because it's completely inaccessible). Unix systems have historically got a significant amount of their power by more or less eschewing magic in favour of things like exposing the mechanics of the boot process. I find it sad to have ZFS be a regression on all of this.

(There are also regressions in the user level commands. For example, as far as I can see there is no good way to import a pool without also mounting and sharing its filesystems. These are actually three separate operations at the system level, but the code for 'zpool import' bundles them all together and provides no options to control this.)

solaris/ZFSBootMagicProblem written at 00:22:15


