Why Solaris's SMF is not a good init system

July 2, 2014

An init system has two jobs: running and supervising services, and managing what it runs and supervises. SMF is a perfectly good init system as far as the former goes, and better than some (eg the traditional System V init system). It is the second job where SMF falls down terribly, because it's decidedly complex, opaque, fragile, and hard to manipulate. The result is a very divided experience: as long as you don't have to do anything to SMF it's a fine init system, but the moment you do, everything becomes immensely frustrating.

Here is an illustration of how complex, opaque, and fragile SMF is. The following is a script of commands that must be fed to svccfg as one block of actions in order to make two changes to our OmniOS systems: to start syseventd only after filesystem/local (it normally starts earlier), and to start ssh after filesystem/minimal (ie very early in the boot process, so if things go wrong we have system access).

select svc:/system/filesystem/local
delpg npiv-filesystem
select svc:/milestone/devices
delpg devices
select svc:/milestone/single-user
delpg syseventd_single-user
select svc:/system/sysevent:default
addpg filesystems-dep dependency
setprop filesystems-dep/grouping = astring: "require_all"
setprop filesystems-dep/restart_on = astring: "none"
setprop filesystems-dep/type = astring: "service"
setprop filesystems-dep/entities = fmri: "svc:/system/filesystem/local:default"
select svc:/network/ssh
setprop fs-local/entities = fmri: "svc:/system/filesystem/minimal"
delpg fs-autofs

There are two obvious things about this sequence of commands: there are quite a lot of them, and they are probably pretty opaque if you're not familiar with SMF. But several other things are worth mentioning. The first is that it is actually fairly difficult to work out what these commands need to be; I had multiple false starts and missteps during the process. Many of the names involved are arbitrary, ie up to the individual services to decide on, and as you can see many of the services have chosen different names. These names are of course not documented and thus presumably not officially supported.

(Nor do the OmniOS SMF manpages discuss how your changes interact with, say, applying a package update for the packages you've manipulated.)

The next inobvious thing is that if you get some of these wrong, SMF will thoroughly blow up your system by, for example, detecting a dependency cycle and then refusing to have any part of it instead of trying some sort of fallback cycle-breaking in order to allow your system to boot to some extent. Nor does SMF prevent you from creating a dependency cycle by (for example) refusing to commit a service change that would set up such a cycle; instead it just tells you that you've made one somehow. This is why I call managing SMF a fragile thing.
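As a rough illustration of the 'refuse to commit a change that would set up a cycle' behaviour, here is a minimal Python sketch. It is not how SMF works internally; the class and method names are invented for this example, and the service names and their dependencies are only illustrative.

```python
class DependencyCycleError(Exception):
    """Raised when a proposed dependency would create a cycle."""


class DependencyGraph:
    """Hypothetical service dependency store that refuses cycles up front."""

    def __init__(self):
        # Maps each service name to the set of services it depends on.
        self.deps = {}

    def _reaches(self, start, target):
        """Return True if target is reachable from start via dependencies."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.deps.get(node, ()))
        return False

    def add_dependency(self, service, depends_on):
        """Record that service depends on depends_on, or refuse the change."""
        # If depends_on already (transitively) depends on service, the new
        # edge would close a loop, so reject it and leave the graph untouched.
        if self._reaches(depends_on, service):
            raise DependencyCycleError(
                f"{service} -> {depends_on} would create a dependency cycle"
            )
        self.deps.setdefault(service, set()).add(depends_on)
        self.deps.setdefault(depends_on, set())


# Illustrative names only; these are not the real SMF dependency sets.
graph = DependencyGraph()
graph.add_dependency("ssh", "filesystem/minimal")
graph.add_dependency("filesystem/local", "filesystem/minimal")
try:
    graph.add_dependency("filesystem/minimal", "ssh")
except DependencyCycleError as err:
    print("refused:", err)
```

The point of the design is that the check happens before anything is stored, so a bad change leaves the existing (bootable) configuration alone instead of being discovered afterwards.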

Oh, and the third inobvious thing is that there are several ways to do what I've done above, all of them probably roughly equivalent. At least one thing I've done is a convenient hack instead of what would be the theoretically 'correct' way; I've done the hack because the theoretically correct way is too much of a pain in the rear. That by itself is a glaring problem indicator, as doing the correct thing should be the easiest approach.

(The hack is that instead of deleting ssh's property group for its dependency on filesystem/local and creating a new property group for a new dependency on filesystem/minimal, I have instead rewritten the specific service that 'fs-local' depends on and thus its name is now kind of a lie. But this change is one line instead of six lines, making it an attractive hack.)
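To make the six-line version concrete, here is a small Python sketch that prints what the longer form might look like, using the same addpg/setprop pattern as the syseventd dependency in the script above. The property group name 'fs-minimal' is made up for this example, and the grouping and restart_on values are simply copied from the syseventd case; whether they are the right values for ssh is an assumption, and the output has not been tested against a real system.

```python
def dependency_commands(pg_name, entity_fmri,
                        grouping="require_all", restart_on="none"):
    """Return svccfg lines that create one dependency property group.

    The defaults mirror the syseventd dependency in the script above;
    they are an assumption for any other service.
    """
    return [
        f"addpg {pg_name} dependency",
        f'setprop {pg_name}/grouping = astring: "{grouping}"',
        f'setprop {pg_name}/restart_on = astring: "{restart_on}"',
        f'setprop {pg_name}/type = astring: "service"',
        f'setprop {pg_name}/entities = fmri: "{entity_fmri}"',
    ]


def replace_dependency(old_pg, new_pg, entity_fmri):
    """Delete one dependency property group and create a replacement."""
    return [f"delpg {old_pg}"] + dependency_commands(new_pg, entity_fmri)


# 'fs-minimal' is a hypothetical name for the new property group.
for line in replace_dependency("fs-local", "fs-minimal",
                               "svc:/system/filesystem/minimal"):
    print(line)
```

This prints one delpg line followed by five addpg/setprop lines, which would go after 'select svc:/network/ssh' in place of the single setprop line.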

To be manageable, an init system needs to be clear, well documented, and easy to use. You should be able to easily discover what properties a service has, what properties it can have, how these affect its operation, and so on. It should be obvious or at least well documented how to change service start order (because this is a not uncommon need). For dependency-based init systems without a strict ordering, it should be easy to discover what depends on what (including transitively) and either impossible or as harmless as possible to create dependency cycles. It should not require a major obscure bureaucracy to change things, nor hunts through the Internet and Stack Overflow to work out how to do things.
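The 'what depends on what (including transitively)' question is not a hard one to answer once the dependency data is available in one place. The following Python sketch shows the idea on a plain dictionary; the function names are invented for this example and the dependency data is illustrative, not a dump of real SMF dependencies.

```python
def transitive_dependencies(deps, service):
    """Return every service that `service` depends on, directly or not."""
    found = set()
    stack = list(deps.get(service, ()))
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(deps.get(current, ()))
    return found


def transitive_dependents(deps, service):
    """Return every service that depends on `service`, directly or not."""
    # Invert the mapping, then walk it the same way as above.
    reverse = {}
    for name, needs in deps.items():
        for needed in needs:
            reverse.setdefault(needed, set()).add(name)
    return transitive_dependencies(reverse, service)


# Illustrative data only: service -> services it depends on.
deps = {
    "ssh": {"filesystem/minimal"},
    "sysevent": {"filesystem/local"},
    "filesystem/local": {"filesystem/minimal"},
    "filesystem/minimal": set(),
}

print(sorted(transitive_dependencies(deps, "sysevent")))
# ['filesystem/local', 'filesystem/minimal']
print(sorted(transitive_dependents(deps, "filesystem/minimal")))
# ['filesystem/local', 'ssh', 'sysevent']
```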

SMF is not a success at any of these, especially being easy to use (about the only thing that is simple in SMF is simply disabling and enabling services). That is why I say that it is not a good init system. If I had to describe it in a nutshell, I would say that SMF is a perfect illustration of what Fred Brooks calls the second-system effect. People at Sun clearly wanted to make a better init system that fixed all of the problems people had ever had with System V init, but what they put together is utterly over-engineered, complex, and opaque.

(I also have to mention that SMF falls down badly on the small matter of managing serial port logins. Doing this in SMF is so complicated that no one I've talked to can tell me how to successfully enable logins on a particular serial port. Really. This is yet another sign that something is terribly wrong in how SMF is configured and manipulated, even if it's perfectly fine at starting and restarting services once you can configure them.)

Written on 02 July 2014.
Last modified: Wed Jul 2 00:45:28 2014