2013-03-29
Illumos-based distributions are currently not fully mature
As a sysadmin I'm used to my Unixes having certain amenities and conveniences. I've come to accept that any non-hobbyist Unix distribution that wants to be taken seriously (especially a free one) will have things like an announcements or security updates mailing list, a bug tracker, at least a somewhat visible security contact point, and documentation about all of this (along with things like how often security updates are made for any particular release and indeed the release policy). Some form of signed or verified packages is state of the art, along with the key infrastructure to support them.
While some of the various Illumos distributions are clearly hobbyist projects that you can't expect this from, some are equally clearly aspiring to be larger than that (swank websites are one sign of this). But, well, they seem to have almost none of the amenities that I'm used to. Does this matter or am I being too picky? I think that it does.
(A certain number of the pretty websites started looking a bit bare once I started following links.)
The surface reason is that these things are important for running production systems; for example, I'd really like to know about security fixes as soon as they're available for the obvious reason (we might not apply them, but at least we can assess the severity). The deeper reason is what the omission of these things says to me about the distribution's current audience. To put it one way, none of these things are needed by insiders who are deeply involved in the distribution already; they know the security update practices, they follow the main mailing lists, and so on. All of the documentation and so on is for new people, for outsiders like me, and the less it exists the more it feels like the distribution is not yet mature enough to be sensibly used by outsiders like me.
(There are some bits of this infrastructure that you may want to think about carefully beforehand, like bug trackers. But announce mailing lists are trivial.)
I'm sure that all of this will change in time, at least for the Illumos distributions that want to be used by outsiders like me. But right now I can't help but feel that Illumos distributions are not yet fully mature and up to the level of FreeBSD and modern Linux distributions (regardless of what the quality of the underlying OS is).
2013-03-26
Reconsidering a ZFS root filesystem
A Twitter conversation from today:
@thatcks: Let's see if fsck can heal this Solaris machine or if I get to reinstall it from scratch. (Thanks to ILOMs I can do this from my desk.)
@bdha: fsck? Solaris? Sadface.
@thatcks: I have horror stories of corrupted zpool.cache files too. I don't know if you can boot a ZFS-root machine in that situation.
@bdha: I've been there. zpool.cache backups saved my ass.
Right now all of our Solaris fileservers have (mirrored) UFS root filesystems instead of ZFS root filesystems and in the past I've expressed some desire to see that continue in any future ZFS fileservers we build. I've written about why before; the short version is that I've seen situations where /etc/zfs/zpool.cache had to be deleted and recreated, and I'm not sure this is even possible if your root filesystem is a ZFS filesystem. Using UFS for root filesystems avoids this chicken and egg problem.
(Of course the whole situation around zpool.cache and ZFS pool activation is a little bit mysterious, at least in Solaris.)
Well, actually, that's not the only reason. The other reason is that I still think of ZFS as fragile, as something that will go from 'fine' to 'panics your system' under remarkably little provocation. UFS is much more old-fashioned and will soldier on even under relatively extreme circumstances (whether that's a wise idea is another question). Under most circumstances I would rather have our fileservers limping along than dead (even if the entire root filesystem becomes inaccessible, as an extreme example).
But all of this is basically supposition (and thus superstition). UFS certainly has its own problems (one of which I ran into today on our test server) and I've never actually tried out a modern Illumos-based system with ZFS root, either in normal operation or while deliberately breaking stuff (and I certainly hope that some of the problems I heard about years ago have been dealt with). It may well turn out that ZFS root based systems are easier to deal with and recover than I expect. They certainly have their own benefits (periodic scrubs are reassuring, for example).
(And to be honest, I think it's quite possible that Illumos will only really support a ZFS root well by the time we get to it. It's clear that ZFS root is where all of the enthusiasm is and where most people think we should be going.)
PS: root filesystem snapshots and snapshot rollback are not a particular advantage in our environment, since we basically don't patch our fileservers. Of course periodic snapshots might save us in the face of a corrupt zpool.cache in the live filesystem.
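
(For illustration, here is a minimal Python sketch of the sort of zpool.cache backups mentioned in the Twitter conversation above. It's only a sketch under assumptions of my own: the function name backup_cache, the backup directory and the retention count are all hypothetical, not anything a real tool provides, and it says nothing about how you'd actually restore a copy on a machine that can't boot.)

```python
import shutil
import time
from pathlib import Path


def backup_cache(src, backup_dir, keep=7):
    """Copy src into backup_dir under a timestamped name.

    Only the newest `keep` copies are retained. The name, arguments and
    retention policy are illustrative assumptions, not an existing tool.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(src)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # YYYYMMDD-HHMMSS sorts lexically in chronological order.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time

    # Prune everything except the newest `keep` copies.
    copies = sorted(backup_dir.glob(f"{src.name}.*"))
    for old in copies[:-keep]:
        old.unlink()
    return dest


# Hypothetical use, for example from a daily cron job:
#   backup_cache("/etc/zfs/zpool.cache", "/var/backups/zpool-cache")
```

Where the copies should live is a judgment call. To be any use they probably need to be somewhere you can still get at when the pool (or the root filesystem) is the thing that's in trouble.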