2011-02-28
The slowdown of Solaris here
Oh, it's not that our Solaris machines have slowed down. As far as I know, they're still running as fast as usual. What's been slowing down is our interest in Solaris, or at least our interest in new versions of Solaris. There was a time when I tracked what was new in patches, Solaris updates, and OpenSolaris fairly carefully; nowadays I had to check to confirm that Solaris 11 Express was ZFS root only.
A relatively small part of this is because our Solaris machines work fine as they are. A large part of it is because, ultimately, we don't trust Solaris engineering to get things right when they make changes (with good reason; we have seen terrible performance regressions introduced by well-meaning patches).
Because we have so little trust in Solaris, we must do a full test and re-qualification of any patch or new version of Solaris. This is a lot of work, so we aren't going to start it right away, and by the time we might start there may well be a newer version on the way. Beyond that, because we don't trust Solaris changes, we always get to weigh the potential benefits of an upgrade against the equally real possibility of running into serious issues in production. So far, that weighing has never come down on the side of an upgrade.
(Although I have not looked at ZFS changes recently, the last time I did, the only really attractive one was 'zpool import -f'. And if we wind up needing that, we can always crash-build a current Solaris machine to do the pool import and re-export.)
Oracle's decision about OpenSolaris is a significant factor in this. Not having source code will hamper us in a number of ways and certainly makes running Solaris more dangerous. Now, if things go wrong, we are entirely at the mercy of Oracle support, and our experience with Sun's old Solaris support environment certainly wasn't particularly positive.
(I suspect that the last released OpenSolaris code is already out of date with current Solaris patches, but I haven't checked.)
Given all of this, I simply haven't been very active in watching Solaris developments. There doesn't seem to be much point in paying close attention to something that we're very unlikely to use (at least not any time in the near future).
All of this opens up a large can of worms in terms of our long-term future with Solaris, but that's another entry.
2011-02-25
Why I am not enthused about ZFS for my root filesystem
One of the changes in Oracle's recent Solaris 11 preview (and in OpenSolaris releases before it) is that your root filesystem must be a ZFS filesystem; it can no longer be a UFS filesystem. While I understand why Oracle did this, it is not a change that leaves me feeling very enthused.
The short version of why I do not like this is that previously, the entire ZFS subsystem could go belly-up and your system could still boot. You might think that ZFS going belly-up entirely should not happen, but the problem there is /etc/zfs/zpool.cache, the system ZFS cachefile. This is a binary file that can only be maintained by ZFS tools, and it at least used to be possible for it to become corrupt. When it became corrupt, your method of fixing it was, uh, to remove it and start again by re-importing your pools.
This is generally possible if your actual Solaris system filesystem is on UFS (both the 'removing' bit and the 'starting again' bit). My strong impression is that this is much harder if your root filesystem is ZFS, because you have a little chicken-and-egg problem: the cachefile you need to remove lives on a filesystem in one of the pools it describes.
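For illustration, the 'remove it and start again' procedure on a UFS-root machine might look roughly like the sketch below. This is a sketch under assumptions, not a tested recovery procedure. The function name `rebuild_zpool_cache`, the `RUN` dry-run switch, the `.corrupt` suffix, and the pool name `tank` are all made up for the example. It only prints the commands unless you explicitly set `RUN` to an empty string, and it moves the cachefile aside instead of deleting it. It also assumes that a plain `zpool import` of each pool is enough to get it registered in a fresh cachefile, which may not hold in every failure scenario.

```shell
#!/bin/sh
# Sketch only: by default this prints the commands instead of running them.
# Run with RUN="" in the environment to actually execute them.
RUN="${RUN-echo}"
CACHE=/etc/zfs/zpool.cache

rebuild_zpool_cache() {
    # Move the suspect cachefile aside rather than deleting it outright.
    $RUN mv "$CACHE" "$CACHE.corrupt"
    # Re-import each named pool (assumed to repopulate the cachefile).
    for pool in "$@"; do
        $RUN zpool import "$pool"
    done
}

# Dry run with a hypothetical pool name.
rebuild_zpool_cache tank
```

Running `zpool import` with no arguments first lists the pools that are available for import, which is one way to find the pool names to pass in. None of this helps with the ZFS root case, where the root pool itself has to be found and mounted before you can touch the file at all.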
(In fact at one point the official solution for the devices involved in the root pool changing names was 'boot from a rescue environment'. Yes, really. In an enterprise operating system, with self-identifying filesystems and storage pools. I hope that this has changed since then.)
Possibly the rescue environment has a well-honed solution to this problem (one that gets your root pool back and the system booting to single-user mode so that you can fix everything else), or perhaps this doesn't happen any more. But frankly, Solaris 10 has not impressed me with its resilience in the face of various events, so I am not inclined to trust it here; I would much prefer the simpler, far better tested approach of a UFS root filesystem.