Wandering Thoughts archives

2013-01-30

Thinking about FreeBSD versus Illumos for our ZFS fileservers

One of our options as a replacement for Solaris 10 is to switch from Solaris to FreeBSD (now that I no longer believe you need Solaris for ZFS). There are both advantages and uncertainties to such a move and today I want to ramble on a bit about how I see both sides.

On the one hand, we have no particular attraction to Solaris as Solaris. On the other hand, Solaris is the devil we know and this particular devil works pretty well for us (plus I've come around to the idea that DTrace is kind of useful). Using some version of Illumos preserves what we have now while allowing us to use new hardware; we'd also get various improvements in things like package management.

There are two or three drawbacks of Illumos: picking the right distribution, the long-term development path, and potentially hardware support. The right distribution is not just a matter of what is good today; given that we'll likely be running these machines for five years or more, we care about the long-term viability of the distribution and ideally the continued availability of security updates for old versions. The long-term development question is, well, whether Illumos is going to survive five years or more in useful form (ie, as something that will run on a generic server as a generic server OS) or wither away into, at best, a narrowly specialized thing. It takes a bunch of work to keep developing a general server OS, and there are already lots of other things to drain away potential contributors.

I had to talk about Illumos's drawbacks first because FreeBSD is their flipside. With FreeBSD we get what is for us a new and untested platform, but one with a lot of momentum and history behind it, clear operating policies, and clear support well into the future (since FreeBSD is basically the non-Linux Unix). FreeBSD also drops a lot of things about Solaris that I don't like in favour of well-proven technology and plain modern stuff that I am much happier with.

In short, FreeBSD gives us a more attractive overall system at the cost of some uncertainty over the pieces that we really care about. It also seems likely that FreeBSD DTrace is less mature than Illumos DTrace and as mentioned I sort of care about DTrace these days. Of course this uncertainty can be somewhat mitigated with testing and other people's experiences.

(It's not just an issue of things like normal functionality and performance, although those matter. We also care about the dark corners, which you can't test and you sort of have to take on either trust or painful experience. Illumos lets us give more weight to some of our painful Solaris experience, since it's likely that Illumos is going to be very Solaris-like in many ways.)

PS: I consider it a feature that FreeBSD can be installed with non-ZFS root filesystems. Given some of the ZFS failure modes I've seen, I actively prefer not to need ZFS to boot the machine. I'm not sure how many Illumos-derived distributions still support this (if any); my impression was that the Solaris world was going full steam ahead to an all-ZFS future.

ZFSFreeBSDvsIllumos written at 00:20:35

2013-01-04

DTrace's stable providers are not good enough

In a comment on my entry on why DTrace doesn't attract people to Solaris that often, Brendan Gregg drew a distinction between two levels of using DTrace:

In terms of difficulty, using the DTrace providers is a little like:

  • fbt provider: writing a simple kernel patch
  • stable providers: writing a shell script

Sysadmins should be able to handle stable providers (eg, io, proc, sched, vminfo). They are documented - you don't need to reach for kernel code. Programming them may be no more difficult than shell scripting.

(For 'fbt provider', you should actually read 'any unstable provider' (my DTrace scripts use sdt as well, for example). And I think that using unstable providers is easier than writing a kernel patch in that you can get far with a basic ability to read C code, although my perspective may be skewed.)

I'll start by saying that my experience has given me some strong biases here and it's possible that I'm missing a world of DTrace usage. Also, all of this is from the perspective of someone using Solaris 10 update 8; some of this has changed in Solaris 11 and perhaps with Illumos. With that said, though:

I agree with Brendan's characterization, but the problem is that DTrace's stable providers are not good enough. For a glaring example of this, getting even relatively basic information about NFS server activity requires using unstable providers (although I think this has been fixed in Solaris 11). The cold, hard truth (as I wrote about a bit when I talked about why we hadn't taken to DTrace) is that the Solaris developers never attempted to develop even a vaguely complete set of stable providers. Almost everything really useful is unstable and thus undocumented (and this leads to the need for system programmers and kernel source code in order to write useful DTrace scripts).

At one level, my personal experience is not necessarily representative; we use our Solaris machines as fileservers and have almost no programs running on them locally. If I had to diagnose many local programs I might value the stable providers (my impression is that many of them are at the kernel to userland interface level), but as it stands I don't think I've ever directly used any stable provider and the information from other people's DTrace scripts using them was at most vaguely useful.

One major gap in the stable providers is that almost no significant subsystem inside Solaris has any (although, based on documentation, Oracle seems to be changing this in Solaris 11). I've mentioned the NFS server (and I think the NFS client is in the same boat), but ZFS is another large example. There are a lot of important and interesting ZFS activities, all of which have to be extracted through unstable providers. Want to watch device multipathing activity to see if everything is fine? Unstable providers again. You get the idea.

(In fact it's not unusual for kstats to provide better visibility into a subsystem than DTrace does with stable providers.)
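To make the kstat point a bit more concrete: `kstat -p` prints each statistic on its own line as `module:instance:name:statistic`, followed by a tab and the value, and watching a counter usually means taking two snapshots and subtracting. Here is a minimal Python sketch under that assumption about the output format. The `rfsproccnt_v3` kstat name in the example is from memory of Solaris 10's NFS server counters and should be treated as an assumption, and the counts are made up.

```python
"""Parse `kstat -p` output and compute per-interval deltas (minimal sketch)."""


def parse_kstat_p(text):
    """Map (module, instance, name, statistic) -> value string.

    Assumes the `kstat -p` format: 'module:instance:name:statistic',
    then a tab, then the value. Lines that don't fit are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("\t")
        parts = key.split(":", 3)
        if len(parts) != 4 or not parts[1].isdigit():
            continue  # not a module:instance:name:statistic line
        module, instance, name, statistic = parts
        stats[(module, int(instance), name, statistic)] = value.strip()
    return stats


def deltas(before, after):
    """Return integer differences for statistics present in both snapshots."""
    out = {}
    for key, new in after.items():
        old = before.get(key)
        if old is None:
            continue
        try:
            out[key] = int(new) - int(old)
        except ValueError:
            continue  # non-integer stats such as 'class' or float snaptimes
    return out
```

Feeding it the output of two `kstat -p nfs:0:rfsproccnt_v3` runs taken a few seconds apart should give per-operation NFSv3 server counts for that interval, with no unstable providers involved.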

One effect of unstable providers being so necessary to solve real problems is that the 'shell scripting' level of DTrace is not deeply useful. Sure, you can put together something from documented interfaces and the DTrace manual (if someone hasn't already written the general script and put it on the net), but what you can write probably won't do you much good or tell you things that are very interesting.

Another problem is that relying on unstable providers makes even canned DTrace scripts harder for people to use, as I ran into with my DTrace scripts. Some kernel data structures changed a bit between S10U8 (what we use) and later versions, which means that my scripts don't work as-is for anyone on a later version and need to be edited at least a bit. Requiring sysadmins to make magic edits to scripts before they can use them is not exactly encouraging people to like and use DTrace.

(DTrace doesn't provide any mechanisms to make this easier, although once again it easily could if it actually cared about this issue. But it doesn't, because this is what sysadmins deserve when they use unstable providers, right?)
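One crude workaround lives entirely outside DTrace: keep one copy of each unstable-provider script per OS release and have a wrapper pick the right copy, so that sysadmins don't have to make the magic edits themselves. Below is a minimal Python sketch of that idea. The release markers (substrings of the first line of `/etc/release`) and the script file names are hypothetical, and the exact `/etc/release` wording for S10U8 is an assumption.

```python
"""Pick a per-release variant of an unstable-provider DTrace script.

Hypothetical sketch: the markers and file names below are illustrative
assumptions, not real scripts or guaranteed /etc/release contents.
"""

# (marker substring of the /etc/release first line, script variant)
VARIANTS = [
    ("_u8wos", "nfs3-activity.s10u8.d"),    # assumed S10U8 build tag fragment
    ("Solaris 11", "nfs3-activity.s11.d"),  # hypothetical Solaris 11 copy
]


def pick_variant(release_line, variants=VARIANTS):
    """Return the first script variant whose marker occurs in release_line.

    Raises LookupError if nothing matches, because running an
    unstable-provider script against unknown kernel structures is
    exactly what we are trying to avoid.
    """
    for marker, script in variants:
        if marker in release_line:
            return script
    raise LookupError("no DTrace script variant for %r" % release_line.strip())


def first_release_line(path="/etc/release"):
    """Return the first non-blank line of the release file, stripped."""
    with open(path) as f:
        for line in f:
            if line.strip():
                return line.strip()
    return ""
```

A wrapper would then run `dtrace -s` on the returned file name. Failing loudly instead of guessing is deliberate, since a script written against another release's kernel structures is unlikely to work as-is.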

DTraceStableProvidersProblem written at 03:35:06



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.