Wandering Thoughts archives

2009-11-30

Poking around the OpenSolaris codebase (for sysadmins)

If you do much work with Solaris tools like DTrace and mdb -k, sooner or later you are going to want to poke around the OpenSolaris code, both for the kernel and for utilities. If you do this very much, you will want your own local copy of the OpenSolaris codebase (while you can browse the code on the OpenSolaris website, sooner or later navigating through it will drive you mad). You can get a copy with Mercurial; see here for instructions on how. For spelunking purposes, there is little to no reason to get the binary-only bits.

(Just to confuse you, OpenSolaris is called 'ON/Nevada' in much of this.)

Now, there's an important caution to this: OpenSolaris source is not the same thing as Solaris source. You can generally use OpenSolaris source as a guide to what you'll find with DTrace et al, but it's not a sure thing (even if you go back in version history), and sometimes you will find important differences. Some of these can be spotted by looking at structure definitions in your own system's include files in /usr/include, but not all of the interesting header files make it there. In some cases you may have to resort to dumping structures with mdb's ::print operation.
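
One way to spot such differences without reading every line is to pull the member names out of both copies of a structure definition (the OpenSolaris source and your /usr/include header) and compare them. The following Python sketch does this with regular expressions. The helper names are hypothetical, and it is a crude heuristic: it assumes one member per declaration, no nested struct or union bodies, and it keeps members from every #ifdef branch. Treat a clean result as a hint, not proof that the layouts match.

```python
import re


def struct_fields(header_text: str, struct_name: str) -> list[str]:
    """Return the member names of `struct <struct_name> { ... }`, in order.

    Crude heuristic: assumes one member per declaration, no nested
    struct/union bodies, and keeps members from every #ifdef branch.
    """
    # Drop C block comments and preprocessor lines before parsing.
    text = re.sub(r"/\*.*?\*/", "", header_text, flags=re.S)
    text = re.sub(r"^\s*#.*$", "", text, flags=re.M)
    m = re.search(r"\bstruct\s+" + re.escape(struct_name) + r"\s*\{(.*?)\}",
                  text, flags=re.S)
    if m is None:
        raise ValueError(f"struct {struct_name} not found")
    fields = []
    for decl in m.group(1).split(";"):
        decl = re.sub(r"\[.*?\]", "", decl)  # drop array bounds
        fp = re.search(r"\(\s*\*\s*([A-Za-z_]\w*)\s*\)", decl)
        if fp:  # function pointer member: type (*name)(args)
            fields.append(fp.group(1))
            continue
        decl = decl.split(":")[0]  # drop bitfield widths
        idents = re.findall(r"[A-Za-z_]\w*", decl)
        if idents:  # the member name is the last identifier
            fields.append(idents[-1])
    return fields


def diff_fields(ours: list[str], theirs: list[str]) -> tuple[list[str], list[str]]:
    """Return (members only in ours, members only in theirs)."""
    only_ours = [f for f in ours if f not in theirs]
    only_theirs = [f for f in theirs if f not in ours]
    return only_ours, only_theirs
```

It is also worth checking whether the two name lists are equal in order, since reordered members change offsets even when every name matches.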

Everything useful in the onnv-gate repository lives in usr/src, and I'm going to quote paths relative to this from now on.

  • In general, everything has most of its code in a <whatever>/common subdirectory (for 'code common across all architectures', I assume).

  • mdb source is in cmd/mdb, and the mdb modules are mostly in the common/modules subdirectory. Reading mdb module source can be the best way to find out interesting mdb commands and exactly what they do; this can lead to useful discoveries.

  • most interesting kernel source is in uts/common in a relatively obvious layout. Many internal header files are in the sys/ subdirectory here; others can be found in the source area for their code, eg fs/zfs/sys for internal ZFS headers.

  • ZFS commands rely on ZFS libraries to do most of the work, principally lib/libzfs; lib/libzpool, a user-level build of much of the kernel ZFS code used by tools such as zdb, is also worth reading. These are what you need to look at if you want to figure out the division between user and kernel space, and also which limitations are artificially imposed by the zpool and zfs commands and which are real.
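
To get a quick inventory of what commands an mdb module offers before reading its source in detail: modules normally register their dcmds in a static mdb_dcmd_t array whose entries start with the command name as a string. This Python sketch assumes that layout (modules that build their tables some other way will be missed) and uses regular expressions rather than a real C parser; the function names are hypothetical.

```python
import re
from pathlib import Path

# A static dcmd table: "... mdb_dcmd_t name[] = { ... };"
DCMD_ARRAY = re.compile(r"mdb_dcmd_t\s+\w+\s*\[\s*\]\s*=\s*\{(.*?)\}\s*;", re.S)
# Each entry starts with the dcmd name as a string: { "name", ...
ENTRY_NAME = re.compile(r'\{\s*"([^"]+)"')


def dcmds_in_source(c_text: str) -> list[str]:
    """Return dcmd names found in mdb_dcmd_t arrays in one C file."""
    names = []
    for body in DCMD_ARRAY.findall(c_text):
        names.extend(ENTRY_NAME.findall(body))
    return names


def dcmds_in_tree(root: Path) -> dict[str, list[str]]:
    """Map each .c file under root (as a relative path) to its dcmd names."""
    result = {}
    for path in sorted(root.rglob("*.c")):
        names = dcmds_in_source(path.read_text(errors="replace"))
        if names:
            result[path.relative_to(root).as_posix()] = names
    return result
```

Run from usr/src, something like `dcmds_in_tree(Path("cmd/mdb/common/modules"))` gives a per-file list of command names to go hunting through.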

In general, the onnv-gate repository history is not very useful. Sometimes, though, you can use 'hg log -v' and so on to pick out the specific code change that fixed a bug number you're interested in, and thus see how applicable it is to your particular circumstances.
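
If you save the output of 'hg log -v' to a file, picking out the changesets for a bug number is mostly text processing. This Python sketch assumes Mercurial's default verbose layout (header lines such as 'changeset:' and 'files:', then 'description:' followed by the message) and assumes that commit messages list each bug on its own line starting with the bug ID; the function names are hypothetical. ('hg log -k TEXT' does a case-insensitive keyword search directly, but it will match the number anywhere, not just at the start of a message line.)

```python
import re


def parse_hg_log_verbose(text: str) -> list[dict]:
    """Split saved `hg log -v` output into one dict per changeset.

    Assumes Mercurial's default verbose layout: header lines such as
    'changeset:' and 'files:', then 'description:' and the message.
    """
    changesets = []
    # Each entry begins with a 'changeset:' line.
    for chunk in re.split(r"^(?=changeset:)", text, flags=re.M):
        if not chunk.startswith("changeset:"):
            continue
        header, _, description = chunk.partition("\ndescription:\n")
        fields = {}
        for line in header.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        changesets.append({
            "changeset": fields.get("changeset", ""),
            "files": fields.get("files", "").split(),
            "description": description.strip(),
        })
    return changesets


def changesets_for_bug(text: str, bug_id: str) -> list[dict]:
    """Changesets whose message has a line starting with bug_id.

    Assumption: one 'BUGID synopsis' line per bug in commit messages.
    """
    pattern = re.compile(r"^" + re.escape(bug_id) + r"\b", re.M)
    return [cs for cs in parse_hg_log_verbose(text)
            if pattern.search(cs["description"])]
```

The 'files' list of each hit tells you which source files to read to judge whether the fix touches the code paths you care about.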

(The other thing I've used the repo history for is to trace the code for a particular ZFS kernel feature that I wanted to use back in time to establish that I would have to use a relatively recent OpenSolaris build in order to get it.)

PokingOpenSolarisSource written at 01:03:26

2009-11-21

An update on faulted ZFS spares

We've recently got some additional pieces of news on the faulted ZFS spares situation.

First, our suspicion as to the cause was correct; Sun has confirmed that there is a race in adding the same spare to multiple pools under some circumstances. The fix for it is apparently in Solaris 10 update 8, and Sun did an 'IDR' (a one-off interim patch) for our Solaris 10 update 6 systems. (I assume, but have not confirmed, that just applying the current set of ZFS patches and their prerequisites is good enough.)

Second, Solaris 10 update 8 can properly 'zpool remove' faulted spares from pools, so even if Sun has not completely fixed all of the spares-related races yet you can recover from the situation yourself. Again, it's likely that this fix is in the current set of ZFS patches (and Sun put it in our IDR).
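
The first step in that recovery is finding which pools have a faulted spare. This Python sketch scans saved 'zpool status' output and builds candidate 'zpool remove <pool> <device>' commands for you to review rather than running anything. It assumes the usual layout, where spares are listed under an indented 'spares' line as a device name followed by its state, with FAULTED as the state string to look for; the exact output may vary between releases, and the function names are hypothetical.

```python
def faulted_spares(status_text: str) -> list[tuple[str, str]]:
    """Return (pool, device) pairs for spares listed as FAULTED.

    Assumes the usual `zpool status` layout: a 'pool: NAME' line per pool
    and, under 'config:', an indented 'spares' line followed by more deeply
    indented 'DEVICE STATE ...' lines.
    """
    pool = None
    spares_indent = None  # indentation of the 'spares' line while inside it
    found = []
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            pool = stripped.split(":", 1)[1].strip()
            spares_indent = None
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "spares":
            spares_indent = indent
            continue
        if spares_indent is None:
            continue
        if not stripped or indent <= spares_indent:
            spares_indent = None  # the spares section has ended
            continue
        parts = stripped.split()
        if len(parts) >= 2 and parts[1] == "FAULTED":
            found.append((pool, parts[0]))
    return found


def removal_commands(status_text: str) -> list[str]:
    """Candidate commands to print and review; nothing is executed here."""
    return [f"zpool remove {pool} {dev}" for pool, dev in faulted_spares(status_text)]
```

Because the same spare can be shared between pools, a device may show up as FAULTED in one pool and AVAIL in another; the sketch only reports the pool where it is faulted.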

(Mind you, since the current set of ZFS patches depends on a kernel rollup patch, installing them is not all that far from a full upgrade to S10U8 as far as we're concerned, because in our NFS fileserver environment kernel and ZFS patches are by far the riskiest ones. Although not always, and sadly that particular bug is still in S10U8.)

However, the more I have seen of how Sun handles ZFS pool spares in general, the less confidence I have in it working properly when we need it. Right now I consider ZFS's own spare handling to be at most an emergency measure; it's the sort of thing that gets you from the middle of the night to the morning when you read your email, not something that you let sit and handle problems on its own.

ZFSFaultedSparesII written at 02:20:04



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.