2006-09-08
A Solaris 8 Disksuite single user mode surprise
If you boot a Disksuite-using Solaris 8 machine into single-user mode
to do maintenance and do a metastat, you'll discover that all of your
mirrored metadevices are marked as needing to be metasync'd, even if
they actually are fully consistent.
What seems to be going on is that Disksuite doesn't update things
from the on-disk metadata state database when the kernel brings
up the metadevices themselves in early boot. Instead, it defers this
until you explicitly run 'metasync -r', which is normally done by
/etc/init.d/lvm.sync, which in turn is only run as part of going into
runlevel 2.
(At least I assume that the kernel is bringing up the Disksuite devices itself in early boot, since these machines have their root filesystem on Disksuite mirrors. I am not quite up on the black box of early Solaris boot.)
The fix is pretty simple; once you're up in single-user mode, just
remember to run '/etc/init.d/lvm.sync start' before you start
futzing around much with the disks.
(Our experience is that it goes like lightning unless something is
genuinely troublesome, which is about what you'd expect. But check
with metastat afterwards, just to be sure. You probably don't need
to do this if you're bringing a system down from normal operation into
single-user mode, just if you're booting straight into single-user
mode, but I haven't tested this to be sure.)
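Concretely, the routine looks something like this. (A sketch only; these are Solaris-specific commands, the paths are the stock Solaris 8 ones, and the exact metastat output will vary by machine.)

```shell
# Maintenance routine on a Disksuite Solaris 8 box booted straight
# into single-user mode (e.g. via 'boot -s' from the OBP prompt).
# Run this before doing anything serious to the disks.

# Look at the (misleading) state of the mirrors; they will likely
# all claim to need a resync even if they are actually consistent.
metastat

# Do the deferred resync step that going into runlevel 2 would
# normally perform for you.
/etc/init.d/lvm.sync start

# Verify that the mirrors now report a healthy state.
metastat
```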
This makes a certain sort of sense from the right viewpoint, since it means that the system is doing as little as possible when coming up into single-user mode. I have no idea how the kernel picks which submirror to write to when it has to write to a not-yet-synced metadevice, though. And it does mean you have to remember an extra step for most routine boots into single-user mode.
(The good news is that the excitement this caused when we stumbled over it will probably ensure that I don't forget any time soon.)
I hate hardware (AMD CPU edition)
I have been trying to spec out a new machine lately, which is reminding me all over again how I hate hardware. This time around, the target of my particular hate is AMD CPUs, especially the new AM2 ones, where the performance picture has become so complicated that you need a large chart to understand it.
Choosing CPUs used to be simple: within a given CPU family, the only thing that changed performance was the clock speed, so you could just buy the fastest CPU your budget and desires afforded and be done with it.
Athlons are no longer like that. Within the Athlon 64 X2 AM2 family, there are now three variables: clock speed, L2 cache size per core, and achievable main memory speed (the clock multiplier, as explained by AnandTech). Models with increasing nominal clock speeds zig-zag in the other attributes, to the point where I had to consult the large Wikipedia page of Athlon 64 processors to keep things straight.
(Thank god for Wikipedia. Good luck finding AMD discussing this anywhere you can conveniently find it; I'm not sure they even have a comparison chart of L2 cache sizes on their website.)
Then, once I'd worked all this out, it turns out that the supply of 1MB L2 parts seems to have dried up around here; local computer shops can't even get the Socket 939 versions with 1MB L2 caches, much less the AM2 ones. (Rumour has it that AMD has starved the distributor pipeline in favour of redirecting most of the supply to certain large computer vendors.)
I could try to view the 1MB L2 part drought as a way of simplifying my life, but instead it just irritates me that I can't spec the CPUs I really want.
(I care about the cache size and main memory speed because I tend to think that they dominate performance for the kind of CPU-intensive things I'm likely to do with my machines. Not that I've actually measured this to find out for sure, which makes me some sort of fool.)
Link: IRON File Systems
IRON File Systems [PDF] is a paper from the 2005 ACM Symposium on Operating Systems Principles. To quote from the abstract:
Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework, to investigate how commodity file systems react to a range of more realistic disk failures. [...]
They did their primary analysis on Linux ext3, ReiserFS 3, and (Linux) JFS; the results are comprehensive, interesting, and sometimes scary.