Solaris 9's slow patch installs
Yesterday was my first time installing the Solaris 9 recommended patch set on a production machine; we rolled it onto a basically unpatched server. Because it was a server, I did it in single-user mode (the patch set recommends this, since some of the patches in it explicitly say to apply them in single-user mode).
I already knew that installing the patch set was achingly slow on my test machine, but my test machine is an Ultra 10, so I wasn't surprised. The machine from yesterday was a Sun Fire V210, which has modern CPUs and, more importantly, modern amounts of memory and fast SCSI disks.
It still took an hour.
There are 134 patches in the patch set, so Solaris only managed to average one patch about every 27 seconds. Considering how much work a modern machine can do in 27 seconds, I believe I can safely say that the Solaris patch install system is hideously inefficient.
(And, as previously noted, it spews incomprehensible and alarming messages onto the screen.)
Fortunately it doesn't demand I answer any questions during its run, so next time around I'll know to just go back to my office for a while. Still, an hour is an irritatingly long time to have a production server down in single-user mode.
Multiprocessors are a leaky abstraction
Earlier this week, a story started rumbling through the usual places: using hyperthreading on Intel CPUs can cause serious performance problems in some applications, such as Microsoft SQL Server. The current best guess at the cause is cache thrashing, because the two logical processors of a hyperthreaded CPU share the L1 and L2 caches. If a process that hammers the cache is scheduled onto one logical processor of an HT pair, it will destroy the performance of whatever is running on the other logical processor (I'm summarizing from here).
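To make the thrashing mechanism concrete, here is a toy simulation. It is only a sketch under loud assumptions: a single fully associative LRU cache measured in cache lines, shared by a "victim" that loops over a small working set and a "hog" on the sibling logical processor that streams through brand-new addresses. The class and function names, the cache size, and the access pattern are all hypothetical; real L1 and L2 caches are set-associative and behave less tidily.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement, sized in cache lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> None, least recently used first

    def access(self, addr):
        """Touch one cache line; return True on a hit, False on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[addr] = None
        return False


def victim_hit_rate(capacity, working_set, rounds, hog_per_access):
    """Victim loops over `working_set` lines; after each victim access the
    hypothetical hog touches `hog_per_access` never-before-seen lines.
    Returns the victim's hit rate, ignoring the cold first pass."""
    cache = LRUCache(capacity)
    next_hog_addr = 1_000_000  # hog addresses never overlap the victim's
    hits = total = 0
    for r in range(rounds):
        for addr in range(working_set):
            hit = cache.access(addr)
            if r > 0:  # skip compulsory misses on the first pass
                hits += hit
                total += 1
            for _ in range(hog_per_access):
                cache.access(next_hog_addr)
                next_hog_addr += 1
    return hits / total


print(victim_hit_rate(128, 64, 10, 0))  # victim alone → 1.0
print(victim_hit_rate(128, 64, 10, 2))  # sharing with a cache hog → 0.0
```

With a 128-line cache, the victim's 64-line working set fits comfortably when it runs alone. Once the hog inserts two fresh lines per victim access, 191 other lines are touched between reuses of any victim line, so LRU has always evicted it by the time the victim comes back, and the victim never hits.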
As this vividly demonstrates, multiprocessors are a leaky abstraction: very low-level details of the system can have dramatic high-level effects. Unlike simpler leaky abstractions, MP issues are often dynamic effects that are challenging to fix or work around (and we already know that most programmers are not just ignorant of, but wrong about, the dynamic behavior of their programs).
To show that this isn't limited to hyperthreading, consider the memory traffic caused by uncontended locks that get taken on multiple CPUs. Taking a lock requires a write, and a (locked) write requires the writing CPU to get the latest version of the cache line holding the target address, which in turn requires bandwidth-consuming synchronization among the caches. (This one happens even on true symmetric MP.)
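Here is a similarly toy model of the lock case. It is a hypothetical sketch, not a real coherence protocol such as MESI: it ignores shared and read states and only tracks which CPU last wrote each cache line, counting a transfer whenever a different CPU writes it. CPUs take strict turns acquiring and releasing a lock, so there is never any contention. For comparison, the same pattern is also run with a hypothetical per-CPU lock, each on its own line. All names here are illustrative.

```python
class CoherenceModel:
    """Toy model: a CPU must own a cache line before writing it. If another
    CPU wrote the line last, ownership (and the data) must be transferred."""

    def __init__(self):
        self.owner = {}     # cache line -> CPU that last wrote it
        self.transfers = 0  # ownership transfers between CPUs

    def write(self, cpu, line):
        prev = self.owner.get(line)
        if prev is not None and prev != cpu:
            self.transfers += 1  # line has to move between caches
        self.owner[line] = cpu


def take_and_release(model, cpu, lock_line):
    """An uncontended acquire and release; both are writes to the lock word."""
    model.write(cpu, lock_line)  # acquire (e.g. an atomic test-and-set)
    model.write(cpu, lock_line)  # release (a plain store, but still a write)


def run(num_cpus, rounds, per_cpu_locks):
    """CPUs take turns with the lock, so no acquisition ever has to wait."""
    model = CoherenceModel()
    for _ in range(rounds):
        for cpu in range(num_cpus):
            line = cpu if per_cpu_locks else 0  # line 0 holds the shared lock
            take_and_release(model, cpu, line)
    return model.transfers


print(run(4, 1000, per_cpu_locks=False))  # one shared lock → 3999
print(run(4, 1000, per_cpu_locks=True))   # a lock per CPU → 0
```

Every acquisition of the shared lock except the very first comes from a different CPU than the last writer, so 3999 of the 4000 acquisitions move the cache line even though no CPU ever waited for the lock.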
The leakiness of the MP abstraction matters because multiprocessors of all sorts (HT, multi-core, etc.) are increasingly being put forward as the only way to substantially improve system performance over the next while. Often the designs are ccNUMA, which has more complexities (and therefore more issues) than plain SMP.
(Credit where credit's due department: I learned about the lock cache traffic issue from discussions on the Linux kernel mailing list.)