Multiprocessors are a leaky abstraction
Earlier this week, the story started rumbling through the usual places that using hyperthreading on Intel CPUs could cause serious performance problems in some applications, like Microsoft SQL Server. The current best guess at the cause is cache thrashing, because the two logical processors of a hyperthreaded CPU share their L1 and L2 caches. If a process that hammers the cache is scheduled onto one logical processor of an HT pair, it will destroy the performance of anything running on the other logical CPU (I'm summarizing from here).
As this vividly demonstrates, multiprocessors are a leaky abstraction, where very low-level details of the system can cause dramatic high-level effects. Unlike simpler leaky abstractions, MP issues are often dynamic effects that are challenging to fix or work around (and we already know that most programmers are not just ignorant but wrong about the dynamic behavior of their programs).
Just to show that this happens outside of hyperthreading, another case is the memory traffic caused by uncontended locks that are taken on multiple CPUs. Taking a lock requires a write, and (locked) writes require that the writing CPU get the latest version of the cache line for the target address, which in turn requires bandwidth-consuming synchronization among the caches. (This one happens even in true symmetric MP.)
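To make the lock case concrete, here is a toy model in Python. It is a sketch, not a measurement: LockLine and count_transfers are names I made up for illustration, and the model assumes a very simple coherence scheme in which a cache line has exactly one owning CPU at a time and every write needs ownership. Real hardware is far more complicated, but the counting argument is the same.

```python
class LockLine:
    """Toy model of the one cache line that holds a lock word (hypothetical)."""

    def __init__(self):
        self.owner = None    # CPU whose cache currently owns the line
        self.transfers = 0   # how often the line had to move between caches

    def write(self, cpu):
        # Assumption: a write needs the line owned by the writing CPU's cache.
        # If some other CPU (or nobody) owns it, it has to be fetched first.
        if self.owner != cpu:
            self.transfers += 1
            self.owner = cpu


def count_transfers(cpu_sequence):
    """Take and release an uncontended lock once on each CPU in cpu_sequence."""
    line = LockLine()
    for cpu in cpu_sequence:
        line.write(cpu)  # acquire: a write to the lock word
        line.write(cpu)  # release: another write, the line is already local
    return line.transfers


same_cpu = count_transfers([0] * 1000)        # lock always taken on CPU 0
alternating = count_transfers([0, 1] * 500)   # lock taken on CPU 0, 1, 0, 1, ...
print(same_cpu, alternating)  # prints "1 1000"
```

Nothing here ever waits for the lock, yet in this model the alternating case moves the cache line on every single acquisition, while the single-CPU case moves it once. That is the sense in which an uncontended lock can still cost memory bandwidth.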
The leakiness of the MP abstraction matters because multiprocessors of all sorts (HT, multi-core, etc.) are increasingly being put forward as the only way to substantially improve system performance over the next while. Often the designs are ccNUMA, which has more complexities (and therefore more issues) than plain SMP.
(Credit where credit's due department: I learned about the lock cache traffic issue from discussions on the Linux kernel mailing list.)