2005-11-24
Multiprocessors are a leaky abstraction
Earlier this week, the story started rumbling through the usual places that using hyperthreading on Intel CPUs could cause serious performance problems in some applications, like Microsoft SQL Server. The current best guess of the cause is cache thrashing due to Hyperthreaded processors sharing L1 and L2 cache. If a process that hammers the cache is scheduled onto one logical processor of a HT pair, it will destroy the performance of anything running on the other logical CPU (I'm summarizing from here).
As this vividly demonstrates, multiprocessors are a leaky abstraction, where very low level details of the system can cause dramatic high level effects. Unlike simpler leaky abstractions, MP issues are often dynamic effects that are challenging to fix or work around (and we already know that most programmers are not just ignorant but wrong about the dynamic behavior of their programs).
Just to show that this happens outside of hyperthreading, another case is the memory traffic caused by uncontended locks that are taken on multiple CPUs. Taking a lock requires a write, and (locked) writes require that the writing CPU get the latest version of the cache line for the target address, which requires bandwidth-consuming synchronization among the caches. (This one happens even in true symmetric MP.)
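As a rough illustration of why this costs something even when nobody ever waits for the lock, here is a toy model in Python. It is only a sketch under stated assumptions: a single cache line holds the lock word, a locked write needs exclusive ownership of that line, and the function name `count_line_transfers` is made up for this example. It counts ownership changes and measures nothing about real hardware.

```python
# Toy model (an assumption for illustration, not a real cache simulator):
# one cache line holds the lock word, and a locked write needs exclusive
# ownership of that line.

def count_line_transfers(acquiring_cpus):
    """Count how often the lock's cache line must move to a different CPU."""
    owner = None      # CPU whose cache currently owns the line exclusively
    transfers = 0
    for cpu in acquiring_cpus:
        if owner != cpu:
            # The line has to be fetched from memory or another cache first.
            transfers += 1
            owner = cpu
        # The lock is released before the next acquire, so it is never contended.
    return transfers

# The same 1000 uncontended acquisitions, on one CPU versus alternating CPUs.
same_cpu = count_line_transfers([0] * 1000)
alternating = count_line_transfers([i % 2 for i in range(1000)])
print(same_cpu, alternating)
```

In this model the single-CPU case pays for one fetch, while the alternating case moves the line on every acquisition, even though no CPU ever has to wait for the lock.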
The leakiness of the MP abstraction matters because multiprocessors of all sorts (HT, multi-core, etc) are increasingly being put forward as the only way to substantially improve system performance over the next while. Often the designs are ccNUMA, which has more complexities (and therefore issues) than plain SMP.
(Credit where credit's due department: I learned about the lock cache traffic issue from discussions on the Linux kernel mailing list.)
2005-11-18
SQL as metaprogramming
I'll start by quoting Glyph Lefkowitz from a recent article (okay, recent for my techblog writing):
Metaprogramming is hard, and dress it up however you like, that's what using SQL is. Your code is generating other code, and evaluating its results.
Because metaprogramming is so hard, it is almost exclusively the province of frameworks, environments and operating systems. For good reason, too. Whenever code generates other code, there are potentially very serious mistakes that can get made. [...]
Read the whole thing; it's worth it, and he elaborates on the idea.
SQL is an especially dangerous form of metaprogramming because it isn't just generating code on the fly; it's generating text on the fly and evaluating it as code. Pretty much everything that does that has all sorts of interesting issues, as Glyph points out indirectly.
Metaprogramming is hard in part because it's an extra level of indirection. Because SQL is such a simple 'programming language', it's easy to write it by hand; this tempts people into thinking that it should be equally easy to dynamically generate it in programs. Text-based metaprogramming adds a second level of indirection; you're not creating code, you're creating text that will create code. Overlook something in the translation process and you have an explosion or a security hole.
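A small sketch of both failure modes, using Python's standard `sqlite3` module. The `users` table and the two lookup functions are hypothetical, invented for this example. The first function builds SQL as text; the second passes the value through a placeholder so it is never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

def find_by_name_text(name):
    # Text-based generation: the value becomes part of the SQL source.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return sorted(row[0] for row in conn.execute(query))

def find_by_name_param(name):
    # Placeholder: the value is handed over as data, not as SQL text.
    query = "SELECT name FROM users WHERE name = ?"
    return sorted(row[0] for row in conn.execute(query, (name,)))

# The security hole: the quote characters rewrite the WHERE clause.
hostile = "nobody' OR '1'='1"
text_result = find_by_name_text(hostile)     # matches every row
param_result = find_by_name_param(hostile)   # matches nothing

# The explosion: an innocent apostrophe produces invalid SQL.
try:
    find_by_name_text("O'Brien")
    exploded = False
except sqlite3.Error:
    exploded = True

print(text_result, param_result, exploded)
```

The placeholder version avoids the translation step entirely for values, which is why it survives both inputs. It does not help when the structure of the query itself (table or column names, say) has to be generated.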
The general wisdom I've absorbed for dealing with SQL in programs is to lock all the SQL up in a low-level module that handles the jobs needed by the rest of your program. This is essentially a mini framework, and the more you make things OO the more it turns into a general object-relational mapper.
(Mind you, I'm not sure that I like the level of magic required by ORMs, generic SQL frameworks, and so on. Mini-frameworks have the advantage of being small and limited, with few moving parts to worry about.)
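To make the 'low-level module' idea concrete, here is a minimal sketch of such a mini-framework using Python's standard `sqlite3` module. The `UserStore` class, its methods, and the `users` schema are all hypothetical, chosen only for illustration. The point is that every SQL statement lives in this one class and the rest of the program calls plain methods.

```python
import sqlite3

class UserStore:
    """Hypothetical mini-framework: the only place in the program with SQL."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, email TEXT)"
        )

    def add_user(self, name, email):
        # 'with' on a connection commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
            )

    def email_for(self, name):
        row = self._conn.execute(
            "SELECT email FROM users WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

# Callers never see or build SQL; they only use the methods.
store = UserStore(sqlite3.connect(":memory:"))
store.add_user("O'Brien", "obrien@example.com")
print(store.email_for("O'Brien"))
```

With only a couple of fixed statements there are few moving parts to audit, which is the appeal of keeping this small instead of reaching for a general ORM.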