2007-02-05
Another small user interface suggestion
Dear Xine: since various modern sound systems are amply equipped to play multiple audio sources at once, your 'mute' button should mute only your audio, not the entire audio chain. That way I can turn down the Internet music stream I'm listening to in the background in favour of something of more immediate interest, like YouTube videos, without having to quit out of you entirely.
While we're here, it would be nice if you didn't block purely audio streams just because you can't get some X events in edgewise while I'm busy placing a window and my window manager has an X server grab. Especially if you happen to be iconified at the time, with your geegaws turned off.
(I know, Xine is not exactly the best application in the world for playing streaming Internet music. However, it seems to be the only Fedora Core 6 application that can consistently play AAC+ streams, which means that I am stuck with it for the moment.)
('Another'? Yes; see SmallUISuggestion, the first in the series.)
2007-02-02
Why don't SQL servers do a lot of caching?
Recently, nothings asked an interesting question in a comment here:
[...] why, under the hood, deep down, is something like memcached necessary? Why isn't the SQL server's cache as effective?
I think that one of the reasons for this is that highly aggressive caching in an SQL server is much harder than it looks.
In database servers, caches have to be transparent; non-transparent caches would implicitly break ACID guarantees. Now contemplate the problem of completely correct cache invalidation that still preserves good cache performance. Clearly it's not good enough to discard a cached query result every time any change is made to a table that the query used, because even modest levels of write traffic will dynamite your cache (unless your database has a highly atypical design with many small tables).
Let's consider a hypothetical query to generate a count of visible comments on a particular blog entry:
SELECT count(*) FROM comments WHERE entry = 27793 AND visible = 'Y';
When a transaction changes the comments table, at a minimum the SQL server must test whether any of the affected rows match the WHERE clause and discard the cached query result if any do. This check is necessarily about as expensive (on a per-row basis) as the initial query was, and the SQL server has to do it for every cached query; there will be a lot of those if you're under enough load that you want an aggressive cache. The SQL server can optimize this process to some degree; for example, it can work out common subexpressions in WHERE clauses and merge things so that it only checks them once, not once for every cached query involving them. But I think it's clear that it's not an easy or simple job (and it may burn a lot of CPU).
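To make the cost concrete, here is a minimal sketch in Python of the per-row check described above. It is an illustration only, not how any real SQL server implements its cache: the names QueryCache, CachedQuery and on_write are hypothetical, and a plain Python function stands in for a compiled WHERE clause.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class CachedQuery:
    sql: str
    predicate: Callable[[Row], bool]  # hypothetical stand-in for the WHERE clause
    result: int


class QueryCache:
    """Toy transparent query cache with row-level invalidation."""

    def __init__(self) -> None:
        self.entries: Dict[str, CachedQuery] = {}
        self.predicate_checks = 0  # counts how much work invalidation costs

    def store(self, sql: str, predicate: Callable[[Row], bool], result: int) -> None:
        self.entries[sql] = CachedQuery(sql, predicate, result)

    def on_write(self, affected_rows: List[Row]) -> None:
        # For an UPDATE, affected_rows should hold both the old and the new
        # version of each row, since either one may match a cached query.
        for sql, entry in list(self.entries.items()):
            for row in affected_rows:
                self.predicate_checks += 1
                if entry.predicate(row):
                    del self.entries[sql]  # result may be stale; discard it
                    break
```

Every write costs roughly (number of cached queries) times (number of affected rows) predicate evaluations, before any of the merging optimizations mentioned above. That product is what grows uncomfortably once the cache holds many queries.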
All of this is much easier when it is the application handling the cache invalidation with a non-transparent cache like memcached; it can just directly shoot down cache entries, because it knows exactly what possible cache entries can be invalidated by a particular operation. The application can even organize what it caches to minimize the amount of invalidation it has to do, using high-level knowledge it has about its own behavior and usage patterns.
(One way to view this is that the application is effectively segmenting one database table into many pseudo-tables in the non-transparent cache, thereby buying itself the ability to do much simpler cache invalidation.)
Now, you can make the application give hints to the SQL server about all of this. But the more hints the application has to give the SQL server, the more the application might as well be using its own non-transparent cache to start with.
2007-02-01
Transparent versus non-transparent caching
One of the divisions in caching is between what I will call transparent and non-transparent caches. Transparent caches are ones where the only thing you are supposed to notice is faster speed; non-transparent caches require you to manage them explicitly, especially cache invalidation. Operating system disk caches are an example of transparent caches, at least in theory.
(At some level every cache is non-transparent, because it has to be managed by someone's code. So this is just a question of how a cache looks to your layer of code.)
Transparent caches are harder to implement than non-transparent ones; because they have to be effectively invisible, they must get cache invalidation completely correct, which is surprisingly hard. Non-transparent caches leave the work of invalidation to your code, where you're in a position to know what results can be a little stale (and so have simpler cache invalidation strategies) and what results need complete accuracy.
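One way to picture the choice a non-transparent cache gives you is the following Python sketch. The class name StaleOkCache and its methods are invented for illustration. Results that can be a little stale are stored with a maximum age and simply expire; results that must be accurate are stored without one and are explicitly invalidated by the calling code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaleOkCache:
    """Toy non-transparent cache: the caller decides how each entry is invalidated."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[object, Optional[float]]] = {}

    def set(self, key: str, value: object, max_age: Optional[float] = None) -> None:
        # max_age=None means "never expires on its own"; the caller must
        # call invalidate() when the underlying data changes.
        expires = None if max_age is None else self._clock() + max_age
        self._data[key] = (value, expires)

    def get(self, key: str) -> Optional[object]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # too old; treat it as a miss
            return None
        return value

    def invalidate(self, key: str) -> None:
        self._data.pop(key, None)
```

The time-based entries need no invalidation logic at all, which is exactly the simplification you get from knowing that a result is allowed to be slightly out of date.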
Adding caching to an existing software layer to speed it up almost always requires that the caching be transparent. Even if the results of a non-transparent cache can technically be justified under a careful reading of the layer's specification, no one is going to like you very much; after all, their code is breaking because of something you did.
(This is analogous to compiler optimization, where no one cares how much ANSI C lets your compiler get away with if it breaks their program, whether or not their code was technically illegal or counting on implementation-defined behavior. This is more or less the china shop rule: if you broke it, it's your responsibility, no matter how fragile it was to start with.)