Why don't SQL servers do a lot of caching?
[...] why, under the hood, deep down, is something like memcached necessary? Why isn't the SQL server's cache as effective?
I think that one of the reasons for this is that highly aggressive caching in an SQL server is much harder than it looks.
In database servers, caches have to be transparent; non-transparent caches would implicitly break ACID guarantees. Now contemplate the problem of completely correct cache invalidation that still preserves good cache performance. Clearly it's not good enough to discard a cached query result every time any change is made to a table that the query used, because even modest levels of write traffic will dynamite your cache (unless your database has a highly atypical design with many small tables).
Let's consider a hypothetical query to generate a count of visible comments on a particular blog entry:
SELECT count(*) FROM comments WHERE entry = 27793 AND visible = 'Y';
When a transaction changes the comments table, at a minimum the SQL server must test whether any of the affected rows match the WHERE clause and discard the cached query result if it does. This check is necessarily about as expensive (on a per-row basis) as the initial query was, and the SQL server has to do it for every cached query, which the server is going to have a lot of if you're under enough load that you want an aggressive cache. The SQL server can optimize this process to some degree; for example, it can work out common subexpressions in WHERE clauses and merge things so that it only checks them once, not once for every cached query involving them. But I think it's clear that it's not an easy or simple job (and it may burn a lot of CPU).
All of this is much easier when it is the application handling the cache invalidation with a non-transparent cache like memcached; it can just directly shoot down cache entries, because it knows exactly what possible cache entries can be invalidated by a particular operation. The application can even organize what it caches to minimize the amount of invalidation it has to do, using high-level knowledge it has about its own behavior and usage patterns.
(One way to view this is that the application is effectively segmenting one database table into many pseudo-tables in the non-transparent cache, thereby buying itself the ability to do much simpler cache invalidation.)
Now, you can make the application give hints to the SQL server about all of this. But the more hints the application has to give the SQL server, the more the application might as well be using its own non-transparent cache to start with.