More on baking websites to static files and speed

April 14, 2011

A commenter on my first entry on this topic asked a good question:

Where do you draw the distinction between a baked site and a cached site? They're both a snapshot of a dynamic site. They both suffer from potentially stale cache. They both require an invalidation mechanism for the publishers.

I think there are two important differences.

First, a baked site is effectively a permanent cache. The word 'permanent' is the important part: you absolutely have to get invalidation right, because nothing else will save you if the wrong data gets into the baked site.

(Any permanent cache has much the same problem: its invalidation must be completely correct.)

A temporary cache can do invalidation on heuristics, because if the heuristics don't work out, bad data will time out 'soon enough' anyway. The ultimate version of this is to have no invalidation heuristics at all, just timeouts, and accept temporarily stale pages or data. This makes the problem of cache invalidation (or validation) much simpler, especially in the extreme case; even a timeout-only cache is good enough to let you survive Slashdot-style load surges, so for many people that's all they need.
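To make the 'just timeouts' version concrete, here is a minimal sketch in Python. The class name TimeoutCache, the ttl parameter, and the injectable clock are all assumptions made for illustration; this is not code from any particular caching system.

```python
import time


class TimeoutCache:
    """Cache with no invalidation logic at all: entries just expire.

    Hypothetical illustration; 'ttl' is the lifetime of an entry in seconds.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so that expiry can be tested
        self._entries = {}      # key -> (expiry time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # The entry has timed out: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)
```

In this sketch a wrong or stale entry can live for at most ttl seconds, which is the whole invalidation story; nothing has to know what changed.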

Second, a cache still works if there is a cache miss; a baked site generally does not. This means that you have a big hammer to deal with cache problems: you simply flush the entire cache. Your site is suddenly slow until the cache rebuilds, but it still works and, more importantly, it is instantly guaranteed correct and current. There is no equivalent with typical implementations of baked sites (although there are implementation tricks that give you this); the software may let you force a full rebuild, but it won't give you a correct site on the spot, since 'populating' the 'cache' is an asynchronous process.
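A rough sketch of the difference, with invented names throughout (render_page, CachedSite, and BakedSite are stand-ins, not any real software's API): the cached site falls back to rendering on a miss, so flushing is always safe, while the baked site can only serve what was written out earlier.

```python
def render_page(path, content):
    """Stand-in for the dynamic site's (slow) page generation."""
    return "<html>" + content[path] + "</html>"


class CachedSite:
    def __init__(self, content):
        self.content = content
        self.cache = {}

    def serve(self, path):
        page = self.cache.get(path)
        if page is None:
            # Cache miss: render from the live content, then remember it.
            page = render_page(path, self.content)
            self.cache[path] = page
        return page

    def flush(self):
        # The big hammer: the next request for any page is slow but current.
        self.cache.clear()


class BakedSite:
    def __init__(self):
        self.files = {}

    def bake(self, path, content):
        self.files[path] = render_page(path, content)

    def serve(self, path):
        # No fallback: a page that was never baked is simply missing.
        return self.files.get(path)     # None stands in for a 404
```

After the content changes, calling flush() on the cached site makes the very next serve() correct. The baked site keeps serving the old file until something re-bakes it.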

This also means that your site still works completely if something didn't make it into the cache or if the cache is malfunctioning for some reason. Pre-baked sites have no similar mechanism; if something doesn't get baked for some reason or gets removed somehow, well, it's a 404 until you (or software) notice and fix it. The advanced version of this is that it's quite easy and natural to deliberately have a partially cached dynamic site, instead of caching everything. There's no such natural equivalent for baked sites (although once again it can be done with implementation tricks).
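As an illustration of how natural partial caching is on a dynamic site, here is a small sketch. The cached decorator and both page handlers are hypothetical; the point is only that caching is opted into one handler at a time, and everything else stays fully dynamic.

```python
import functools


def cached(func):
    """Memoize a page handler by path (hypothetical helper, no expiry)."""
    store = {}

    @functools.wraps(func)
    def wrapper(path):
        if path not in store:
            store[path] = func(path)
        return store[path]

    return wrapper


calls = {"archive": 0, "recent": 0}   # counts how often each handler really runs


@cached
def archive_page(path):
    # Expensive and rarely changing, so it is worth caching.
    calls["archive"] += 1
    return "archive for " + path


def recent_comments(path):
    # Cheap and must be current, so it is left uncached.
    calls["recent"] += 1
    return "recent comments for " + path
```

Requesting each page twice runs archive_page once and recent_comments twice.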

Comments on this page:

From at 2011-04-14 11:25:27:

The ultimate version of this is to have no invalidation heuristics at all, just timeouts...

This is common practice on baked sites. Processes responsible for refreshing stale content do so on regular intervals, not only on a button press.

Second, a cache still works if there is a cache miss; a baked site generally does not.

Both suffer from the same problem. A baked site failing to write out a particular resource is analogous to a dynamic application failing to supply a resource. A cache isn't going to fix this.

Fact is, they're both forms of full-page caching. One is just a bit more robust than the other.

IMO the biggest advantage to using a caching proxy over a baked site is all of the cache control mechanisms built into HTTP/1.1. Publishers are happy because clients and intermediate proxies can easily force invalidation. App developers are happy because all they need to do is set appropriate cache control headers in their replies.

My favorite approach is a balance of the two. As you stated in your first log, applications can (and should) cache expensive db queries or write them out in temporary tables (Amazon makes heavy use of bdb files for this... call them baked tables if you will). Page fragments which change infrequently can be cached as well. Couple all of this with a good caching proxy strategy and your site should roll with the punches just fine.

Written on 14 April 2011.


Last modified: Thu Apr 14 00:12:59 2011