The backend for our recent mirroring
Since I alluded to it in passing in an earlier entry, I might as well describe what I know about how THEMIS set up their systems to handle the load. (Disclaimer: this is second and third hand.)
To cope with the visitors to their regular web site, THEMIS put their ordinary web servers behind eight Squid proxies, which were in turn behind a load balancer box. This apparently held up very well to the several million extra visitors from Google Mars.
The main movie page was on their regular site (and thus behind the Squid proxies), but the links to the movies pointed to video.mars.asu.edu.
All video.mars.asu.edu did was serve up HTTP redirections to the URLs of the various mirror locations, more or less rotating through them to distribute the load. To be as fast and light as possible its web server didn't bother to look at the HTTP request, so to mirror a second file the THEMIS people had to run a second server, which they did by making the .wmv format movie live on video.mars.asu.edu:81.
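To make the idea concrete, here is a minimal sketch of such a request-ignoring redirector. This is my own illustration under stated assumptions, not THEMIS's actual code; the mirror URLs are hypothetical, and a real deployment would at least want to fork or thread per connection.

```python
# Sketch: a redirector that never reads the HTTP request; every
# connection just gets a 302 to the next mirror URL in rotation.
import itertools
import socket

# Hypothetical mirror URLs (placeholders, not the real mirror list).
MIRRORS = [
    "http://mirror1.example.com/mars-movie.mov",
    "http://mirror2.example.com/mars-movie.mov",
    "http://mirror3.example.com/mars-movie.mov",
]

def redirect_response(target):
    """Build the raw HTTP 302 response pointing at target."""
    return (b"HTTP/1.0 302 Found\r\n"
            b"Location: " + target.encode("ascii") + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n")

def serve(port=8080):
    rotation = itertools.cycle(MIRRORS)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        s.listen(128)
        while True:
            conn, _addr = s.accept()
            with conn:
                # Note: the request is never read. Whatever the client
                # asked for, it gets the next mirror in the rotation,
                # which is why a second file needs a second server.
                conn.sendall(redirect_response(next(rotation)))
```

Because the response is the same fixed template regardless of input, there is almost nothing that can go wrong per request, which is presumably the appeal of doing it this way.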
THEMIS ran an automated monitoring system to detect overloaded or dead mirrors. It worked by running through the list of mirror URLs every so often, making HEAD requests to each; if there wasn't a good response fast enough, that URL got left out of the list used by the redirector until it came back to life.
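A health check like the one described could be sketched as follows. Again this is my own guess at the shape of it, not THEMIS's monitoring code; the timeout value and function names are assumptions.

```python
# Sketch: probe each mirror with a HEAD request; mirrors that don't
# answer promptly with a 2xx are dropped from the redirector's list
# until a later pass finds them healthy again.
import http.client
import urllib.parse

def check_mirror(url, timeout=5.0):
    """Return True if a HEAD request to url gets a 2xx within timeout."""
    parts = urllib.parse.urlsplit(url)
    try:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        conn.request("HEAD", parts.path or "/")
        status = conn.getresponse().status
        conn.close()
        return 200 <= status < 300
    except OSError:
        # Connection refused, timed out, DNS failure, etc.
        return False

def live_mirrors(urls, check=check_mirror):
    """Filter the mirror list down to the ones passing the check."""
    return [u for u in urls if check(u)]
```

A cron job or simple loop would call live_mirrors() every so often and hand the surviving list to the redirector.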
Using HTTP redirects meant that the mirroring could be very simple. It didn't need to worry about DNS round robin or having people set up virtual hosts or anything; all it needed was a list of current mirror URLs. (The disadvantage of HTTP redirects is that the mirroring is semi-exposed to your visitors. I don't think THEMIS cared under the circumstances; they were more concerned that demand for the movies would overwhelm ASU's Internet connection.)
Sidebar: why such a primitive HTTP redirector?
Why not parse the HTTP requests on video.mars, instead of having to run two servers and so on? The THEMIS people were concerned that video.mars would see a huge connection rate and wanted it to be as light-weight and reliable as possible. You could build a pretty lightweight solution with something like lighttpd's built in FastCGI gateway going to a local FastCGI server, but it would have had more moving parts and thus been more risky to build on the spot.