2006-03-27
A helpful Apache safety tip
This is a two part safety tip:
- most things that roll Apache logfiles SIGHUP Apache to get it to
close and reopen the logfiles
- when Apache is SIGHUP'd, it closes its current set of sockets and
tries to listen on the set that its configuration file says it should use
So: if you have changed what Apache is configured to listen on, and something else is currently camped on one of those places (something that is scheduled to be killed off during an impending reboot), and your logfiles roll before that reboot, your entire webserver will evaporate in a cloud of:
[notice] SIGHUP received. Attempting to restart
(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
(Just for example.)
The 'something else' in my case was one of the lighttpd Mars movie mirrors; I had put it on a mostly disused IP address on this machine, which required changing Apache's configuration to not bind to port 80 on that IP address. Recently THEMIS took the mirroring down, so I was reverting all of the mirroring changes, and planning on having all of them take effect during the Sunday morning reboot.
Specifically, I was reverting from a series of 'Listen <IP>:80'
directives to the usual plain 'Listen 80'. Unfortunately, you can't
bind a port to the generic wildcard address if anyone is using the
port on a specific IP address, hence the Apache restart problem. (Why
this port binding limitation is sensible is beyond the scope of this
margin, but it is, however annoying it periodically is.)
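As a sketch, the configuration change involved looked something like this (the IP addresses here are made up):

```
# before: bind port 80 only on specific IPs, leaving one address
# free for the lighttpd mirror (addresses are hypothetical)
Listen 192.0.2.10:80
Listen 192.0.2.11:80

# after: the usual wildcard bind; this is what fails on restart if
# anything still holds port 80 on any specific address
Listen 80
```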
2006-03-21
The backend for our recent mirroring
Since I alluded to it in passing in an earlier entry, I might as well describe what I know about how THEMIS set up their systems to handle the load. (Disclaimer: this is second and third hand.)
To cope with the visitors to their regular web site, THEMIS
put their ordinary web servers behind eight Squid proxies, which were in turn behind a load balancer
box. This apparently held up very well to the quite a few million
extra visitors from Google Mars.
The main movie page was on their
regular site (and thus behind the Squid proxies), but the links to
the movies pointed to video.mars.asu.edu.
All video.mars.asu.edu did was serve up HTTP redirections to the
URLs of the various mirror locations, more or less rotating through
them to distribute the load. To be as fast and light as possible its web
server didn't bother to look at the HTTP request, so to mirror a second
file the THEMIS people had to run a second server, which they did by
making the .wmv format movie live on video.mars.asu.edu:81.
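A minimal sketch of this technique (this is an illustration, not ASU's actual redirector code, and the mirror URLs are made up): answer every request with a 302 to the next mirror URL in rotation, without ever looking at the request itself.

```python
# Sketch of a request-blind rotating redirector. Assumptions: not the
# real video.mars.asu.edu software; mirror URLs are hypothetical.
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

MIRRORS = itertools.cycle([
    "http://mirror1.example.org/clip.mov",
    "http://mirror2.example.org/clip.mov",
])

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        # ignore self.path entirely: one server, one file, one answer
        self.send_response(302)
        self.send_header("Location", next(MIRRORS))
        self.end_headers()

    do_HEAD = do_GET

    def log_message(self, fmt, *args):
        pass  # stay quiet under load
```

Since the request is never parsed, a second mirrored file needs a second instance on another port, e.g. `HTTPServer(('', 81), Redirector).serve_forever()` for the .wmv version.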
THEMIS ran an automated monitoring system to detect overloaded or dead
mirrors. It worked by running through the list of mirror URLs every so
often, making HEAD requests to each; if there wasn't a good response
fast enough, that URL got left out of the list used by the redirector
until it came back to life.
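A sketch of this kind of mirror health check (again an illustration, not THEMIS's actual code): HEAD every mirror URL with a short timeout, and keep only the ones that answer in time.

```python
# Sketch of a mirror health checker; the redirector's list would be
# rebuilt from this every so often. Not THEMIS's actual code.
import urllib.request

def live_mirrors(urls, timeout=5.0):
    """Return the subset of mirror URLs that answer a HEAD in time."""
    alive = []
    for url in urls:
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    alive.append(url)
        except OSError:
            pass  # dead or too slow: leave it out until it comes back
    return alive
```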
Using HTTP redirects meant that the mirroring could be very simple. It didn't need to worry about DNS round robin or having people set up virtual hosts or anything; all it needed was a list of current mirror URLs. (The disadvantage of HTTP redirects is that the mirroring is semi-exposed to your visitors. I don't think THEMIS cared under the circumstances; they were more concerned that demand for the movies would overwhelm ASU's Internet connection.)
Sidebar: why such a primitive HTTP redirector?
Why not parse the HTTP requests on video.mars, instead of having to run two servers and so on? The THEMIS people were concerned that video.mars would see a huge connection rate and wanted it to be as lightweight and reliable as possible. You could build a pretty lightweight solution with something like lighttpd's built-in FastCGI gateway going to a local FastCGI server, but it would have had more moving parts and thus been more risky to build on the spot.
2006-03-17
How not to set up your DNS (part 9)
Presented in the now-traditional illustrated form:
; sdig ns gcluk.net
ns1.gcluk.net.
ns2.gcluk.net.
; dig mx web1.gcluk.net @ns1.gcluk.net
[...]
;; ANSWER SECTION:
web1.gcluk.net.    IN A    65.98.36.90
;; AUTHORITY SECTION:
web1.gcluk.net.    IN NS   .
(TTLs have been omitted for clarity.)
That's an interesting authority record they're returning; somewhat more like a disclaimer of authority record, in fact.
The effects are closely related to what I saw in HowNotToDoDNSVII. In
this case, we first looked up the A record and found one, but when
we tried to look for an MX record the completely bogus NS record
stymied our attempts. (Among other problems, there is no A record
for the '.', the DNS root.)
2006-03-15
The aftermath of our mirroring
Since our mirrors of the Valles Marineris movie clips have been running for two days now, it's time to look at how things turned out. I'll start with the raw numbers:
- On Monday, we transferred 198.64 gigabytes in 7,781 requests, despite not getting much traffic until about 10am Eastern time.
- On Tuesday, we transferred 316.36 gigabytes in 13,979 requests.
(At a later date I may wrangle gnuplot into producing some nice graphs of this.)
There's no logging for how many simultaneous connections the web servers
saw, but I was looking periodically with lsof and never saw more than
30 or so connections to each mirror. The load was split roughly evenly
between the two mirrors, so each saw about half the requests and pushed
half the bandwidth.
At a rough guess, we averaged about 32 Mbps of outbound traffic on Monday from 10am onwards, and about 30 Mbps on Tuesday. I was watching subnet utilization graphs on Monday, and there were occasional peaks up towards 100 Mbps over the two subnets involved. (Both subnets have routine traffic fluctuations from other activities, which made it hard to be sure what the mirrors were adding.)
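The back-of-the-envelope conversion from daily volume to average bandwidth goes like this (assuming decimal units, i.e. 10^9-byte gigabytes and 10^6-bit megabits):

```python
# Rough-average bandwidth check; assumes decimal GB and Mb.
def avg_mbps(gbytes, hours):
    """Average bandwidth in Mbit/s for a volume moved over some hours."""
    return gbytes * 1e9 * 8 / (hours * 3600) / 1e6

monday = avg_mbps(198.64, 14)    # 10am to midnight, roughly
tuesday = avg_mbps(316.36, 24)   # the full day
```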
Although these numbers look impressive, they're not all that large compared to what we expected and feared; my mirror system turned out to be rather over-engineered, and I actually could just have used an existing Apache installation. (I suspect that a lot of people found the Google Video version good enough, or at least the clip not compelling enough to get them to download 53+ Mbytes.)
I don't regret the preparation; it was fun, I learned a number of interesting things, and better safe than sorry. Still, it was a little bit disappointing to prepare for a flood and then just get my toes lapped by a lethargic little wave. (The actual THEMIS site apparently got a much more impressive amount of traffic.)
Sidebar: an inbound traffic surge
Interestingly, outbound traffic for the movie clips isn't the full story; there's also inbound traffic.
| Day | Mirror 1 connections | Mirror 2 connections | Mirror 1 volume | Mirror 2 volume |
| Monday | 37,544,203 | 34,425,153 | 2045 Mbytes | 1871 Mbytes |
| Tuesday | 61,126,651 | 54,937,967 | 3328 Mbytes | 2975 Mbytes |
Mirror 1 usually has no inbound traffic, and mirror 2 usually runs about 60,000 inbound connections and 4 Mbytes or so of inbound traffic a day on weekdays.
Some of this extra traffic is simply the inbound HTTP requests for the movies. Some of it is from other requests to the web servers (people looking for a favicon, the ASU mirror monitoring system checking our status, and so on); there were 3,162 such additional web requests on Monday and 5,213 on Tuesday. My cynical guess is that much of the rest of it is from lots of people poking the machines because they were suddenly much more visible to the world.
Update: It turns out that what our traffic monitoring system was reporting as 'connections' was actually the packet count, which makes these numbers far more reasonable. A trawl through our IDS logs suggests that the machines were poked by the outside world no more often than usual.
On the good side, this did cause us to find and fix this problem in the traffic monitoring system's reports.
2006-03-13
Preparing a high load web mirror setup
I spent a chunk of this weekend preparing a mirror for a high load
environment. The mirror only needs to serve a couple of large
video clips, but they're
going to be linked from a high traffic website, so we expect a lot of (simultaneous)
connections and a lot of outgoing bandwidth.
I made a generic mirror URL, using a new hostname that I had to put into
a new sub-zone in our DNS. Right now it has two A
records, each with a five minute TTL. Each is on a different 100 Mbit/s
subnet; because our subnet uplinks to the university backbone are only
100 Mbit/s, this is the only way I can do over 100 Mbit/s outbound.
The machines run lighttpd, serving just the mirrored files, with enough memory to keep the files in cache. Lighttpd is small and easy to install (nice when you're in a rush), plus as a single process server without threads or forking it can't really kill a machine no matter how many simultaneous connections it gets. (I chose lighttpd over thttpd for reasons I may go into later.)
Looking at this after writing it up, it's surprising to me how little stuff is actually involved. Hopefully this setup will work fine in practice; I'll likely find out soon.
(The setup passed stress tests, but that's not the same as having real load show up.)
Sidebar: lighttpd configuration
Lighttpd has a helpful web page on performance improvements. I turned keep-alives off; as far as I can see, keep-alives are useless for serving unrelated static files and I wanted to maximize how many simultaneous connections I could handle.
In testing, I discovered I had to increase lighttpd's
server.max-fds parameter from 1024. A little thought led me to a doh
moment about this: of course I needed to bump it, because every incoming
connection needs two file descriptors: one for the network socket and
one for the file being sent out. So with 1024 file descriptors the web
server could only handle about 500 simultaneous connections.
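As a sketch, the relevant lighttpd (1.4-era) settings look something like this; the exact max-fds value is illustrative:

```
# turn keep-alives off entirely; every request is for one big
# unrelated file, so keep-alive only ties up connection slots
server.max-keep-alive-requests = 0

# the default of 1024 fds caps us around 500 connections, since each
# connection needs a socket fd plus an fd for the file being sent
server.max-fds = 4096
```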
2006-03-12
A DNS realization
One thing I did today was set up DNS for a hostname that we may need to re-point elsewhere very rapidly. This caused me to realize something important:
Setting low TTLs doesn't mean squat if you can't cause secondaries to reload on command.
Low TTLs mean that people will re-query A records frequently, but that doesn't help me change where the traffic is going if my secondaries haven't updated to my new set of A records. Unfortunately, none of the secondaries for our domains are under my control, and at least one of them doesn't act on DNS notifications.
The way around this problem is to make a subzone without secondary nameservers. Fortunately I could pick a more or less arbitrary hostname. (Even if you can't pick an arbitrary hostname I suppose you can usually make the fixed name a CNAME into a new subzone.)
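A sketch of the resulting DNS arrangement, with made-up names and addresses:

```
; in the parent zone: delegate a subzone that only our own
; nameserver serves, so there are no secondaries to lag behind
fast            IN NS   ns.ourdomain.example.

; in the fast.ourdomain.example zone: the hostname itself, with a
; low TTL so re-pointing it takes effect quickly
mirror          300 IN A    192.0.2.10
mirror          300 IN A    192.0.2.11
```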
I'm glad that I realized the impending problem while I was sitting around drumming my fingers as I waited for the secondaries to pick up the just-added hostname. Running into it during a frantic attempt to shuffle traffic destinations would have been un-fun.
2006-03-08
Thinking about project failures
Via a link from MiniMsft's latest entry, I wound up reading this item on hiring people. In it I hit an interview question that I realized I probably wouldn't pass:
6. I will ask you in an interview to describe a project you worked on or effort you made that failed - partially, completely, and miserably are all OK for response fodder. If you've worked for a substantial length of time in IT, you've been part of at least one of these turkeys. [...] If you've got more than, say, 4-5 years of professional experience and you can't (or won't) describe an event like this, I'll deduce that either you're not being straight-up with me or you haven't done what your resume says you did somewhere. [...]
Thinking back, I honestly can't really think of a project I would describe as having 'failed' in this way. I've had projects not work out, but when that happens it's been a failure of persuasion, not a failure of implementation.
Failures of persuasion are things like not convincing management that an idea is worth doing (and certainly not all of them are), or building something no one is really interested in (or failing to interest people in the project), or disappointing users and having them quietly drift away. Or even failing to inspire passionate users who really want the service, as opposed to people who just use it because meh, it's there.
Or to put it another way: around here, projects don't die spectacular showy deaths in public, they die quiet little ones off in some dusty corner.
(You can have spectacular deaths for projects that work as designed, when it's just that the design was terrible. I don't think I've been involved in any of these, although I may just have been oblivious to it.)
I don't think of projects that fell victim to these as failures so much as things that didn't work out; to me, words like 'failure' and 'turkeys' require something more spectacular and explosive. (This is probably blinkered 'implementor' thinking from some views, and perhaps I should be trying to move past it.)
I'm not sure why we haven't had failures of implementation. Perhaps we've been lucky. Perhaps we've been sufficiently careful with trial projects and the like (we've definitely had trials that didn't work out). Perhaps we're not being aggressive enough (although we do plenty with little).
(We have had problems with services, but we've managed to get over them, so I don't consider these failures. Unforeseen glitches happen; what counts is whether you can fix them.)
PS: I have certainly made my share of mistakes when implementing things. But so far I have been lucky enough that none of them turned into catastrophes; I get to count them as (narrow) escapes, not project failures.
2006-03-07
A modular approach to Apache configuration
The traditional way to configure Apache is to take the httpd.conf
file shipped with your distribution and start changing it to your local
needs; for example, so that the DocumentRoot and ServerName are right.
This is simple and attractive, until you have to upgrade.
The big problem with this approach is that httpd.conf has two sorts
of settings; it mixes together settings that you want to customize,
like DocumentRoot, with Apache internal things, like AddLanguage and
AddCharset. With them all jumbled together, upgrading becomes an
annoying exercise in merging two sets of changes together.
After burning my fingers once too many times this way, I've switched to
a different, more modular approach. The basic idea is that the only
changes I make to httpd.conf are to comment out things that I need to
change. Plus an 'Include hosts.d/*.conf' line at the end.
All of the actual changes and any new settings go into files in
hosts.d. These range from global settings like additional AddType
declarations to the configuration
of a particular website, setting its DocumentRoot et al. These files
tend to be very short, because Apache configuration is actually not all
that complicated.
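A sketch of what one of these hosts.d files might look like for a single website (the names and paths are made up):

```
# hosts.d/oursite.conf: everything specific to this one website
ServerName www.oursite.example
DocumentRoot /web/oursite/htdocs

<Directory /web/oursite/htdocs>
    Options FollowSymLinks
    AllowOverride None
</Directory>
```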
Upgrading is a snap: just take the new httpd.conf and comment out all
the site specific stuff again (and add the hosts.d inclusion). When
necessary, change the hosts.d files for new syntax or new options or
the like.
Modularizing this way also clearly shows your important local settings:
they're the ones you have in files in hosts.d. You can easily say
'stock, except for this'. You can modularize more finely; we have a
hosts.d file for our pubcookie setup, which makes it easy to see which
settings are pubcookie specific.
(And don't underestimate the use of having a single place to see all of the directories that a website uses.)
Sidebar: an alternative approach to upgrades
Now that I write this, it occurs to me that this is an ideal situation
for a branching version control system, since what you want is more or
less a three way merge between the old vendor httpd.conf, the new
vendor httpd.conf, and your changes to the old vendor version. And the
changes should almost always be non-conflicting, which makes merges
simple.
I believe there are even standalone three way merge programs, so you
could just save a copy of the stock vendor httpd.conf before you start
editing it, instead of setting up a VCS environment.
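For example, with the standalone diff3 program (here with tiny toy files standing in for the three httpd.conf versions):

```shell
# toy stand-ins for the three versions of httpd.conf
printf 'a\nb\nc\n' > httpd.conf.stock   # saved vendor original
printf 'A\nb\nc\n' > httpd.conf.mine    # our locally edited copy
printf 'a\nb\nC\n' > httpd.conf.new     # new vendor version

# diff3 -m emits a merged version; since our change and the vendor's
# change don't overlap, the merge is clean (no conflict markers)
diff3 -m httpd.conf.mine httpd.conf.stock httpd.conf.new
```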
2006-03-06
How not to set up your mail server (part 1)
From our SMTP server logs:
remote from [64.151.68.164]
HELO ADV-RH9-NOCP.YOUR-DOMAIN-HERE.com
554 Unresolvable HELO name: ADV-RH9-NOCP.YOUR-DOMAIN-HERE.com
There is such a thing as taking the instructions too literally. (Or perhaps the error here is in not following instructions; who knows.)
This appears to be a PLESK-based setup (judging from the web page on that IP address), which does not make me any better inclined towards these people. Sometimes I think it has become too easy to set up machines on the Internet.
For bonus points, the IP address claims to have the name 'customer-reverse-entry.64.151.68.164'.
2006-03-02
The :; shell prompt trick
For years, I've had a somewhat unusual shell prompt. It looks like this:
: <host> ;
(where <host> is the hostname of the current machine.)
Putting the hostname in your prompt is pretty ordinary, but what's
the other stuff? These days, a more typical shell prompt is something
like 'cks@newman:~$ ', to quote a Debian example. (And many
people use more elaborate prompts, such as Jamie Zawinski's.)
The trick here is that the : and ; turn my prompt into a valid shell
command that does nothing. This makes cutting and pasting previous
commands in things like xterm much easier, since I don't have to
carefully get just the command while avoiding the prompt. (In xterm
it's just a quick triple click, but then xterm is very good at this.)
(In practice I am sufficiently neurotically neat that I select just the
command, because seeing a doubled prompt looks wrong. This might be
different if my prompt was just ':; ', but I need the host name in it
to keep things straight.)
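In Bourne-style shells, setting this up is one line; the trick works because ':' is a built-in that ignores its arguments and does nothing, so a pasted prompt-plus-command still runs:

```shell
# set the prompt (a sketch for bash; adapt for your shell)
PS1=": $(hostname) ; "

# pasting a whole previous line, prompt and all, still works,
# because ':' swallows the hostname and ';' ends the no-op:
: myhost ; echo it still runs
```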
This trick is not original to me; I believe I got it from observing Geoff Collyer, many years ago.
Sidebar: xterm's double-click selections
One reason I don't use this more is that xterm's double-click
selection mode makes selecting most things pretty fast anyways.
For those who aren't aware of it, when you start a selection by
double-clicking instead of single-clicking, xterm grows the selection
by words instead of characters. (Try it; it's more intuitive than I
make it sound.)
Embarrassingly, I spent years using xterm before I found out about
this. Now I use it all the time, and hardly ever have to select by
characters.