Why you should support 'reload' as well as 'restart'
October 24, 2012
Suppose that you're writing a network server with persistent connections (these days that can even describe web servers, depending on what sort of web app you're running). Your program should really be able to reload its configuration and related things on the fly, without having to be stopped and restarted. This should include not just the configuration but anything that you normally load once at startup. In particular, it should include anything that can time out or expire such as, oh, SSL certificates.
The problem with only supporting configuration and other changes through a full restart is that a full restart usually breaks all of those persistent connections. Breaking persistent connections often has undesirable consequences, the kind of consequences that require system administrators to arrange scheduled downtimes. Reloads don't, not if you do them right.
(Yes, yes, of course your protocol specification says that clients should handle a server disconnect by retrying and resuming things in a way that's transparent to higher layers and the user. If you think that all actual clients do this right I have a bridge for sale that you might find attractive. If you also wrote the only client but haven't carefully and extensively tested it in the face of random disconnects, well, I still wouldn't suggest buying any bridges that you get offered.)
There are two ways to do reloads, more or less: you can wait for an explicit signal or you can simply notice changed things and automatically pick them up. Automatically picking things up is sexy, but speaking as a sysadmin I prefer explicit signals because that avoids issues with half-complete changes. No matter how fast I'm making the changes, with automatic reloads there is always a timing window where files are half-written or where only some of the necessary files have been updated.
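As a rough sketch of the explicit-signal approach, here is one way it could look in Python. Everything here is an assumption made for illustration: the Server class, the JSON configuration format, and the choice of SIGHUP as the reload signal (a common Unix convention, but not something this entry prescribes). The signal handler only sets a flag; the main loop does the actual reload at a point of its own choosing, and a configuration that fails to load leaves the old one in place.

```python
import json
import signal


class Server:
    """Hypothetical server that reloads its config only when asked to."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.config = self._load()
        self.reload_requested = False

    def _load(self):
        # Assumes a JSON config file; any format would do.
        with open(self.config_path) as f:
            return json.load(f)

    def request_reload(self, signum=None, frame=None):
        # Signal handler: do as little as possible, just note the request.
        self.reload_requested = True

    def maybe_reload(self):
        """Call from the main loop. Returns True if a new config is active."""
        if not self.reload_requested:
            return False
        self.reload_requested = False
        try:
            new_config = self._load()
        except (OSError, ValueError):
            # Unreadable or malformed config: keep running on the old one.
            return False
        self.config = new_config
        return True


def install_reload_handler(server):
    # SIGHUP exists on Unix only; call this from the main thread.
    signal.signal(signal.SIGHUP, server.request_reload)
```

With this in place, the sysadmin finishes all of their edits first and only then sends the signal (for example with kill -HUP), so the server never looks at a half-complete change. Existing connections are not touched at all.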
(As an example, consider updating SSL certificates. These come as two separate objects, a certificate and the private key that goes with it; for correct operation, you need either both new ones present or neither. If your program reloads its configuration partway through you get a mismatched key and certificate.)
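A server can also defend itself against this at reload time. The following is a minimal sketch using Python's standard ssl module; the TLSState class and build_context function are hypothetical names invented for the example. It builds a complete new SSLContext from the certificate and key and only swaps it in if loading succeeds. SSLContext.load_cert_chain raises ssl.SSLError when the private key doesn't match the certificate, so a mismatched pair leaves the old context in use.

```python
import ssl


def build_context(cert_path, key_path):
    # Raises ssl.SSLError if the key and certificate don't match,
    # and OSError if either file can't be read.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx


class TLSState:
    """Hypothetical holder for the context used for new connections."""

    def __init__(self, context=None):
        self.context = context

    def reload(self, cert_path, key_path):
        """Try to switch to a new cert/key pair. Returns True on success."""
        try:
            new_context = build_context(cert_path, key_path)
        except (ssl.SSLError, OSError):
            # Missing, unreadable, or mismatched files: keep the old context.
            return False
        # Swap only after the whole pair loaded cleanly.
        self.context = new_context
        return True
```

The intent is that new connections are wrapped with whatever state.context currently is, while connections that are already established carry on with the context they started with.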
Supporting reloading in servers with non-persistent connections is also appreciated, if you're willing to do the work. No matter how fast a stop and restart sequence is, there's always a time window where the server is not actually running and sometimes this matters.
Supporting reloading is unquestionably more work; the great appeal of 'restart the server to make configuration changes' is that it needs no additional code (you already needed code to shut down cleanly and load the configuration on startup). But it's an important part of creating a system that's manageable and resilient. Real systems have configurations that change over time and they should stay up and available through it.
(This entry has been brought to you by the process of updating SSL certificates across our various systems.)
Sidebar: configuration changes in the face of on the fly reloads
It's possible to make configuration changes reliable in the face of servers that do on the fly reloads. What you have to do is treat everything except the very top level configuration file as immutable once created and activated. If you need to migrate to a new SSL certificate, for example, you don't replace the current certificate file with a new certificate; instead you put the new certificate in a new file and prepare a new version of the top level configuration that refers to that new file instead of the old one. The new configuration is activated by moving the new top level configuration into place (which can be done reliably as a single operation).
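The activation step might be sketched like this in Python. The activate_config function, the JSON format, and the file names in the usage note are all assumptions for illustration. The new top level configuration is written to a temporary file in the same directory and then moved into place with os.replace, which on POSIX is an atomic rename when both paths are on the same filesystem. A server that reloads at any moment sees either the complete old configuration or the complete new one.

```python
import json
import os
import tempfile


def activate_config(top_level_path, new_settings):
    """Atomically replace the top level config with new_settings."""
    directory = os.path.dirname(os.path.abspath(top_level_path))
    # The temp file must live in the same directory (same filesystem)
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(new_settings, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk first
        # The single operation that activates the new configuration.
        os.replace(tmp_path, top_level_path)
    except BaseException:
        # On any failure, clean up and leave the old config untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

To migrate certificates you would first write the new certificate to a new file (say cert-2013.pem, never overwriting cert-2012.pem), and then call activate_config with settings that name the new file. The old certificate file stays around unchanged until nothing refers to it any more.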
(This ought to look familiar. It's the same general approach used by other no-overwrite things such as filesystems and also as one way to both enable and deal with various layers of caching on the web.)
If you have more than one top level configuration file and they aren't totally independent and unrelated, you're out of luck and up the creek. As a corollary, if you're writing a program and are tempted to make it do on the fly reloads, please make sure it has only one top level configuration file.