2015-05-31
Unix has been bad before
These days it's popular to complain about the terrible state of software on modern Linux machines, with their tangle of opaque DBus services, weird Gnome (or KDE) software, and the requirement for all sorts of undocumented daemons to do anything. I've written a fair number of entries like this myself. But make no mistake, Linux is not uniquely bad here and is not some terrible descent from a previous state of Unix desktop grace.
As I've alluded to before, the reality is that all of the old time Unix workstation vendors did all sorts of similarly terrible things themselves, back in the days when they were ongoing forces. No Unix desktop has ever been a neat and beautiful thing under the hood; all of them have been ugly and generally opaque conglomerations of wacky ideas. Sometimes these ideas spilled over into broader 'server' software and caused the expected heartburn in sysadmins there.
To the extent that the Unixes of the past were less terrible than the present, my view is that this is largely because old time Unix vendors were constrained by more limited hardware and software environments. Given modern RAM, CPUs, and graphics hardware and current software capabilities, they probably would have done things that are at least as bad as Linux systems are doing today. Instead, having only limited RAM and CPU power necessarily limited their ability to do really bad things (at least usually).
(One of the reasons that modern Linux stuff is better than it could otherwise be is that at least some of the people creating it have learned from the past and are thereby avoiding at least some of the mistakes people have already made.)
Also, while most of the terrible things have been confined to desktop Unix, not all of them were. Server Unix has seen its own share of past bad mistakes from various Unix vendors. Fortunately they tended to be smaller mistakes, if only because a lot of vendor effort was poured into desktops (well, most of the time; let's not talk about how the initial SunOS 4 releases ran on servers).
The large scale lesson I take from all of this is that Unix (as a whole) can and will recover from things that turn out to be mistakes. Sometimes it's a rocky road that's no fun while we're on it, but we get there eventually.
My view of setting up sane web server application delegations
One of the things that drives the appeal of CGI scripts, compared to, say, a bunch of programs that each implement a web application, is the easy deployment story; a large part of what makes the latter harder is the deployment configuration problem. When you have a bunch of applications on the same machine (and using the same IP), you need a central web server to take incoming requests and dispatch them out to the appropriate individual web app (based on the incoming host and URL), and this central web server needs to be configured somehow. So how do you do this in a way that leads to easy, sane deployment, especially in a multi-user environment where not everyone can sit there editing central configuration files as root?
My views on this have come around to the idea that you want some equivalent of Apache per-directory .htaccess files in a directory structure that directly reflects the hosts and URLs being delegated. There are a couple of reasons for this.
First, a directory based structure creates natural visibility and enforces single ownership of URL and host delegations. If you own www.fred.com/some/url, then you are in charge of that URL and everything underneath it. No one can screw you up by editing a big configuration file (or a set of them) and missing that /some/url has already been delegated to you somewhere in hundreds of lines of configuration setup; your ownership is sitting there visible in the filesystem, and taking it over means taking over your directory, which Unix permissions will forbid without root's involvement.
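As a concrete sketch of what this might look like on disk (the paths and names here are invented purely for illustration), each delegated URL's directory is owned by whoever is responsible for it:

    /webconf/www.fred.com/                    owned by the www.fred.com admin
    /webconf/www.fred.com/some/url/           owned by you
    /webconf/www.fred.com/some/url/config     your .htaccess equivalent
    /webconf/www.fred.com/other/app/          owned by someone else

The master web server would presumably map an incoming host plus URL to the deepest matching directory and use whatever configuration file it finds there.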
Second, using some equivalent of .htaccess files creates delegation of configuration and control. Within the scope of the configuration allowed in the .htaccess equivalent, I don't need to involve a sysadmin in what I do to hook up my application, control access to it, have the native master web server handle some file serving for me, or whatever. Of course the minimal approach is to support none of this in the master server (where the only thing the .htaccess equivalent can do is tell the master server how to talk to my web app process), but I think it's useful to do more than that. If nothing else, directly serving static files is a commonly desired feature.
(Apache .htaccess is massively powerful here, which makes it quite useful and basically the gold standard of this. Many master web servers will probably be more minimal.)
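As a rough illustration of the more capable end of this spectrum, here is the kind of thing a real Apache .htaccess can already do, assuming mod_rewrite and mod_proxy are loaded and AllowOverride is permissive enough; the backend address and the static/ path are made up for this sketch:

    # Let Apache serve anything under static/ straight from disk;
    # proxy every other request to my web app listening on localhost.
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !/static/
    RewriteRule ^(.*)$ http://127.0.0.1:8000/$1 [P,L]

Requests under static/ fall through to Apache's normal file serving, which covers the 'directly serving static files' case.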
To the extent that I can get away with it, I will probably configure all of my future Apache setups this way (at least for personal sites). Unfortunately there are some things you can't configure this way in Apache, often for good reason (for example, mod_wsgi).
(This entry is inspired by a Twitter conversation with @eevee.)
Sidebar: doing this efficiently
Some people will quail at the idea of the master web server doing a whole series of directory and file lookups in the process of handling each request. I have two reactions to this. First, this whole idea is probably not appropriate for high load web servers, because there you really want and need more central control over the whole process. If your web server machine is already heavily loaded, the last thing you want to do is enable someone to automatically set up a new high-load service on it without involving the (Dev)Ops team.
Second, it's possible to optimize the whole thing via a process of registering and (re)loading configuration setups into the running web server. This creates the possibility of the on-disk configuration not reflecting the running configuration, but that's a tradeoff you pretty much need to make unless you're going to be very restrictive. In this approach you edit your directory structure and then poke the web server with some magic command so that it takes note of your change and redoes its internal routing tables.
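For what it's worth, Apache already has the 'poke the server' half of this in the form of a graceful restart, which re-reads the central configuration and rebuilds internal state without dropping in-flight requests:

    apachectl graceful

A master web server built around this directory scheme would presumably want an equivalent 'reload the routing tables' command, or could watch the configuration tree itself at the cost of extra complexity.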