The web is, in a sense, designed for serving static files

May 14, 2022

One thought I've been turning over in my mind lately is the idea that one of the reasons that it's historically been so easy to serve static files on the web is that in a sense, the web was designed for it. This was not so much through calculation as through necessity, because most or all of the early web servers served static files, especially the very first web server. This likely created a strong pressure to make HTTP friendly to static files, since anything else would require a more complicated web server for no practical gain.

(Or they almost entirely served static files. There is a provision in the very first version of HTTP for searching an index, with a presumably dynamic result generated by a program.)

The obvious area where this shows is that URL paths map directly on to Unix file paths. When I write it this way using web terms it sounds natural, but in fact the 'URL path' is really a per-server identifier for the particular page. There are a lot of ways to index and address content other than hierarchical paths, but the web picked paths instead of some other identifier, despite them not always being a good way to identify content in practice (just ask the people who've had to decide on what the proper URL structure is for a REST application).

(There are even other plausible textual representations, although paths are in some sense the most concise one. I prefer not to think about a version of the web that used something like X.5xx Distinguished Names to identify pages.)
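To make that mapping concrete, here is a minimal sketch in Python of what a static-file server does with a URL path (the DOCROOT here is made up, and real servers have more to worry about): the path is simply glued onto a directory in the filesystem.

    import os.path

    DOCROOT = "/var/www/html"    # hypothetical document root

    def url_path_to_file(url_path):
        # Drop the leading '/' so the join keeps DOCROOT, and refuse
        # paths that try to climb out of it with '..' components.
        rel = os.path.normpath(url_path.lstrip("/"))
        if rel.startswith(".."):
            raise ValueError("path escapes the document root")
        return os.path.join(DOCROOT, rel)

    # url_path_to_file("/docs/index.html") -> "/var/www/html/docs/index.html"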

The very first HTTP protocol is startlingly limited, although Basic HTTP followed very soon afterward with niceties like being able to return content types other than HTML. But it does have the idea of dynamic content in the form of searching on a named index, which feels explicitly designed for the index to be a program that performs the search and returns (HTML) results.
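As a concrete illustration (the specific names here are just examples): a request was a single line and the reply was raw HTML with no headers or status code,

    GET /hypertext/WWW/TheProject.html

while a search against a named index was the same request with the keywords appended after a '?' and joined with '+':

    GET /phonebook?smith+john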

Early HTTP and HTML are so minimal that I'm not sure I could point to anything else that's biased in favor of static files. Arguably the biases are in what's left out; for example, there's no indicator to client programs that what they've fetched should be refreshed every so often. Early HTTP left that up to the users (and then Netscape added it around 1995, so people did eventually want this).

It's entirely possible that my thought here is well off the mark. Certainly if you're on a Unix workstation and designing a protocol that serves multiple pieces of content that are distinguished somehow, it's natural to use paths to specify content. As support for this naturalness, consider Gopher, which also used "a file-like hierarchical arrangement that would be familiar to users" despite having significantly different navigation.


Comments on this page:

I think a bigger hint of all this is that it took 4 years for form elements to be added to HTML. The <img> tag was invented even a year sooner than that, despite presentation being essentially ignored in the original HTML. The original web was purely document- and content-centric.

All early HTTP servers were basically entirely oriented around serving static files, configurable but not internally extensible by arbitrary application code. And what for, if everything is a static HTML page with almost no layout?

CGI was the only way to make it all a bit dynamic, and that design follows the same mindset of having users put their files somewhere in the filesystem for the HTTP server to serve; it's just that for some of those files the server would run them and return their output instead of returning the literal content of the file.
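To sketch that model concretely (this is a generic illustration, not any particular server's conventions): a CGI program is just an executable file under the served tree; the server runs it with request details in environment variables and relays its standard output, headers first, to the client.

    #!/usr/bin/env python3
    # Minimal sketch of a CGI program. The web server executes this file
    # and sends whatever it prints back to the client: lines before the
    # blank line are response headers, the rest is the page body.
    import html
    import os

    query = html.escape(os.environ.get("QUERY_STRING", ""))
    print("Content-Type: text/html")
    print()
    print("<html><body>")
    print("<p>You asked for: " + query + "</p>")
    print("</body></html>")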

It took quite a long time for this model to invert.

(One of the upshots of the static-first model was that people creating web apps tended to see the path part of the URL as something their hosting situation forced upon them, and not care about it very much, instead addressing all of their non-static functionality through (potentially lots of) query parameters appended to just one or a few CGIs (or, later, similar put-your-code-among-the-static-files models like PHP and ASP). Amazon was an oft-cited example of doing this. A whole clean URL movement was necessary to overcome this.)
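(To make that contrast concrete, with made-up URLs: the everything-through-one-CGI style looked something like http://www.example.com/cgi-bin/shop.cgi?action=view&item=1234, while the clean-URL style is more like http://www.example.com/items/1234, with the path itself doing the addressing.)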

It makes perfect sense, when we consider the WWW is bastardized hypertext flavoured by UNIX, and as intellectually-stimulating as a eunuch is fertile. The WWW is the UNIX of the Internet.

> The obvious area where this shows is that URL paths map directly on to Unix file paths.

Yes, why remove a poor abstraction when it can instead be made to be an anti-abstraction?

> When I write it this way using web terms it sounds natural

I'm sure it does.

> As support for this naturalness, consider Gopher, which also used "a file-like hierarchical arrangement that would be familiar to users" despite having significantly different navigation.

Gopher uses opaque selector strings as references. The only base type able to hold references is a menu. This means, as an example, that only the menus of a Gopher hole need to be traversed to build a picture of it, to see the leaves without hitting them. The WWW only requires a few orders of magnitude more resources for exactly the same basic thing.
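To sketch what that looks like on the wire (per RFC 1436; the function name here is made up for illustration): the client sends a selector and gets back tab-separated menu lines, each carrying a type character, display text, selector, host, and port.

    import socket

    # Sketch of fetching a single Gopher menu (RFC 1436). The client
    # sends a selector string; the server replies with tab-separated
    # menu lines, terminated by a line containing only ".".
    def fetch_menu(host, selector="", port=70):
        with socket.create_connection((host, port)) as s:
            s.sendall(selector.encode("ascii") + b"\r\n")
            data = b""
            while chunk := s.recv(4096):
                data += chunk
        items = []
        for line in data.decode("latin-1").splitlines():
            if line == ".":
                break
            itype, rest = line[:1], line[1:]
            fields = rest.split("\t")
            if len(fields) >= 4:
                display, sel, ihost, iport = fields[:4]
                items.append((itype, display, sel, ihost, iport))
        return items

    # Only the type '1' entries (sub-menus) need to be followed to map
    # out a whole Gopher hole; the leaf documents never get fetched.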
