Web URL paths don't quite map cleanly onto the abstract 'filesystem API'

June 4, 2022

Generally, the path portion of web URLs maps more or less on to the idea of a hierarchical filesystem, partly because the early web was designed with that in mind. However, in thinking about this I've realized that there is one place where paths are actually a superset of the broad filesystem API; in fact this place actually causes some amount of heartburn and different design decisions in web servers when they serve static files.

The area of divergence is that in the general filesystem API, directories don't have contents, just children. Only files have contents. In web paths, of course, directories very frequently have contents as well as children (if anything, a web path directory that refuses to have contents is rarer than one that does). This is quite convenient for people using the web, but requires web servers to invent a convention for how path directories get their contents (for example, the 'index.html' convention).

(There's no fundamental reason why filesystem directories couldn't have contents as well as children; they just don't. And there are other environments with hierarchical namespaces where people not infrequently would like 'directories' with contents; one example is IMAP.)

One possible reason for this decision in web paths (other than user convenience) is the problem that the root of a web site would otherwise present. The root of a web site almost always has children (otherwise it's a very sparse site), so it must be a directory. If web directories had no contents in the way of filesystem directories, either the web root would have to be special somehow or people would have a bad experience visiting 'http://example.org/'.

(This bad experience would probably drive browsers to assume a convention for the real starting page of web sites, such as automatically trying '/index.html'.)

PS: Another reason for the 'decision' is that any specification would have to go out of its way to say that directories in web paths couldn't have contents and should return some error code if you requested them. Not saying anything special about requesting directories is easier.

Written on 04 June 2022.
« Regular expressions are effectively a (hard) programming language
Checking a few metrics (time series) at once in Prometheus's query language »

Page tools: View Source, Add Comment.
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Sat Jun 4 21:13:17 2022
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.