Wandering Thoughts archives


Lurking complexities in a web server that just serves static files

A web server that serves a directory tree of static files seems like a simple thing (if you ignore TLS and the inherent complexity in HTTP), definitely simpler than a dynamic system and often something you could whip up in short order given an HTTP handling library. In practice there are a number of additional complexities in handling static directory trees that will rapidly complicate your simple little server.

The essentially mandatory complexity is setting a good Content-Type so that browsers are willing to interpret your HTML and CSS as HTML and CSS. In a dynamic site, this is something the application handles; in a static site, you have a blob of bytes and now you get to figure it out. The traditional answer is to use file name extensions along with a small or big mapping table from extensions to Content-Type, but you're responsible for that code and that table (either hard-coded or with a way to configure it at runtime and an initial default version).

(Some filesystems allow you to associate additional arbitrary data with files, which you can use to make the user set the Content-Type for each file. But this is neither universal nor popular with people who just want a 'simple server' to serve their directory tree.)
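As a rough illustration of the extension-table approach, here is a minimal Python sketch. The table contents, the names `DEFAULT_TYPES`, `FALLBACK_TYPE` and `content_type_for`, and the choice to declare UTF-8 for text types are all assumptions made for the example, not anything a particular server does.

```python
import os

# Hypothetical default table. A real server would want a bigger one and
# probably some way to override it at runtime.
# Assumption: all text files in the tree are UTF-8.
DEFAULT_TYPES = {
    ".html": "text/html; charset=utf-8",
    ".css": "text/css; charset=utf-8",
    ".js": "text/javascript; charset=utf-8",
    ".txt": "text/plain; charset=utf-8",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".svg": "image/svg+xml",
}

# Used when the extension is missing or unknown.
FALLBACK_TYPE = "application/octet-stream"


def content_type_for(path, table=DEFAULT_TYPES):
    """Guess a Content-Type from the file name extension alone."""
    _, ext = os.path.splitext(path)
    return table.get(ext.lower(), FALLBACK_TYPE)
```

Python's standard library also has a `mimetypes` module with its own table, which may be a better starting point than a hand-written one.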

A common quality of implementation issue is letting people set up some sort of content that will be served on requests for a directory itself, like '//site/directory/'. You could refuse to serve directories entirely, but people won't be happy because the result will be harder to use and have more annoying URLs. In theory everyone could use '//site/directory/index.html', but in practice they want the web server to handle it. Either you redirect '//site/directory/' to the index.html version, or you automatically serve index.html when asked for the directory. Often people will want you to check multiple names instead of hard-coding index.html, and perhaps to be able to configure the list.

(People can also want automatic indexes of directories, but this is more of a luxury feature. If you do implement it, some people will want to turn it off and you can wind up with it being quite complicated.)

Once you're doing anything with directories, another quality of implementation issue that people have strong feelings about is that if there's a request for '//your/directory', with no trailing slash, it should be redirected to '//your/directory/', with the trailing slash. Combined with some sort of index handling for directories, your static server is going to wind up with a chunk of code that in a dynamic site would be dumped on the application.
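To give a feel for how much code the directory handling adds, here is a hedged Python sketch of index file lookup plus the trailing slash redirect. The function name `resolve`, the `INDEX_NAMES` list, and the tuple return convention are invented for the illustration. It assumes the URL path is already percent-decoded and has no query string, and it doesn't deal with symlinks that point outside the tree.

```python
import os

INDEX_NAMES = ("index.html", "index.htm")  # hypothetical default list


def resolve(root, url_path, index_names=INDEX_NAMES):
    """Map a URL path to ("file", fs_path), ("redirect", url) or ("notfound", None).

    Assumes url_path is already percent-decoded and starts with "/".
    """
    # Drop empty, "." and ".." segments so the joined path stays under root
    # (symlinks inside the tree are not checked here).
    parts = [p for p in url_path.split("/") if p not in ("", ".", "..")]
    fs_path = os.path.join(root, *parts)

    if os.path.isdir(fs_path):
        if not url_path.endswith("/"):
            # Directory requested without the trailing slash: redirect to
            # the cleaned-up path with the slash added.
            return ("redirect", "/" + "".join(p + "/" for p in parts))
        # Try each configured index name in order.
        for name in index_names:
            candidate = os.path.join(fs_path, name)
            if os.path.isfile(candidate):
                return ("file", candidate)
        # No index file; this sketch doesn't generate automatic indexes.
        return ("notfound", None)

    if os.path.isfile(fs_path) and not url_path.endswith("/"):
        return ("file", fs_path)
    return ("notfound", None)
```

The caller would turn "redirect" into a 301 or 302 response with a Location header and "notfound" into a 404; which redirect status to use is yet another decision.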

Often people will want you to support conditional GET for their static files. This requires generating at least a Last-Modified header and ideally an ETag as well (and then comparing them against the If-Modified-Since and If-None-Match fields in the incoming HTTP request). As with the Content-Type, the normal static directory tree 'API' doesn't provide you with specific additional metadata for this, although if you want you can often use some of the metadata from the filesystem, such as the file's modification time and size. Or you can choose to be more thorough for ETag, for example by computing a cryptographic hash for the file.

(Strong validation is required for certain HTTP conditional requests, but you may not want to support those cases so you can always use weak validation and thus ETag values that are inexact but inexpensive to compute. Or you can use potentially inexact ETag values that you feel are good enough and shrug about potential obscure problems in, for example, resumed downloads.)
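Here is a minimal Python sketch of the inexpensive version of conditional GET: a weak ETag built from filesystem metadata, plus the comparison against the request headers. The function names, the ETag format, and the idea of passing headers as a dict with lower-cased names are assumptions for the example. Splitting If-None-Match on commas is a simplification that is only safe because the ETags generated here never contain commas.

```python
import os
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def validators(st):
    """Build (etag, last_modified) from an os.stat() result."""
    # Weak ETag from size and mtime: cheap, but inexact if a file changes
    # without either value changing.
    etag = 'W/"%x-%x"' % (st.st_size, st.st_mtime_ns)
    last_modified = formatdate(int(st.st_mtime), usegmt=True)
    return etag, last_modified


def _opaque(tag):
    """Strip a leading W/ so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(headers, etag, mtime):
    """Return True if a 304 response looks appropriate.

    headers is assumed to be a dict with lower-cased header names.
    """
    inm = headers.get("if-none-match")
    if inm is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        if inm.strip() == "*":
            return True
        return _opaque(etag) in [_opaque(t) for t in inm.split(",")]

    ims = headers.get("if-modified-since")
    if ims is not None:
        try:
            when = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return False  # unparseable date: ignore the header
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so truncate the mtime.
        return int(mtime) <= when.timestamp()
    return False
```

A server that wanted strong validation would need a different ETag source (such as a hash of the file contents) and a strong comparison, which this sketch deliberately leaves out.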

web/StaticServingComplexity written at 22:02:51

