Why file extensions in URLs are a hack

When a web server answers your request for a URL, you get both
the contents of the URL and its type; ie, whether you are getting
HTML, plain text, a PNG image, a PDF, and so on. Specifically, this
type information is returned in the form of a MIME type in the
Content-Type HTTP header.

(Some web browsers then sniff the data themselves and second-guess the web server, generally with ruinous results.)

Now, the web server has to get this type information from somewhere. Ideally (at least for web servers) all files would have metadata attached to them, including their type, and the web server could just use this directly. However, the world is not like that, especially on Unix (the home of the first web servers); files had contents and that was it.

There are a number of plausible ways around this; for example, you could
have a file (or a bunch of files) that mapped URLs (or filenames) to
content types, or the web server itself could sniff the contents of
files to work out their type. But early web servers took the simple way
out: they just declared that if filenames had certain extensions, they
were certain content types. If your filename ended in …

However, this all is nothing but a hack (a useful hack, admittedly)
to make up for the lack of real type metadata about files. If Unix
filesystems had had content type metadata in 1992 or so, we would
probably find the idea of a …

One corollary is that this is in no way required; a web server can send you any content type with any URL extension, or with none. Thus, web browsers and spiders that make content type decisions based on the URL extension are wrong and broken.

(However, people have expectations and will probably get confused
and irritated if your …)
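To make the two sides of this concrete, here is a minimal Python sketch. It assumes a small, hypothetical extension table; real servers ship much larger ones, often in a mime.types-style file. The server-side function picks a type purely from the filename's extension, which is the hack described above. The client-side function takes the type from the Content-Type header and never looks at the URL's extension at all. The function names are illustrative, not any server's actual API.

```python
import posixpath
from collections.abc import Mapping

# Hypothetical extension-to-type table; real servers use far larger ones.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".png": "image/png",
    ".pdf": "application/pdf",
}
DEFAULT_TYPE = "application/octet-stream"


def type_for_filename(filename: str) -> str:
    """Server side: choose a Content-Type purely from the filename extension."""
    _, ext = posixpath.splitext(filename)
    # Unknown or missing extensions fall back to a generic binary type.
    return EXTENSION_TYPES.get(ext.lower(), DEFAULT_TYPE)


def content_type_of(headers: Mapping[str, str]) -> str:
    """Client side: trust the Content-Type header, not the URL extension."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-type":
            # Strip parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    # HTTP allows a recipient to assume application/octet-stream
    # when no Content-Type is present.
    return DEFAULT_TYPE


if __name__ == "__main__":
    print(type_for_filename("report.PDF"))
    print(type_for_filename("README"))
    # A URL ending in .pdf can legitimately be served as HTML:
    print(content_type_of({"Content-Type": "text/html; charset=utf-8"}))
```

Note that nothing connects the two functions: a server is free to answer a request for `/report.pdf` with `text/html`, and a client following this sketch would correctly treat the response as HTML.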

This is part of CSpace, and is written by ChrisSiebenmann.