Why file extensions in URLs are a hack
When a web server answers your request for a URL, you get both
the contents of the URL and its type; i.e., whether you are getting
HTML, plain text, a PNG image, a PDF, and so on. Specifically, this
type information is returned in the form of a MIME type in the
Content-Type: HTTP reply header.
(Some web browsers then sniff the data themselves and second-guess the web server, generally with ruinous results.)
Now, the web server has to get this type information from somewhere. Ideally (at least for web servers) all files would have metadata attached to them, including their type, and the web server could just use this directly. However, the world is not like that, especially on Unix (the home of the first web servers); files had contents and that was it.
There are a number of plausible ways around this; for example, you could
have a file (or a bunch of files) that mapped URLs (or filenames) to
content types, or the web server itself could sniff the contents of
files to work out their type. But early web servers took the simple way
out: they just declared that if filenames had certain extensions, they
were certain content types. If your filename ended in .html it would
be served as HTML, if it ended in .gif it would be served as a GIF,
and so on.
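
As a rough illustration of that simple way out, here is a minimal sketch in Python. The table contents, the fallback type and the function name content_type_for are all assumptions made for this example; real web servers use much longer tables and their own configuration formats.

```python
import os

# Hypothetical extension table for illustration; real servers ship
# far longer lists in their own configuration formats.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".png": "image/png",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}

# Assumed fallback for filenames with no (or an unknown) extension.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename):
    """Guess a Content-Type purely from the filename's extension."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TYPES.get(ext.lower(), DEFAULT_TYPE)
```

Note that nothing here looks at the file's contents; the name is the only 'metadata' consulted.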
However, all of this is nothing but a hack (a useful hack, admittedly)
to make up for the lack of real type metadata about files. If Unix
filesystems had had content type metadata in 1992 or so, we would
probably find the idea of a .html at the end of many of our web URLs
to be laughable.
One corollary is that this is in no way required; a web server can send you any content type with any URL extension, or with none. Thus, web browsers and spiders that make content type decisions based on the URL extension are wrong and broken.
(However, people have expectations and will probably get confused
and irritated if your .html URLs are, say, PDFs.)
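
From the client side, the corollary might be sketched like this in Python. The function name response_type and the example URLs are made up for illustration; the point is only that the decision comes from the Content-Type: header and the URL is never consulted.

```python
from email.message import Message


def response_type(url, content_type_header):
    """Return the media type a client should act on for a response.

    The url argument is accepted only to make the point that it is
    ignored; the type comes entirely from the Content-Type: header.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() drops parameters such as charset and
    # lowercases the type/subtype pair.
    return msg.get_content_type()
```

With this, a (hypothetical) URL ending in .html that arrives with a header of application/pdf is treated as a PDF, and an extensionless URL is handled just as well as any other.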
Sidebar: about .php and .aspx and so on
More generally, web servers don't just need to know what content type to label data as; they need to know how to process a file when it is requested. Are they supposed to just send the file out as data, or do they need to do something more complicated with it?
Since web servers didn't have better metadata, they used file extensions
as a convenient way to control this too, and so they grew the knowledge
that .php files should not be sent out as data but instead handed to
the PHP module to be interpreted and so on.
(This is inconsistently handled in Apache, since there are also ways to
say that all files in certain areas are to be executed as programs, not
used as normal content. The advantage of the .php approach is that you
can freely mix special .php files and regular content files.)
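
The extension-based processing decision can be sketched the same way. This is only an illustration in Python, not how Apache or any real server is implemented; the handler functions and the HANDLERS table are hypothetical and just return a description of what a server would do.

```python
import os


def send_as_data(path):
    """Stand-in for serving the file's bytes unchanged."""
    return f"send {path} as-is"


def run_php(path):
    """Stand-in for handing the file to a PHP interpreter."""
    return f"hand {path} to the PHP interpreter"


# Hypothetical table: extensions that need special processing.
# Anything not listed here is treated as plain content.
HANDLERS = {".php": run_php}


def handle(path):
    """Pick how to process a requested file from its extension alone."""
    _, ext = os.path.splitext(path)
    handler = HANDLERS.get(ext.lower(), send_as_data)
    return handler(path)
```

Because the decision is made per file, a single directory can hold both index.php and logo.gif and each gets the appropriate treatment.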