People like file extensions whether or not they're necessary

October 28, 2022

In some circles, it's popular to denigrate file extensions as a Windows-ism that's only necessary because of (historical) limitations of that platform. However, we have a fair amount of evidence that people like file extensions even on platforms where they aren't necessary, and adopt them by choice in various circumstances even without technical need.

The obvious primary source for this evidence is people's habits on Unix. Unix doesn't need file extensions and they're by no means universally used in file names, yet they are popular in a variety of situations. The most obvious example is that Unix-based programming languages very frequently have a convention of using specific file extensions on their source files. For modern programming languages you can say that this is to some degree a matter of going along with the convention and of wanting to be cross-platform to Windows, but it's harder to make this view stick for older languages that predate the assumption that everything would wind up on Windows sooner or later.

(It's not just programming languages on Unix that use file extensions. For example, it's entirely a convention to put tar archives in files with a .tar extension, but it's pretty universal.)

The situation with C is especially striking, because C uses two file extensions, .c and .h, despite there being no technical requirement for this within the language. You can #include a file with a .c extension just as readily as you can a .h file (and sometimes people do), but purely by convention we use the two extensions to signal the generally intended use of the file (and then the C compiler frontends go along with this by looking at you funny if you tell them to compile a .h file). People evidently like being able to see a clear split of intended use.

Compilers for C and other languages show one use of file extensions, namely that they let you create a bunch of different file names from a base name plus a varying extension. Starting with 'fred.c' you can traditionally ask the C compiler to generate an assembler version, which winds up in 'fred.s', or turn it into an object file, 'fred.o'. Sometimes you'll go straight to a final executable, conventionally called just 'fred' (with no extension). And you can have 'fred.h', the visible API for the code in 'fred.c', and most everyone will understand what you mean just from seeing the two files together in a directory listing.
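
(As a rough illustration of the 'base name plus varying extension' idea, here is a small Python sketch. The derived_names() helper and its dictionary keys are made up for this example; real compilers make these decisions internally and in their own ways.)

```python
from pathlib import PurePosixPath


def derived_names(source: str) -> dict[str, str]:
    """Map a C source file name to the conventional names of related files."""
    src = PurePosixPath(source)
    return {
        "assembler": str(src.with_suffix(".s")),   # assembler output
        "object": str(src.with_suffix(".o")),      # object file
        "header": str(src.with_suffix(".h")),      # conventional header name
        "executable": str(src.with_suffix("")),    # no extension at all
    }


names = derived_names("fred.c")
for role, name in names.items():
    print(f"{role}: {name}")
```

Everything here is derived from the one base name, which is the convenience the extension convention gives you.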

One potential reason for the original popularity of file extensions on Unix is that Unix shells and other Unix tools such as 'find' make it convenient to work with file extensions if you want to do something to 'all C files' or 'all C headers' or the like. Another is that putting this information about the intended file type into the file name makes it immediately visible in directory listings and other places you see file names (which may include other files, for example Makefiles).
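
(The same convenience carries over to anything that can match on file names. As a minimal sketch, here is roughly what "find . -name '*.c'" does, written in Python; the files_with_extension() helper and the sample file names are hypothetical and exist only for this example.)

```python
import tempfile
from pathlib import Path


def files_with_extension(root: Path, ext: str) -> list[str]:
    """Return sorted root-relative paths of all files under root ending in ext."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(f"*{ext}")
        if p.is_file()
    )


# Build a small throwaway source tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    for name in ("fred.c", "fred.h", "Makefile", "lib/util.c", "lib/util.h"):
        (root / name).touch()
    c_sources = files_with_extension(root, ".c")
    c_headers = files_with_extension(root, ".h")

print("C files:  ", c_sources)
print("C headers:", c_headers)
```

Selecting 'all C files' needs nothing beyond the file names themselves; no file has to be opened or inspected.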

You can argue that all of these things should be done through file 'type' metadata (which would still let you distinguish C headers from C source code). However, the drawback of this is that people would have to learn additional sets of syntax and features for searching and operating on this file type metadata, and some things would be more awkward than they are today (where you can ask for file name patterns like 'vdev*.h').
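
(To make the 'vdev*.h' point concrete, here is a small Python sketch using the standard fnmatch module. The list of file names is invented for illustration. A single pattern selects on both the name prefix and the file type at once, because both live in the name.)

```python
from fnmatch import fnmatchcase

# Hypothetical directory contents, made up for this example.
names = ["vdev.c", "vdev.h", "vdev_disk.c", "vdev_disk.h", "zio.h", "README"]

# One pattern says both "name starts with vdev" and "is a C header".
vdev_headers = [n for n in names if fnmatchcase(n, "vdev*.h")]

print(vdev_headers)
```

With separate file type metadata, the same query would presumably need a name pattern plus some additional type test, in whatever syntax the system provided for that.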

Sidebar: The original Macintosh and its lack of file extensions

The original Macintosh operating system (ie, classic Mac OS) explicitly stored both the 'file type' and creator code of every file separately from its name (see the Wikipedia entry on resource forks). As a result, I believe that people mostly didn't use file extensions in classic Mac OS file names. I suspect that C programmers on Mac OS may still have called things 'file.c', but that was probably partly through custom and habit.

(I once did write some C code on classic Mac OS, but it was so long ago that I've long since forgotten how it worked.)

Comments on this page:

By vasi at 2022-10-29 03:26:35:

I remember back on MacOS 7, it was common for CodeWarrior projects to have the file extension .μ, the Greek letter mu. Normally extensions on MacOS existed for portability with Windows, which clearly wouldn't work with MacRoman encoded letters, so not sure what the point was.

By nobody in particular at 2023-01-05 15:07:43:

Windows's use of extensions can be considered a sort of "inherited" trait in OS evolution: encoding type information in the file name space goes back at least as far as MIT's Compatible Time Sharing System circa 1963 [1], where file names had a mandatory two-36-bit-word structure, the second of which was used for the type (both used a six-bit character encoding, so the two strings could each be up to 6 characters long). Mandatory filename structure was pretty common on "minicomputer" operating systems developed in the 60s and 70s: MIT's ITS, Stanford's WAITS, DEC's TOPS-10, TOPS-20, and VMS all did so, too. (I don't have exact phylogenies among all those systems, but I believe there was a lot of cross-fertilization.) Wikipedia says TOPS-10 was an influence on CP/M [2], and that DOS was intended to be similar to CP/M [3]; and, of course, Windows succeeded MS-DOS. So extensions can conceivably trace back to CTSS.

In fact, Unix's lack of mandatory structure within filenames (i.e., entries in directories) is also, arguably, an inherited trait; this time from Multics [4]. It seems that Unix also inherited from Multics the convention that executables' filenames omitted any suffix, though Multics appears to (eventually) have had some other naming conventions atop unstructured filenames different from those that developed on Unix [5].

Anyhow, ISTM that even where extensions have been unnecessary (like Unix) or unconventional (like classic MacOS), encoding information in a suffix is an obvious, perhaps inevitable, approach for organizing files in any workflow involving distinct file formats, e.g., source vs. binary, markdown vs. markup vs. page description, vector vs. rasterized graphics, "raw" or "original" vs. losslessly compressed vs. lossily compressed image or audio, and so forth. Even if, hypothetically, there'd never been OSes that had mandatory file name extensions, I bet people would come up with suffixing conventions, at minimum so that sorted file listings would group related files together.

(Cf. I observe that most people also will invent "versioning" conventions within filenames when they need to, e.g., "thesis_v2.doc", "Thesis Jan 05.doc", etc. Lawyers in the US sometimes encode their initials in filenames in order to indicate who last modified the file, e.g., a John Smith might name his draft "Contract Jan 05, 2023 JS.doc". These kinds of conventions are obviously not robust, but they're mostly only for humans to interpret, not usually for programs to process. All the same, I think it's interesting that some problems seem to "want" a dimension or two of organizational capability beyond just an unstructured name.)

Of course, one alternative to encoding something like file type information is to try to determine it dynamically, possibly heuristically. This can lead to fun results, e.g.,




[4] See page 3-1, PDF page 40, (This manual is from 1975, and so is strictly anachronistic when considering what Unix inherited from Multics; however, I believe its description of file names, called "entrynames" in Multics, applied to Multics prior to Bell Labs' withdrawal from the project.)

[5] Ibid, pages 3-4 through 3-10, PDF pages 43--49; and also page 4-2, PDF page 65.

Written on 28 October 2022.

Last modified: Fri Oct 28 22:49:46 2022