A bit on compilation's changing number of stages (and assembly)

December 13, 2021

Recently I was skimming Everything You Never Wanted To Know About Linker Script. One of the things it mentions is the traditional three-stage model for transforming your source code into an executable program: the compiler produces assembly code that it hands to the assembler, the assembler produces object code, and then the linker turns the object code into an executable. This is the historical practice and provides a nice simple mental model, but it's not necessarily how things work today. Various modern compilation environments skip right over the assembly language stage and have the compiler generate object code directly.

I don't know where the traditional three-stage model originates, but one place I definitely encountered it was Unix, where this was the normal way that classical Unix C compilers worked. The C compiler wrote out assembler files, then the cc frontend ran the as assembler on them, then ran the ld linker to produce your final program. Although I don't know the reasons that compilers traditionally worked this way, the approach has a variety of appealing properties. You need an assembler anyway, and producing binary object code is generally more complicated and annoying than producing textual assembly, so the compiler might as well delegate it to a tool that's already there.
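As a rough illustration of that classical flow, here is a small Python sketch that only builds the three command lines a cc-style frontend might run; it doesn't execute anything. The function name and the hello.c file are made up for the example, and it assumes a driver that accepts the usual -S and -o options (as gcc and clang do) plus a separate as program. The link step goes through cc rather than calling ld directly, because the driver knows where the C runtime startup files and libraries live.

```python
import shlex


def three_stage_commands(source, cc="cc", assembler="as"):
    """Return the commands for compile, assemble and link, in order.

    Hypothetical helper for illustration; it only builds argument lists.
    """
    stem = source.rsplit(".", 1)[0]
    asm, obj = stem + ".s", stem + ".o"
    return [
        [cc, "-S", source, "-o", asm],   # compiler: C source -> textual assembly
        [assembler, asm, "-o", obj],     # assembler: assembly text -> object code
        [cc, obj, "-o", stem],           # linker, run through the cc driver
    ]


if __name__ == "__main__":
    for cmd in three_stage_commands("hello.c"):
        print(shlex.join(cmd))
```

Running the printed commands by hand on a system with a C toolchain should leave you with hello.s, hello.o and hello sitting next to each other, which makes the three stages quite visible.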

(Early C compilers had a fourth stage, the preprocessor, that ran before compilation. For various reasons, C compilers increasingly moved this into the main C compiler instead of having it as a separate program.)

The Plan 9 compilers were probably where I first heard of doing away with the assembler, although they definitely weren't the first to do it (I don't think Turbo Pascal had anything like a separate assembler stage, for example). As the story I heard goes, the Bell Labs people noticed that their C compiler was spending a surprising amount of time turning an internal binary representation of assembly code into a textual assembler source file, only to have the assembler spend a bunch of time turning the textual assembler source file back into another, closely related internal binary representation. So the Bell Labs people decided to cut out this work, having their compilers normally write out a binary format that the linker could directly consume.
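To make the round trip in that story concrete, here is a toy Python sketch. Everything in it is invented for illustration (the three-instruction 'machine', the byte encoding, and the function names); it has nothing to do with the real Plan 9 formats. The point is only that going through text means formatting the instructions and then parsing them again, just to arrive at the same bytes that the compiler could have written directly.

```python
import struct

# Hypothetical toy instruction set: one opcode byte plus a 16-bit operand.
OPCODES = {"push": 1, "add": 2, "ret": 3}


def encode(instrs):
    """Pack (mnemonic, operand) pairs straight into bytes."""
    return b"".join(struct.pack("<BH", OPCODES[op], arg) for op, arg in instrs)


def to_text(instrs):
    """Traditional compiler back end: write the instructions out as text."""
    return "".join(f"\t{op} {arg}\n" for op, arg in instrs)


def assemble(text):
    """Separate assembler: parse the text back into instructions, then encode."""
    instrs = []
    for line in text.splitlines():
        op, arg = line.split()
        instrs.append((op, int(arg)))
    return encode(instrs)


program = [("push", 2), ("push", 40), ("add", 0), ("ret", 0)]
via_text = assemble(to_text(program))   # compiler -> text -> assembler
direct = encode(program)                # compiler writes the bytes itself
assert via_text == direct
```

In this toy the extra work is trivial, but a real assembler also has to handle labels, directives, expressions and so on, all of which the compiler already knew the answers to before it wrote them out as text.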

The speed increase from avoiding this round trip of encoding to text and decoding from text again is the big reason that modern compilers may skip having a separate assembler stage. In some environments with captive intermediate object formats, the compiler may also be in a position to give the linker more freedom in final instruction selection, by providing it with more information about what the compiler wants.

(Traditionally the linker had to assume that everything the assembler handed it was sacred, and the assembler itself was deliberately quite literal.)

PS: Even when there were separate stages, they didn't necessarily write their output to disk. Gcc has had the -pipe option for a long time, and back when disks were very slow it used to be quite popular (if you had enough memory).

PPS: 'gcc -v' gives you really quite verbose output, in which it's hard to see what commands the compiler frontend is actually running. Clang is somewhat less verbose and easier to follow. Modern gcc (on Linux) still seems to run an assembler, but clang doesn't seem to.

Sidebar: The question of what is a stage of compilation

What constitutes a distinct and separate 'stage' of turning source code into an executable is a somewhat unclear thing, with no useful formal definition. I was going to say that using a separate executable makes a difference, but then when I looked I discovered that the V7 C compiler actually was three executables (and see also the makefile in c/), not even counting the separate C preprocessor. I don't think many people would consider compiling C programs on V7 to have that many stages.

Written on 13 December 2021.