Wandering Thoughts archives

2023-02-01

C was not created as an abstract machine (of course)

Today on the Fediverse I saw a post by @nytpu:

Reminder that the C spec specifies an abstract virtual machine; it's just that it's not an interpreted VM *in typical implementations* (i.e. not all, I know there was a JIT-ing C compiler at some point), and C was lucky enough to have contemporary CPUs and executable/library formats and operating systems(…) designed with its VM in mind

(There have also been actual C interpreters, some of which had strict adherence to the abstract semantics, cf (available online in the Usenix summer 1988 proceedings).)

This is simultaneously true and false. It's absolutely true that the semantics of formal standard C are defined in terms of an abstract (virtual) machine, instead of any physical machine. The determined refusal of the specification to tie this abstract machine to concrete CPUs is the source of a significant amount of frustration in people who would like, for example, for there to be some semantics attached to what happens when you dereference an invalid pointer. They note that actual CPUs running C code all have defined semantics, so why can't C? But, well, as is frequently said, C Is Not a Low-level Language, and the semantics of C don't correspond exactly to CPU semantics. So I agree with nytpu's overall sentiments, as I understand them.

However, it's absolutely false that C was merely 'lucky' that contemporary CPUs, OSes, and so on were designed with its abstract model in mind. The truth is that the concrete C implementations came first and the standard came afterward (and I expect nytpu knows this and was making a point in their post). Although the ANSI C standardization effort did invent some things, for the most part C was what I've called a documentation standard, where people wrote down what was already happening. C was shaped by the CPUs it started on (and then somewhat shaped again by the ones it was eagerly ported to), Unix was shaped by C, and by the time the C standardization effort was producing drafts in the mid to late 1980s, C was shaping CPUs through the movement for performance-focused RISC CPUs (which wanted to optimize performance in significant part for Unix programs written in C, although they also cared about Fortran and so on).

(It's also not the case that C only succeeded in environments that were designed for it. In fact C succeeded in at least one OS environment that was relatively hostile to it and that wanted to be used with an entirely different language.)

Although I'm not absolutely sure, I suspect that the C standard defining it in abstract terms was in part either enabled or forced by the wide variety of environments that C already ran in by the late 1980s. Defining abstract semantics avoided the awkward issue of blessing any particular set of concrete ones, which at the time would have advantaged some people while disadvantaging others. This need for compromise between highly disparate (C) environments is what brought us charming things like trigraphs and a decision not to require two's-complement integer semantics (C23 finally requires a two's-complement representation for signed integers, although signed overflow remains undefined, and trigraphs are gone in C23 as well).
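(To make the representation issue concrete, here is a small sketch of my own of the classic trick for telling apart the three signed integer representations that C standards before C23 allowed. The enum and function names are made up for illustration; nothing here comes from the standard's text.)

```c
/* Hypothetical names; a sketch for illustration only. */
enum signed_repr {
    REPR_SIGN_MAGNITUDE  = 1,
    REPR_ONES_COMPLEMENT = 2,
    REPR_TWOS_COMPLEMENT = 3
};

/* The low two bits of -1 differ in each representation that C
   before C23 permitted:
     two's complement: ...1111 -> low bits 11 -> 3
     ones' complement: ...1110 -> low bits 10 -> 2
     sign-magnitude:   ...0001 -> low bits 01 -> 1 */
enum signed_repr detect_signed_repr(void)
{
    return (enum signed_repr)(-1 & 3);
}
```

On essentially any machine you're likely to compile this on today, `detect_signed_repr()` returns `REPR_TWOS_COMPLEMENT`. The point is that the pre-C23 standard could not simply say so without disadvantaging the environments where it wasn't true.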

Ever since ANSI C was defined and C compilers became increasingly aggressive about optimizing around 'undefined behavior' (even if this created security holes), you could say that modern software and probably CPUs have been shaped by the abstract C machine. Obviously, software increasingly has to avoid doing things that will blow your foot off in the model of the C abstract machine, because your C compiler will probably arrange to blow your foot off in practice on your concrete CPU. Meanwhile, things that aren't allowed by the abstract machine are probably not generated very much by actual C compilers, and things that aren't generated by C compilers don't get as much love from CPU architects as things that do.
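(As a hedged illustration of the foot-blowing in question, here is a minimal sketch of an overflow check written two ways. The function names are hypothetical. The first version relies on what a concrete CPU would do on wraparound; because signed overflow is undefined in the abstract machine, a compiler is allowed to assume `x + 1 < x` is never true and delete the check. The second version asks the same question without ever evaluating an overflowing expression.)

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Concrete-CPU thinking: "if it wrapped around, x + 1 is smaller".
   In the abstract machine, x + 1 overflowing is undefined behavior,
   so an optimizer may treat this comparison as always false. */
bool increment_would_overflow_fragile(int x)
{
    return x + 1 < x;
}

/* Abstract-machine thinking: compare against the limit first, so
   no overflowing arithmetic is ever evaluated. */
bool increment_would_overflow(int x)
{
    return x == INT_MAX;
}

/* Store x + 1 in *out and return true, or return false and leave
   *out untouched if the addition would overflow. */
bool checked_increment(int x, int *out)
{
    assert(out != NULL);
    if (increment_would_overflow(x))
        return false;
    *out = x + 1;
    return true;
}
```

With GCC or Clang at higher optimization levels, the fragile version commonly compiles down to an unconditional 'return false', although the exact result depends on the compiler version and on flags such as `-fwrapv`. Calling `checked_increment(INT_MAX, &r)` returns false and leaves `r` alone regardless of optimization level.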

(This neat picture is complicated by the awkward fact that many CPUs probably run significantly more C++ code than true C code, since so many significant programs are written in the former instead of the latter.)

It's my view that recognizing that C comes from running on concrete CPUs and was strongly shaped by concrete environments (OS, executable and library formats, etc) matters for understanding the group of C users who are unhappy with aggressively optimizing C compilers that follow the letter of the C standard and its abstract machine. Those origins of C were there first, and it's not irrational for people used to them to feel upset when the C abstract machine creates a security vulnerability in their previously working software because the compiler is very clever. The C abstract machine is not a carefully invented thing that people then built implementations of, an end in and of itself; it started out as a neutral explanation and justification of how actual existing C things behaved, a means to an end.

CAsAbstractMachine written at 23:18:30

