Wandering Thoughts archives


The case for atomic types in programming languages

In My review of the C standard library in practice Chris Wellons says, as part of the overall discussion:

I’ve used the _Atomic qualifier in examples since it helps with conciseness, but I hardly use it in practice. In part because it has the inconvenient effect of bleeding into APIs and ABIs. As with volatile, C is using the type system to indirectly achieve a goal. Types are not atomic, loads and stores are atomic. [..]

Wellons is entirely correct here; at the CPU level (on current general purpose CPUs), specific operations are atomic, not specific storage locations. An atomic type is essentially a lie by the language. One language that originally embraced this reality is Go, which originally had no atomic types, only atomic access. On the other side is Rust; Rust has atomic types and a general discussion of atomic operation.

(In Go 1.19, Go added a number of atomic types, so now Go can be used with either approach.)

However, I feel that that the lie of atomic types is a genuine improvement in almost all cases, because of the increase in usability and safety. The problem with only having atomic operations is the same as with optional error checking; you have to remember to always use them, even if the types you're operating on can be used with ordinary operations. As we all know, people can forget this, or they can think that they're clever enough to use non-atomic operations in this one special circumstance that is surely harmless.

Like forced error handling (whether through exceptions or option/result types), having atomic types means that you don't have a choice. The language makes sure that they're always used safely, and you as the programmer are relieved of one more thing to worry about and try to keep track of. Atomic types may be a bit of a lie, but they're an ergonomic lie that improves safety.

The question of whether atomic types should be separate things (as in Rust) or be a qualifier on regular types (as in C) is something that you can argue over. It's clear that atomic types need extra operations because there are important atomic operations (like compare and swap) that have no non-atomic equivalent. I tend to think that atomic types should be their own thing because there are many standard operations that they can't properly support, at least not without potentially transforming simple code that you wrote into something much more complex. It's better to be honest about what atomic types can and can't do.

Sidebar: Why an atomic type qualifier has to bleed into APIs and ABIs

A type qualifier like C's _Atomic is a promise from the compiler to you that all (supported) operations on the object will be performed atomically. If you remove this qualifier as part of passing a variable around, you're removing this promise and now your atomic qualified thing might be accessed non-atomically. In other words, the atomic access is part of the API. It's not even necessarily safe to automatically promote a non-atomic object into being an atomic object as part of, say, a function call, because the code being called may reasonably assume that all access to the object is atomic.

CaseForAtomicTypes written at 23:05:49; Add Comment


C was not created as an abstract machine (of course)

Today on the Fediverse I saw a post by @nytpu:

Reminder that the C spec specifies an abstract virtual machine; it's just that it's not an interpreted VM *in typical implementations* (i.e. not all, I know there was a JIT-ing C compiler at some point), and C was lucky enough to have contemporary CPUs and executable/library formats and operating systems(…) designed with its VM in mind

(There have also been actual C interpreters, some of which had strict adherence to the abstract semantics, cf (available online in the Usenix summer 1988 proceedings).)

This is simultaneously true and false. It's absolutely true that the semantics of formal standard C are defined in terms of an abstract (virtual) machine, instead of any physical machine. The determined refusal of the specification to tie this abstract machine in concrete CPUs is the source of a significant amount of frustration in people who would like, for example, for there to be some semantics attached to what happens when you dereference an invalid pointer. They note that actual CPUs running C code all have defined semantics, so why can't C? But, well, as is frequently said, C Is Not a Low-level Language (via) and the semantics of C don't correspond exactly to CPU semantics. So I agree with nytpu's overall sentiments, as I understand them.

However, it's absolutely false that C was merely 'lucky' that contemporary CPUs, OSes, and so on were designed with its abstract model in mind. Because the truth is the concrete C implementations came first and the standard came afterward (and I expect nytpu knows this and was making a point in their post). Although the ANSI C standardization effort did invent some things, for the most part C was what I've called a documentation standard, where people wrote down what was already happening. C was shaped by the CPUs it started on (and then somewhat shaped again by the ones it was eagerly ported to), Unix was shaped by C, and by the time that the C standard was producing drafts in the mid to late 1980s, C was shaping CPUs through the movement for performance-focused RISC CPUs (which wanted to optimize performance in significant part for Unix programs written in C, although they also cared about Fortran and so on).

(It's also not the case that C only succeeded in environments that were designed for it. In fact C succeeded in at least one OS environment that was relatively hostile to it and that wanted to be used with an entirely different language.)

Although I'm not absolutely sure, I suspect that the C standard defining it in abstract terms was in part either enabled or forced by the wide variety of environments that C already ran in by the late 1980s. Defining abstract semantics avoided the awkward issue of blessing any particular set of concrete ones, which at the time would have advantaged some people while disadvantaging others. This need for compromise between highly disparate (C) environments is what brought us charming things like trigraphs and a decision not to require two's-complement integer semantics (it's been proposed to change this, and trigraphs are gone in C23, also).

Dating from when ANSI C was defined and C compilers became increasingly aggressive about optimizing around 'undefined behavior' (even if this created security holes), you could say that modern software and probably CPUs has been shaped by the abstract C machine. Obviously, software increasingly has to avoid doing things that will blow your foot off in the model of the C abstract machine, because your C compiler will probably arrange to blow your foot off in practice on your concrete CPU. Meanwhile, things that aren't allowed by the abstract machine are probably not generated very much by actual C compilers, and things that aren't generated by C compilers don't get as much love from CPU architects as things that do.

(This neat picture is complicated by the awkward fact that many CPUs probably runs significantly more C++ code than true C code, since so many significant programs are written in the former instead of the latter.)

It's my view that recognizing that C comes from running on concrete CPUs and was strongly shaped by concrete environments (OS, executable and library formats, etc) matters for understanding the group of C users who are unhappy with aggressively optimizing C compilers that follow the letter of the C standard and its abstract machine. Those origins of C were there first, and it's not irrational for people used to them to feel upset when the C abstract machine creates a security vulnerability in their previously working software because the compiler is very clever. The C abstract machine is not a carefully invented thing that people then built implementations of, an end in and of itself; it started out as a neutral explanation and justification of how actual existing C things behaved, a means to an end.

CAsAbstractMachine written at 23:18:30; Add Comment

Page tools: See As Normal.
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.