The Unix C library API can only be reliably used from C

December 26, 2019

To fully implement system call origin verification, OpenBSD would like Go to make system calls through the C library instead of making them directly from its own runtime (which it has some reasons for doing). On the surface, this sounds like only a modest issue; sure, it's a bit awkward, but a language like Go should be able to just call the usual C library functions like open(), using the C calling ABI. Unfortunately it's not that simple, because very often parts of the normal C library API are actually implemented in the C preprocessor. Because of this, the C library API cannot be reliably and generally used without writing your own C glue code.

This sounds extreme, so let me illustrate it with everyone's favorite case of errno, which you consult to get the error value from a failed system call (and from some failed library calls). As covered in yesterday's entry, in the modern world errno must be implemented so that different threads can have different values for it, because they may be making different system calls at the same time. This requires thread local storage, and generally thread local storage cannot be accessed as a plain variable; it must be accessed through some special tricks supported by the C runtime. So here are the definitions of 'errno' from OpenBSD 6.6 and a current Fedora Linux with glibc:

/* OpenBSD */
int *__errno(void);
#define errno (*__errno())

/* Fedora glibc */
extern int *__errno_location (void) __THROW __attribute_const__;
# define errno (*__errno_location ())

In both of these cases, errno is actually a preprocessor definition. The definitions refer to undocumented C library functions that are not part of the public API (that's what the leading double underscores signal). If you compile C code against this errno API (by including errno.h in your program), it will work, but that's the only officially supported way of doing it. There is no useful errno variable to load in your own language's runtime after a call to, say, the open() function, and if you call __errno or __errno_location in your runtime, you are using a non-public API and it could break tomorrow (although it probably won't). To build a reliable language runtime that sticks to the public C library API, it's not enough to just call exported functions like open(); you also need to write and compile your own little C function that just returns errno to your runtime.
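As a rough illustration, such a glue function can be very small. This is only a sketch: the names shim_get_errno and shim_set_errno are made up here and are not part of any C library. Because it is compiled as C against errno.h, it picks up whatever the platform's errno macro expands to.

```c
#include <errno.h>

/* Hypothetical shim: returns the calling thread's errno value.
   The errno macro is expanded by the C compiler, so the runtime
   never has to know about __errno() or __errno_location(). */
int shim_get_errno(void)
{
    return errno;
}

/* Hypothetical companion: lets the runtime clear or set errno
   before a call, for functions that only report errors via errno. */
void shim_set_errno(int value)
{
    errno = value;
}
```

A runtime using this would need to call shim_get_errno() on the same thread that made the failing call, and before making any other C library call that might overwrite errno.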

(There may be other important cases besides errno; I will leave them to interested parties to find.)

This is not a new issue in Unix, of course. From the beginning of stdio in V7, some of the stdio 'functions' were implemented as preprocessor macros in stdio.h. But for a long time, people didn't insist that the C library was the only officially supported way of making system calls, so you could bypass things like the whole modern errno mess unless you needed to be compatible with C code for some reason.

(Before threading came into the Unix picture, errno was a plain variable and a generally good interface, although not perfect.)

Comments on this page:

if you call __errno or __errno_location in your runtime, you are using a non-public API and it could break tomorrow

On any platform with a stable ABI, it really couldn't break at any point without also breaking existing C binaries that depend on the same implementation. Though illumos, say, may not document those private functions today, we must preserve the existing behaviour of those symbols effectively forever or we've destroyed binary compatibility. It would seem as safe for Rust or Go to depend on their current behaviour as it would any C binary.

By cks at 2019-12-27 15:24:04:

I agree; if the ABI is stable, not just the API, you can interpret the C headers once and then safely use the result to duplicate what the preprocessor is doing. However some Unixes don't promise a stable ABI (at least OpenBSD, I believe), and you still have to interpret the C headers once in some way (either reading them or compiling test programs to see what they actually do).
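One possible shape for such a test program is to have the preprocessor turn the errno macro into a string, so you can see what the headers really do on a given platform. This is a sketch only; errno_expansion, print_errno_expansion and the STR macros are names invented for this example.

```c
#include <errno.h>
#include <stdio.h>

/* Two-level stringification: STR(errno) expands errno first,
   then STR2 turns the resulting tokens into a string literal. */
#define STR2(x) #x
#define STR(x) STR2(x)

/* Hypothetical helper: returns the text that this platform's
   errno macro expands to. */
const char *errno_expansion(void)
{
    return STR(errno);
}

/* Prints the expansion so it can be inspected on each target. */
void print_errno_expansion(void)
{
    printf("errno expands to: %s\n", errno_expansion());
}
```

With the header definitions quoted in the entry, the printed text should name __errno or __errno_location, but the point of running it is that you check rather than assume.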

(On Linux there is also the issue of different C libraries, where the preprocessor issue means they may have different ABIs while having the same API.)

Also, even when an ABI is stable over the short term it may not be over the long term, say ten or twenty years. This means that non-C implementations need to keep an eye out for changes in the preferred ABI (the one that current C programs would get when they're compiled) and keep up with it so that they won't be caught flat-footed if the Unix eventually removes support for the old ABI.

By jonys at 2020-01-03 02:40:35:

Unfortunately it's not that simple, because very often parts of the normal C library API are actually implemented in the C preprocessor. Because of this, the C library API cannot be reliably and generally used without actually writing your own C glue code.

This is true for global variables, constants and type definitions, but how is it with the API functions themselves? POSIX specifies that functions have to be available as “actual functions”, so you can #undef the name, take their address etc. even when they would normally be shadowed by equivalent macro definitions (section 2.1.1 of POSIX.1-2017). Very similar language is found in the N2176 draft of the C18 standard, section 7.1.4.

Especially the sentence “Provided that a function can be declared without reference to any type defined in a header, it is also permissible to declare the function explicitly and use it without including its associated header.” seems to imply that these API functions should be directly available for linking under the stated names, and should therefore be usable from other languages as well. But the linking process seems to be generally underspecified everywhere I looked, so maybe I'm wrong?
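For concreteness, here is a small sketch of the C-level techniques that section 7.1.4 describes, using toupper() and abs() as arbitrary examples. The wrapper and variable names are invented for illustration. It only demonstrates what a C compiler must accept; it does not settle the question of what is available at link time to other languages.

```c
#include <ctype.h>

/* Parentheses around the name stop any function-like macro from
   expanding, so this calls the actual library function. */
int call_real_toupper(int c)
{
    return (toupper)(c);
}

/* After #undef the name can only refer to the actual function,
   so its address can be taken. */
#undef toupper
int (*toupper_fn)(int) = toupper;

/* abs() needs no header-defined types, so it may be declared by
   hand instead of including <stdlib.h>. */
int abs(int);

int abs_without_header(int v)
{
    return abs(v);
}
```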

By cks at 2020-01-03 16:15:48:

This is true for functions specified in POSIX, but not everything in a system's C library API is specified in POSIX and so C library people might not feel bound by POSIX for extra functions they add (hopefully they will, though). One obvious and relevant source of non-POSIX things is additional system calls, although they'll probably be implemented as functions.

(A perverse implementation could have general 'syscallN' functions and then implement a lot of new system calls as #defines that use them. But that would lose various bits of type safety, among other issues.)

Actually making calls into the C library ABI from another language requires knowing the ABI as well as the API, so you can pass arguments correctly, get return values, and so on. This should lock down the linking process as well. If a platform refuses to document the C ABI to that level, you're in trouble no matter what because you can't even call your own C shim functions at that point.

(Technically a very perverse Unix could document the ABI for calling C functions in your own program code but refuse to have a documented ABI for linking against shared libraries and calling into them. This would let you call your shim functions but not library ones.)

Written on 26 December 2019.

Last modified: Thu Dec 26 23:50:15 2019