2020-12-28
It feels like the broad Unix API is being used less these days
A few years ago I wrote about how the practical Unix API is broader than system calls and how the value locked up in the broad Unix API made it pretty durable. I still believe that in one way, but at the same time I've wound up feeling that a lot of modern software development and deployment practices are causing the broad Unix API to be less and less used and useful. What I'm specifically thinking about here is containers.
If you're logging in to a Unix machine and using it, elements of
the broad Unix API like $HOME and /tmp matter to you. But for
a container (or for deploying a container), they often don't.
Containers deliberately ask much less of the host than the broad
Unix API (that's one of their features), and to the extent that software
inside a container uses the broad API, it's using a sham version
that was custom assembled for it. My impression is that some of
this shift is social, in attitudes about how containerized software
should be put together and what it should use and assume. To put
it one way, I don't think it would be seen as a good thing to use
a bunch of shell scripts in a container. Containers aren't general
purpose Unix systems and people don't write software for them as if
they were.
Right now I don't think this is a significant force in the parts of the broad Unix world that I notice, one big enough to be changing Unix as a whole. There are plenty of people still running and deploying traditional Unix systems (including us), and then putting software straight onto such systems (without containers). These people are all using the broad Unix API and exerting a quiet pressure on software to still support (and use) it, instead of requiring containers or at least some emulation of them (although you can find software that really doesn't want to be deployed 'simply', ie outside a container).
One part of this is likely that Unix remains more than Linux, although not everyone really believes this. Right now containers are fairly strongly tied to Linux for various reasons, so if you write container-only software you're implicitly writing Linux-only software. My impression is that many open source projects aren't willing to tie themselves down like this.
Of course, there's also a lot of Unix software that isn't the sort of thing you put in containers in the first place, or at least not in conventional containers (Linux has Flatpaks and Snaps for more interactive applications, but they're not very popular). This software is using the broad Unix API when it arranges to install manpages, support files, and so on in the standard locations. It can also sometimes take advantage of standard services and standard integrations with other software (for example Certbot and other Let's Encrypt automation, which cooperate with various daemons to give them TLS certificates).
A little puzzle with printf() and C argument passing
In The Easy Ones – Three Bugs Hiding in the Open, Bruce Dawson gave us a little C puzzle in passing:
The variable arguments in printf formatting means that it is easy to get type mismatches. The practical results vary considerably:
- printf("0x%08lx", p); // Printing a pointer as an int – truncation or worse on 64-bit
- printf("%d, %f", f, i); // Swapping float and int – could print nonsense, or might actually work (!)
- printf("%s %d", i, s); // Swapping the order of string and int – will probably crash
[...] (aside: understanding why #2 often prints the desired result is a good ABI puzzle)
I had to think about this for a bit, and then I realized why and how it can work (and why similar integer versus float argument confusion can also work for other functions, even ones with fixed argument lists). What it comes down to is that in some ABIs, arguments are passed in registers (at least early arguments, before you run out of registers), and floating point arguments are passed in different registers than integers (and pointers). This is true even for functions that take variable arguments and will walk through them using stdarg macros (or at least it can be, depending on the ABI).
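To make the stdarg part concrete, here is a minimal sketch of a variadic function that is told the type of each argument and fetches it with va_arg. The name sum_args and its one-letter type codes are made up for illustration (they're not from Bruce Dawson's article), and the sketch only uses va_arg with the correct types, which is the well-defined case.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper, invented for this sketch: sums its variable
 * arguments. The 'types' string says what each one is:
 * 'd' means the next argument is an int, 'f' means it is a double. */
double sum_args(const char *types, ...)
{
    va_list ap;
    double total = 0.0;

    assert(types != NULL);
    va_start(ap, types);
    for (const char *t = types; *t != '\0'; t++) {
        if (*t == 'd')
            total += va_arg(ap, int);     /* ask for an integer argument */
        else if (*t == 'f')
            total += va_arg(ap, double);  /* ask for a floating point argument */
    }
    va_end(ap);
    return total;
}
```

How va_arg finds each value is up to the ABI. On an ABI with split register sets, the int and double requests are satisfied from different places, which is what the rest of this entry relies on.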
Because floating point and non floating point arguments are passed in different sets of registers, what matters isn't the total order of arguments but the order of floating point or non-fp arguments. So here, regardless of where '%f' is in the printf format, it always causes printf() to get the first floating point argument, which can never be confused with an integer argument. Similarly, the first '%d' causes printf() to look for the second non-fp argument, regardless of where it was in the argument order; it could be at the end of several floating point arguments and still work.
(The '%d' makes printf() look for the second non-fp argument because the first one was the format string. In an ABI that passed pointers in a separate place than integers, it would still work out, since now the first '%d' would be looking for the first integer argument.)
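The bookkeeping here can be modeled in a few lines of C. This is a toy model under stated assumptions, not how any real ABI or C library is implemented: the two arrays stand in for the integer and floating point argument registers, and every name (call_regs, pass_int, next_fp, swapped_call and so on) is invented for the sketch. It deliberately avoids calling printf() with mismatched arguments, since that is undefined behavior in real C.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_REGS = 8 };  /* arbitrary size for the toy model */

/* Two separate "register files", filled in call order per class. */
struct call_regs {
    long   int_regs[MAX_REGS];  /* integers and pointers */
    double fp_regs[MAX_REGS];   /* floating point values */
    size_t n_int, n_fp;         /* how many of each the caller loaded */
};

/* Caller side: an argument goes in the next free register of its class. */
void pass_int(struct call_regs *r, long v)
{
    assert(r->n_int < MAX_REGS);
    r->int_regs[r->n_int++] = v;
}

void pass_fp(struct call_regs *r, double v)
{
    assert(r->n_fp < MAX_REGS);
    r->fp_regs[r->n_fp++] = v;
}

/* Callee side: one cursor per class, so fetching an int never
 * advances past (or looks at) a floating point argument. */
struct arg_cursor {
    const struct call_regs *regs;
    size_t next_int, next_fp;
};

long next_int(struct arg_cursor *c)
{
    assert(c->next_int < c->regs->n_int);
    return c->regs->int_regs[c->next_int++];
}

double next_fp(struct arg_cursor *c)
{
    assert(c->next_fp < c->regs->n_fp);
    return c->regs->fp_regs[c->next_fp++];
}

/* Models printf("%d, %f", f, i): the caller passes the double first and
 * the int second, but the callee asks for an int first, then a double. */
void swapped_call(long i, double f, long *got_d, double *got_f)
{
    struct call_regs regs = {0};
    pass_int(&regs, 0);  /* stands in for the format string pointer */
    pass_fp(&regs, f);   /* first variable argument: the double */
    pass_int(&regs, i);  /* second variable argument: the int */

    struct arg_cursor cur = { &regs, 1, 0 };  /* skip the format string */
    *got_d = next_int(&cur);  /* what '%d' would fetch */
    *got_f = next_fp(&cur);   /* what '%f' would fetch */
}
```

In this model, swapped_call(42, 2.5, &d, &f) gives back d == 42 and f == 2.5 even though the 'caller' loaded the double before the int, because each fetch only advances the cursor for its own class.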
Using the excellent services of godbolt.org, we can see this in
action on 64-bit x86 in a very small example (I used a very small example and a
decent optimization level to get clear, minimal assembly code). The
floating point argument is passed in xmm0, while the format string
and the integer argument are passed in edi and esi respectively
(I don't know what eax is doing, but it probably has something
to do with the ABI). A similar thing happens on 64-bit ARM v8 (aka
Aarch64), as we can also see on godbolt with the same example on
Aarch64.
(Based on this page, the Aarch64 x0 and w1 are in the same set of
registers. Apparently d0 is a 64-bit version of the first floating
point register, from here [pdf]. I wound up looking up all of this
to be sure I understood what was going on in the Aarch64 call, so I
might as well write it down here.)
Since pointers and integers are normally passed in the same set of
registers (at least on 64-bit x86 and Aarch64), we can also see why
the third example is very likely to fail. Since the same set of
registers is used for both argument types, it's possible to use an
integer argument as a pointer argument, with a segmentation fault
as the likely result. Similarly, we can predict that
'printf("%s %f", f, s);' might well work.
PS: This confusion can happen in any language that follows the C ABI on a platform with this sort of split usage of registers (although many languages may prevent this sort of argument type confusion). Not all languages follow the C ABI; famously, Go currently passes all arguments on the stack (as of Go 1.15 and soon Go 1.16).