Wandering Thoughts archives

2018-03-04

The value locked up in the Unix API makes it pretty durable

Every so often someone proposes or muses about replacing Unix with something more modern and better, or is surprised when new surface OSes (such as ChromeOS) are based on Unix (often Linux, although not always). One reason that this keeps happening and that some form of Unix is probably going to be with us for decades to come is that there is a huge amount of value locked up in the Unix API, and in more ways than are perhaps obvious.

The obvious way that a great deal of value is locked up in the Unix API is the kernels themselves. Whether you look at Linux, FreeBSD, OpenBSD, or even one of the remaining commercial Unixes, all of their kernels represent decades of developer effort. Some of this effort is in the drivers, many of which you could do without in an OS written from scratch for relatively specific hardware, but a decent amount of the effort is in core systems like physical and virtual memory management, process handling, interprocess communication, filesystems and block level IO handling, modern networking, and so on.

However, this is just the tip of the iceberg. The bigger value of the Unix API is in everything that runs on top of it. This comes in at least two parts. The first part is all of the user level components that are involved to boot and run Unix and everything that supports them, especially if you include the core of a graphical environment (such as some form of display server). The second part is all of the stuff that you run on your Unix as its real purpose for existing, whether this is Apache (or some other web server), a database engine, your own custom programs (possibly written in Python or Ruby or whatever), and so on. It's also the support programs for this, which blur the lines between the 'system' and being productive with it; a mailer, a nice shell, an IMAP server, and so on. Then you can add an extra layer of programs used to monitor and diagnose the system and another set of programs if you develop on it or even just edit files. And if you want to use the system as a graphical desktop there is an additional stack of components and programs that all use aspects of the Unix API either directly or indirectly.

All of these programs represent decades or perhaps centuries of accumulated developer effort. Throwing away the Unix API in favour of something else means either doing without these programs, rewriting your own versions from scratch, or porting them and everything they depend on to your new API. Very few people can afford to even think about this, much less undertake it for a large scale environment such as a desktop. Even server environments are relatively complex and multi-layered in practice.

(Worse, some of the Unix API is implicit instead of being explicitly visible in things like system calls. Many programs will expect a 'Unix' to handle process scheduling, memory management, TCP networking, and a number of other things in pretty much the same way that current Unixes do. If your new non-Unix has the necessary system calls but behaves significantly differently here, programs may run but not perform very well, or even malfunction.)

Also, remember that the practical Unix API is a lot more than system calls. Something like Apache or Firefox pretty much requires a large amount of the broad Unix API, not just the core system calls and C library, and as a result you can't get them up on your new system just by implementing a relatively small and confined compatibility layer. (That's been tried in the past and pretty much failed in practice, and is one reason why people almost never write programs to strict POSIX and nothing more.)

(This elaborates on a tweet of mine that has some additional concrete things that you'd be reimplementing in your non-Unix.)

UnixAPIDurableValue written at 18:51:44; Add Comment

The practical Unix API is more than system calls (or POSIX)

What is the 'Unix API'? Some people would be tempted to say that this is straightforward; depending on your perspective it's either the relatively standard set of core Unix system calls or the system calls and library functions required by POSIX. This answer is not wrong at one level, but in practice it is not a useful one.

As people have found out in the past, the real Unix API is the whole collection of behaviors and environments that Unix programs assume. It isn't just POSIX library calls; it's also the shell and standard utilities and files that are in known locations and standard capabilities and various other things. A 'Unix' without a useful $HOME environment variable and /tmp may be specification compliant (I haven't checked POSIX) but it's not useful, in that many programs that people want generally won't run on it.

In practice the Unix API is the entire Unix environment. What constitutes the 'Unix' environment instead of the environment specific to a particular flavour of Unix is an ever-evolving topic. Once upon a time mmap() was not part of the Unix environment (cf); today it absolutely is. I'm pretty certain that once upon a time the -o flag to egrep was effectively Linux specific (as it relied on egrep being GNU grep); today it's much closer to being part of Unix, as many Unixes either have GNU grep as egrep or have added support for -o. And so it goes, with the overall Unix API moving forward through de facto evolution.

Unless you intend for your program to be narrowly and specifically portable to POSIX or an even more minimal standard, it is not a bug for it to rely on portions of the broader, de facto Unix API. It's not even necessarily a bug to rely on APIs that are only there on some Unixes (for example Linux and FreeBSD), although it may limit how widely your program spreads. Even somewhat narrow API choices are not necessarily bugs; you may have decided to be limited in your portability or to at least require some common things to be available.

(The Go build process requires Bash on Unix, for example, although it doesn't require that /bin/sh is Bash.)

PS: This is a broader sense of the 'Unix API' (and a different usage) than I used when I wrote about whether the C runtime and library was a legitimate part of the Unix API. The broad Unix API is and always has been layered, and things like Go are deliberately implementing their own API on top of one of the lower layers. In a way, my earlier entry was partly about how separate the layers of the broad Unix API have to be; for example, can you implement a compatible and fully capable Bourne shell using only public Unix kernel APIs, or at most public C library APIs?

(Many people would say that a system where you could not do this was not really 'Unix', even if it complied with POSIX standards.)

UnixAPIMoreThanSyscalls written at 01:03:15; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.