2017-12-31
Is the C runtime and library a legitimate part of the Unix API?
One of the knocks against Go is, to quote from Debugging an evil Go runtime bug (partly via):
Go also happens to have a (rather insane, in my opinion) policy of reinventing its own standard library, so it does not use any of the standard Linux glibc code to call vDSO, but rather rolls its own calls (and syscalls too).
Ordinary non-C languages on Unixes generally implement a great many
low level operations by calling into the standard C library. This
starts with things like making system calls, but also includes
operations such as getaddrinfo(3). Go doesn't do this; it implements
as much as possible itself, going straight down to direct system
calls in assembly language. Occasionally there are problems that
ensue.
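To make the usual approach concrete, here is a minimal sketch in Python (picked only for brevity) of a non-C language reaching a system call through the standard C library instead of issuing it directly. It assumes a Unix where ctypes.CDLL(None) exposes libc's symbols, which should hold on Linux and macOS; the helper name libc_getpid is made up for this illustration.

```python
import ctypes
import os

# Assumption: passing None gives a handle on the running process image,
# which on Linux and macOS includes the C library's symbols.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.getpid.restype = ctypes.c_int
_libc.getpid.argtypes = []


def libc_getpid() -> int:
    """Return our process ID by calling libc's getpid() wrapper."""
    return _libc.getpid()


if __name__ == "__main__":
    # Both values come from the same system call; the interpreter's own
    # os.getpid() is itself implemented by calling into libc.
    print(libc_getpid(), os.getpid())
```

Go's approach is the opposite of this: its runtime issues the system call instruction itself and never touches the libc wrapper.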
A few Unixes explicitly say that the standard C library is the
stable API and point of interface with the system; one example is
Solaris (and now Illumos). Although they don't casually change the
low level system call implementation, as far as I know Illumos
officially reserves the right to change all of their actual system
calls around, breaking any user space code that isn't dynamically
linked to libc. If your code breaks, it's your fault; Illumos
told you that dynamic linking to libc is the official API.
Other Unixes simply do this tacitly and by accretion. For example,
on any Unix using nsswitch.conf, it's very difficult to always
get the same results for operations like getaddrinfo() without
going through the standard C library, because these may use arbitrary
and strange dynamically loaded modules that are accessed through libc
and require various random libc APIs to work. This points out one of
the problems here; once you start (indirectly) calling random bits of
the libc API, they may quite reasonably make assumptions about the
runtime environment that they're operating in. How to set up a limited
standard C library runtime environment is generally not documented;
instead the official view is generally 'let the standard C library
runtime code start your main() function'.
I'm not at all sure that all of this requirement and entanglement with the standard C library and its implicit runtime environment is a good thing. The standard C library's runtime environment is designed for C, and it generally contains a tangled skein of assumptions about how things work. Forcing all other languages to fit themselves into these undocumented constraints is clearly confining, and the standard C library generally isn't designed to be a transparent API; in fact, at least GNU libc deliberately manipulates what it does under the hood to be more useful to C programs. Whether these manipulations are useful or desired for your non-C language is an open question, but the GNU libc people aren't necessarily going to even document them.
(Marcan's story shows that the standard C library behavior would have been a problem for any language environment that attempted to use minimal stacks while calling into 'libc', here in the form of a kernel vDSO that's designed to be called through libc. This also shows another aspect of the problem, in that as far as I know how much stack space you must provide when calling the standard C library is generally not documented. It's just assumed that you will have 'enough', whatever that is. C code will; people who are trying to roll their own coroutines and thread environment, maybe not.)
This implicit assumption has a long history in Unix. Many Unixes
have only really documented their system calls in the form of the
standard C library interface to them, quietly eliding the distinction
between the kernel API to user space and the standard C library API
to C programs. If you're lucky, you can dig up some documentation
on how to make raw system calls and what things those raw system
calls return in unusual cases like pipe(2).
I don't think very many Unixes have ever tried to explicitly and
fully document the kernel API separately from the standard C library
API, especially once you get into cases like ioctl() (where there
are often C macros and #defines that are used to form some of the
arguments, which are of course only 'documented' in the C header
files).
Understanding IMAP path prefixes in clients and servers
Suppose you have some IMAP clients and they talk to an IMAP server which stores mailboxes somewhere in the filesystem under people's home directories (let's call this the IMAP root for a user). One of the complications of talking about where people's mailboxes and folders actually wind up in this environment is that both the clients and the server get to contribute their two cents, but how they manifest is different.
(As a disclaimer, I'm probably abusing IMAP related terminology here in ways that aren't proper and that I'd fix if I actually ever read up on the details of the IMAP protocol and what it calls things.)
To start with, the IMAP protocol has the concept of a hierarchy of
folders and mailboxes, rooted at /. This hierarchy is an abstract
thing; it's how clients name things to the server (and how they
traverse the namespace with operations like LIST and LSUB).
The IMAP server may implement this hierarchical namespace however
it wants, using whatever internal names for things that it wants
to (provided that it can map back and forth between internal names
and protocol level ones known by clients and named in the IMAP
subscriptions and so on). Even when an IMAP server stores this IMAP
protocol namespace in the filesystem, it may or may not use the
client names for things. For now, let's assume that our IMAP server
does.
Many IMAP clients have in their advanced configuration options an
option for something like an 'IMAP Path Prefix' or an 'IMAP server
directory', to use the names that iOS and Thunderbird respectively
use for this. This is what it sort of sounds like; it basically
causes the IMAP client to use this folder (or series of folders)
as a prefix on all of the mailbox and folder names it uses, making
it into the root of the IMAP namespace instead of /. If you set
this in the client to IMail and have a mailbox that you call
'Private' in the client, the actual name of the mailbox in the
IMAP protocol is IMail/Private. Your client simply puts the IMail
on the front when it's talking to the server and takes it back off
when it gets stuff back and presents this to you.
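The client-side translation just described can be sketched in a few lines of Python. This is only an illustration under the same simplification used here, namely that '/' is the hierarchy separator; the function names to_server_name and to_client_name are invented for the example and don't correspond to any real client's code.

```python
from typing import Optional


def to_server_name(client_name: str, prefix: str = "", delim: str = "/") -> str:
    """Name the client sends to the server for a folder the user sees."""
    if not prefix:
        return client_name
    return prefix + delim + client_name


def to_client_name(server_name: str, prefix: str = "", delim: str = "/") -> Optional[str]:
    """Name shown to the user, or None if the folder lies outside the prefix."""
    if not prefix:
        return server_name
    lead = prefix + delim
    if server_name.startswith(lead):
        return server_name[len(lead):]
    # Outside the configured prefix: a prefix-using client ignores it.
    return None


if __name__ == "__main__":
    print(to_server_name("Private", "IMail"))
    print(to_client_name("IMail/Private", "IMail"))
```

Returning None for names outside the prefix mirrors the idea that the client simply never looks at anything that isn't under its configured prefix.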
A client that has an IMAP path prefix and uses LIST will normally
only ask for listings of things under its path prefix, because
that's what you told it to do. What's visible under the true IMAP
root is irrelevant to such a client; it will always confine itself
to the path prefix. In our filesystem-backed IMAP server, this means
that the client is voluntarily confining itself to a subdirectory
of wherever the IMAP server stores things in the filesystem and it
doesn't care (and won't notice) what's outside of that subdirectory.
On the server side, the IMAP server might be configured (as ours
sadly is) to store folders and mailboxes straight under $HOME, or
it might be configured to store them starting in a subdirectory,
say $HOME/IMAP. This mapping from the
IMAP protocol directory hierarchy used by clients to a directory
tree somewhere in the filesystem is very much like how an HTTP server
maps from URLs to filesystem locations under its document root
(although in the case of the IMAP server, there is a different 'IMAP
root' for every user). A properly implemented IMAP server doesn't
allow clients to escape outside of this IMAP root through clever
tricks like asking for '..', although it may be willing to follow
symlinks in the filesystem that lead outside of it.
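Here is a rough sketch of that server-side mapping in Python, including the refusal to let '..' climb out of the IMAP root. It is not how any particular IMAP server does it; the function name server_path and the example root /home/user are assumptions for illustration, and the check is purely lexical, so it deliberately says nothing about symlinks.

```python
import posixpath


def server_path(imap_root: str, imap_name: str) -> str:
    """Map an IMAP protocol name to a path under the user's IMAP root."""
    root = posixpath.normpath(imap_root)
    # Treat the protocol name as relative to the root, then collapse
    # any '.' and '..' components lexically.
    candidate = posixpath.normpath(posixpath.join(root, imap_name.lstrip("/")))
    # Refuse names that would end up outside the root.
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError("folder name escapes the IMAP root: %r" % imap_name)
    return candidate


if __name__ == "__main__":
    print(server_path("/home/user", "IMail/Private"))
    try:
        server_path("/home/user", "../other/mbox")
    except ValueError as exc:
        print("rejected:", exc)
```

Because only the name is examined, a symlink inside the root that points elsewhere would still be followed when the resulting path is opened, which matches the behaviour described above.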
(As far as I know, such symlinks can't be created through the IMAP protocol, so they must be set up by outside means such as the user sshing in to the IMAP server machine and making a symlink by hand. Of course, with fileservers and shared home directories, that can be any of our Linux servers.)
Using an IMAP path prefix in your client is a good thing if the
server's IMAP root is, say, $HOME, since there are probably a
great many things there that aren't actually mailboxes and mail
folders and that will only confuse your client (and complicate its
listing of actual interesting mailboxes) if it looks at them by
asking for a listing of /, the root of the IMAP namespace. With
an IMAP path prefix configured, your client will always look at a
subdirectory of $HOME where you'll presumably only have mailboxes
and so on.
The IMAP server is basically oblivious to the use of a client side IMAP path prefix and can't exert any control over it. The client never explicitly tells the server 'I'm using this path prefix'; all the server sees is that the client only ever does operations on things with some prefix.
The net result of this is that you can't transparently replace the
use of a client side IMAP path prefix with the equivalent server
side change in where the IMAP root is. If you start out with a
client IMAP path prefix of IMail and a server IMAP root of $HOME,
and then change to a server IMAP root of $HOME/IMail, the client
will still try to access IMail/Private, the server will translate
this to $HOME/IMail/IMail/Private, and things will probably be
sad. To make this work, either you need to move things at the Unix
filesystem level or people have to change their IMAP clients to
take out the IMAP path prefix.
To make this perhaps a little bit clearer, here is a table of the various pieces and the resulting Unix path that gets formed once all the bits have been put together.
Server IMAP root | Client IMAP prefix | Client folder | Unix path |
$HOME | <none> | Private | $HOME/Private |
$HOME | <none> | IMail/Private | $HOME/IMail/Private |
$HOME | IMail | Private | $HOME/IMail/Private |
$HOME/IMail | IMail | Private | $HOME/IMail/IMail/Private |
$HOME/IMail | <none> | Private | $HOME/IMail/Private |
For a given server IMAP root, it doesn't matter whether the client forms the (sub)folder name explicitly or through use of a client IMAP path prefix. If you use multiple clients and only some of them are set up with your IMAP path prefix, clients configured with the prefix will see folder names with the prefix stripped off and other clients will see the full (IMAP protocol) folder path; this is the second and third lines of the table.
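The same combinations can be generated mechanically. The following Python sketch composes the client and server contributions as plain string operations; unix_path and ROWS are names invented for this example, '/' is assumed to be the separator, and $HOME is left as a literal string, not expanded.

```python
def unix_path(server_root: str, client_prefix: str, client_folder: str) -> str:
    """Combine the three pieces the way the client and server would."""
    # Client side: put the prefix (if any) in front of the folder name.
    if client_prefix:
        imap_name = client_prefix + "/" + client_folder
    else:
        imap_name = client_folder
    # Server side: place the protocol-level name under the IMAP root.
    return server_root + "/" + imap_name


# (server IMAP root, client IMAP prefix, client folder)
ROWS = [
    ("$HOME", "", "Private"),
    ("$HOME", "", "IMail/Private"),
    ("$HOME", "IMail", "Private"),
    ("$HOME/IMail", "IMail", "Private"),
    ("$HOME/IMail", "", "Private"),
]

if __name__ == "__main__":
    for root, prefix, folder in ROWS:
        print(root, prefix or "<none>", folder,
              unix_path(root, prefix, folder), sep=" | ")
```

The second and third rows produce the same Unix path, which is the point about explicit subfolder names and client prefixes being interchangeable from the server's side.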
(If all of your clients respect IMAP subscriptions, the server may not be able to tell whether or not any particular one of them has an IMAP path prefix configured, or if it's just dutifully following the subscriptions (which are of course all inside the IMAP path prefix you have configured on some clients).)
(This is one of the entries I write partly to get all of this straight in my head.)