Wandering Thoughts archives

2017-12-31

Is the C runtime and library a legitimate part of the Unix API?

One of the knocks against Go is, to quote from Debugging an evil Go runtime bug (partly via):

Go also happens to have a (rather insane, in my opinion) policy of reinventing its own standard library, so it does not use any of the standard Linux glibc code to call vDSO, but rather rolls its own calls (and syscalls too).

Ordinary non-C languages on Unixes generally implement a great many low level operations by calling into the standard C library. This starts with things like making system calls, but also includes operations such as getaddrinfo(3). Go doesn't do this; it implements as much as possible itself, going straight down to direct system calls in assembly language. Occasionally there are problems that ensue.
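To make the distinction concrete, here is a minimal sketch in Python of the same operation done through the libc wrapper and by system call number. It assumes 64-bit Linux on x86_64 or aarch64. The function name and the small table of syscall numbers are illustrative only. Note that it still uses libc's generic syscall() function to enter the kernel, so it only bypasses the getpid() wrapper; Go goes further and issues the system call instruction itself.

```python
import ctypes
import os
import platform
import sys

# Assumed Linux syscall numbers for getpid on two common 64-bit architectures.
_SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def getpid_via_libc_and_syscall():
    """Return (pid from libc's getpid(), pid from syscall(SYS_getpid)).

    Returns None on platforms this sketch does not know about.
    """
    if not sys.platform.startswith("linux"):
        return None
    if ctypes.sizeof(ctypes.c_void_p) != 8:
        return None  # a 32-bit userland uses different syscall numbers
    number = _SYS_GETPID.get(platform.machine())
    if number is None:
        return None

    # dlopen(NULL): look symbols up in the already-loaded C library.
    libc = ctypes.CDLL(None, use_errno=True)

    # The ordinary route: the C library's wrapper function.
    via_wrapper = libc.getpid()

    # The lower-level route: name the system call by number.
    libc.syscall.restype = ctypes.c_long
    via_number = libc.syscall(ctypes.c_long(number))

    return via_wrapper, via_number


if __name__ == "__main__":
    print(getpid_via_libc_and_syscall(), os.getpid())
```

The syscall numbers are exactly the sort of detail that a program takes on itself once it stops going through libc.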

A few Unixes explicitly say that the standard C library is the stable API and point of interface with the system; one example is Solaris (and now Illumos). Although they don't casually change the low level system call implementation, as far as I know Illumos officially reserves the right to change all of their actual system calls around, breaking any user space code that isn't dynamically linked to libc. If your code breaks, it's your fault; Illumos told you that dynamic linking to libc is the official API.

Other Unixes simply do this tacitly and by accretion. For example, on any Unix using nsswitch.conf, it's very difficult to always get the same results for operations like getaddrinfo() without going through the standard C library, because these may use arbitrary and strange dynamically loaded modules that are accessed through libc and require various random libc APIs to work. This points out one of the problems here; once you start (indirectly) calling random bits of the libc API, they may quite reasonably make assumptions about the runtime environment that they're operating in. How to set up a limited standard C library runtime environment is generally not documented; instead the official view is generally 'let the standard C library runtime code start your main() function'.
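For contrast, this is roughly what the 'go through libc' route looks like from a non-C language. The sketch assumes CPython, whose socket.getaddrinfo() is a thin wrapper over the platform's getaddrinfo(3) where one exists; the resolve() helper is a hypothetical name used for illustration.

```python
import socket


def resolve(host, port):
    """Return (address family name, address) pairs for host.

    Any nsswitch.conf handling happens inside the C library's
    getaddrinfo(); nothing here has to know which modules get loaded.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [
        (socket.AddressFamily(family).name, sockaddr[0])
        for family, _type, _proto, _canonname, sockaddr in infos
    ]


if __name__ == "__main__":
    print(resolve("127.0.0.1", 80))
```

Given a real hostname instead of a numeric address, the results depend on whatever the local system is configured to consult, which is exactly the behavior that is hard to reproduce without libc.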

I'm not at all sure that all of this requirement and entanglement with the standard C library and its implicit runtime environment is a good thing. The standard C library's runtime environment is designed for C, and it generally contains a tangled skein of assumptions about how things work. Forcing all other languages to fit themselves into these undocumented constraints is clearly confining, and the standard C library generally isn't designed to be a transparent API; in fact, at least GNU libc deliberately manipulates what it does under the hood to be more useful to C programs. Whether these manipulations are useful or desired for your non-C language is an open question, but the GNU libc people aren't necessarily going to even document them.

(Marcan's story shows that the standard C library behavior would have been a problem for any language environment that attempted to use minimal stacks while calling into 'libc', here in the form of a kernel vDSO that's designed to be called through libc. This also shows another aspect of the problem: as far as I know, how much stack space you must provide when calling the standard C library is generally not documented. It's just assumed that you will have 'enough', whatever that is. C code will; people who are trying to roll their own coroutines and thread environment, maybe not.)

This implicit assumption has a long history in Unix. Many Unixes have only really documented their system calls in the form of the standard C library interface to them, quietly eliding the distinction between the kernel API to user space and the standard C library API to C programs. If you're lucky, you can dig up some documentation on how to make raw system calls and what things those raw system calls return in unusual cases like pipe(2). I don't think very many Unixes have ever tried to explicitly and fully document the kernel API separately from the standard C library API, especially once you get into cases like ioctl() (where there are often C macros and #defines that are used to form some of the arguments, which are of course only 'documented' in the C header files).
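As an illustration of the ioctl() case, here is a sketch of what a non-C program has to reimplement by hand. It assumes the bit layout used by Linux's asm-generic/ioctl.h (as on x86 and ARM; some other architectures lay the fields out differently). The Python helper names are made up; the example request is EVIOCGVERSION, which the Linux input headers define as _IOR('E', 0x01, int).

```python
# Field positions in a Linux ioctl request number (asm-generic layout).
IOC_NRSHIFT = 0      # 8 bits: command number
IOC_TYPESHIFT = 8    # 8 bits: subsystem "type" character
IOC_SIZESHIFT = 16   # 14 bits: size of the argument structure
IOC_DIRSHIFT = 30    # 2 bits: transfer direction

IOC_NONE = 0
IOC_WRITE = 1
IOC_READ = 2


def ioc(direction, type_char, number, size):
    """Python equivalent of the C _IOC() macro."""
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (ord(type_char) << IOC_TYPESHIFT)
        | (number << IOC_NRSHIFT)
    )


def ior(type_char, number, size):
    """Python equivalent of _IOR(): the kernel writes, user space reads."""
    return ioc(IOC_READ, type_char, number, size)


# EVIOCGVERSION is _IOR('E', 0x01, int); an int is assumed to be 4 bytes.
EVIOCGVERSION = ior("E", 0x01, 4)

if __name__ == "__main__":
    print(hex(EVIOCGVERSION))
```

None of this comes from a prose specification; it is transcribed from the C headers, which is the point.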

unix/UnixAPIAndCRuntime written at 17:24:55

Understanding IMAP path prefixes in clients and servers

Suppose you have some IMAP clients and they talk to an IMAP server which stores mailboxes somewhere in the filesystem under people's home directories (let's call this the IMAP root for a user). One of the complications of talking about where people's mailboxes and folders actually wind up in this environment is that both the clients and the server get to contribute their two cents, but how they manifest is different.

(As a disclaimer, I'm probably abusing IMAP related terminology here in ways that aren't proper and that I'd fix if I actually ever read up on the details of the IMAP protocol and what it calls things.)

To start with, the IMAP protocol has the concept of a hierarchy of folders and mailboxes, rooted at /. This hierarchy is an abstract thing; it's how clients name things to the server (and how they traverse the namespace with operations like LIST and LSUB). The IMAP server may implement this hierarchical namespace however it wants, using whatever internal names for things that it wants to (provided that it can map back and forth between internal names and the protocol level ones known by clients and named in the IMAP subscriptions and so on). Even when an IMAP server stores this IMAP protocol namespace in the filesystem, it may or may not use the client names for things. For now, let's assume that our IMAP server does.

Many IMAP clients have in their advanced configuration options an option for something like an 'IMAP Path Prefix' or an 'IMAP server directory', to use the names that iOS and Thunderbird respectively use for this. This is what it sort of sounds like; it basically causes the IMAP client to use this folder (or series of folders) as a prefix on all of the mailbox and folder names it uses, making it into the root of the IMAP namespace instead of /. If you set this in the client to IMail and have a mailbox that you call 'Private' in the client, the actual name of the mailbox in the IMAP protocol is IMail/Private. Your client simply puts the IMail on the front when it's talking to the server and takes it back off when it gets stuff back and presents this to you.
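What the client does with its path prefix can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes / is the hierarchy separator, as in the rest of this entry.

```python
def to_server_name(prefix, client_name, delim="/"):
    """Name the client sends to the server for a folder the user sees."""
    if not prefix:
        return client_name
    return prefix.rstrip(delim) + delim + client_name


def to_client_name(prefix, server_name, delim="/"):
    """Name shown to the user for a folder the server reports.

    Returns None for anything outside the prefix, which such a
    client simply never looks at.
    """
    if not prefix:
        return server_name
    lead = prefix.rstrip(delim) + delim
    if server_name.startswith(lead):
        return server_name[len(lead):]
    return None


if __name__ == "__main__":
    print(to_server_name("IMail", "Private"))
    print(to_client_name("IMail", "IMail/Private"))
```

All of the translation happens in the client; the server only ever sees the full protocol level names.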

A client that has an IMAP path prefix and uses LIST will normally only ask for listings of things under its path prefix, because that's what you told it to do. What's visible under the true IMAP root is irrelevant to such a client; it will always confine itself to the path prefix. In our filesystem-backed IMAP server, this means that the client is voluntarily confining itself to a subdirectory of wherever the IMAP server stores things in the filesystem and it doesn't care (and won't notice) what's outside of that subdirectory.

On the server side, the IMAP server might be configured (as ours sadly is) to store folders and mailboxes straight under $HOME, or it might be configured to store them starting in a subdirectory, say $HOME/IMAP. This mapping from the IMAP protocol directory hierarchy used by clients to a directory tree somewhere in the filesystem is very much like how an HTTP server maps from URLs to filesystem locations under its document root (although in the case of the IMAP server, there is a different 'IMAP root' for every user). A properly implemented IMAP server doesn't allow clients to escape outside of this IMAP root through clever tricks like asking for '..', although it may be willing to follow symlinks in the filesystem that lead outside of it.
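The server side mapping, including the refusal to escape the IMAP root, might look something like the following Python sketch. This is an assumption about how a filesystem-backed server could do it, not how any particular IMAP server does, and mailbox_path() is a made-up name. The check is purely lexical, so it doesn't stop symlinks inside the root from pointing elsewhere.

```python
import posixpath


def mailbox_path(imap_root, mailbox_name):
    """Map a protocol level mailbox name to a path under imap_root.

    Raises ValueError if the name would land outside the root, for
    example through '..' components or an absolute path.
    """
    root = posixpath.normpath(imap_root)
    candidate = posixpath.normpath(posixpath.join(root, mailbox_name))
    inside = root.rstrip("/") + "/"
    if candidate != root and not candidate.startswith(inside):
        raise ValueError("mailbox name escapes the IMAP root: %r" % mailbox_name)
    return candidate


if __name__ == "__main__":
    print(mailbox_path("/home/user/IMAP", "IMail/Private"))
    try:
        mailbox_path("/home/user/IMAP", "../.ssh/id_rsa")
    except ValueError as err:
        print("rejected:", err)
```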

(As far as I know, such symlinks can't be created through the IMAP protocol, so they must be set up by outside means such as the user sshing in to the IMAP server machine and making a symlink by hand. Of course, with fileservers and shared home directories, that can be any of our Linux servers.)

Using an IMAP path prefix in your client is a good thing if the server's IMAP root is, say, $HOME, since there are probably a great many things there that aren't actually mailboxes and mail folders and that will only confuse your client (and complicate its listing of actual interesting mailboxes) if it looks at them by asking for a listing of /, the root of the IMAP namespace. With an IMAP path prefix configured, your client will always look at a subdirectory of $HOME where you'll presumably only have mailboxes and so on.

The IMAP server is basically oblivious to the use of a client side IMAP path prefix and can't exert any control over it. The client never explicitly tells the server 'I'm using this path prefix'; all the server sees is that the client only ever does operations on things with some prefix.

The net result of this is that you can't transparently replace the use of a client side IMAP path prefix with the equivalent server side change in where the IMAP root is. If you start out with a client IMAP path prefix of IMail and a server IMAP root of $HOME, and then change to a server IMAP root of $HOME/IMail, the client will still try to access IMail/Private, the server will translate this to $HOME/IMail/IMail/Private, and things will probably be sad. To make this work, either you need to move things at the Unix filesystem level or people have to change their IMAP clients to take out the IMAP path prefix.

To make this perhaps a little bit clearer, here is a table of the various pieces and the resulting Unix path that gets formed once all the bits have been put together.

Server IMAP root   Client IMAP prefix   Client folder   Unix path
$HOME              <none>               Private         $HOME/Private
$HOME              <none>               IMail/Private   $HOME/IMail/Private
$HOME              IMail                Private         $HOME/IMail/Private
$HOME/IMail        IMail                Private         $HOME/IMail/IMail/Private
$HOME/IMail        <none>               Private         $HOME/IMail/Private

For a given server IMAP root, it doesn't matter whether the client forms the (sub)folder name explicitly or through use of a client IMAP path prefix. If you use multiple clients and only some of them are set up with your IMAP path prefix, clients configured with the prefix will see folder names with the prefix stripped off and other clients will see the full (IMAP protocol) folder path; this is the second and third lines of the table.
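The way the pieces combine in the table can be written down as a small Python function. The name unix_path() is invented for illustration, and the sketch assumes the server uses client names directly as filesystem names, as assumed earlier.

```python
def unix_path(server_root, client_prefix, client_folder):
    """Combine server IMAP root, client prefix, and client folder name."""
    # First the client forms the protocol level name...
    if client_prefix:
        protocol_name = client_prefix + "/" + client_folder
    else:
        protocol_name = client_folder
    # ...then the server puts that under its IMAP root.
    return server_root + "/" + protocol_name


# (server root, client prefix, client folder, resulting Unix path)
TABLE = [
    ("$HOME", "", "Private", "$HOME/Private"),
    ("$HOME", "", "IMail/Private", "$HOME/IMail/Private"),
    ("$HOME", "IMail", "Private", "$HOME/IMail/Private"),
    ("$HOME/IMail", "IMail", "Private", "$HOME/IMail/IMail/Private"),
    ("$HOME/IMail", "", "Private", "$HOME/IMail/Private"),
]

if __name__ == "__main__":
    for root, prefix, folder, expected in TABLE:
        print(unix_path(root, prefix, folder), expected)
```

The two steps are independent of each other, which is why moving the prefix from the client to the server without changing the client doubles it up.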

(If all of your clients respect IMAP subscriptions, the server may not be able to tell whether or not any particular one of them has an IMAP path prefix configured, or if it's just dutifully following the subscriptions (which are of course all inside the IMAP path prefix you have configured on some clients).)

(This is one of the entries I write partly to get all of this straight in my head.)

sysadmin/IMAPPrefixesClientAndServer written at 01:14:58


