Wandering Thoughts archives

2013-07-29

A consistent preference in interfaces to kernel and other low-level APIs

One of the things that API Design Matters made me think about is how programmers seem to have a distinct preference in interfaces to low-level APIs: almost all of the time, people very much seem to prefer interfaces that are as close as possible to the actual underlying API. The article's criticism of the .NET socket select() API is spot on, yet you'll find very similar versions of select() (with very similar issues) in any number of high level languages. Almost always they are about as close to a straightforward mapping to the underlying low-level select() API as you can get. Go goes the other way; it doesn't expose a select() at all, instead forcing you to do everything through a very different API.

As I mentioned, I think that this is a real preference on the part of many programmers instead of simply lack of imagination on the part of language implementors. Although I have no firm knowledge, I can see reasons why programmers would like it this way; the obvious one is that it's much easier to reason about what your code is really doing if you can see a straightforward mapping between what you're doing and what the low-level API is. Adding magic transformations for 'convenience' may not be considered worth the loss of transparency, especially if you're deliberately using a low-level API precisely because you want fine control over exactly what your code does.

When enough programmers start feeling this way there are obvious social forces steering implementors away from making even minor improvements to APIs. Sure, you could switch from lists to sets for arguments and return results instead of overwriting things in place, but people will give you hassle for it because your API is not a pure mapping any more. It must be tempting to decide that the probably minor usability improvements are not worth the hassle. If you're going to irritate people anyways, you might as well go for a radical change (as Go's net package does).
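
To make the 'minor improvement' version concrete, here is a minimal sketch in Python of what a set-based select() might look like. The name select_sets() is made up for illustration and is not part of any standard library; it simply wraps Python's real select.select() and assumes a Unix system, where pipes can be handed to select().

```python
import os
import select


def select_sets(readers, writers=frozenset(), errors=frozenset(), timeout=None):
    """Hypothetical set-based wrapper around select.select().

    Takes sets of file descriptors and returns three new sets. The
    caller's sets are never modified.
    """
    r, w, x = select.select(list(readers), list(writers), list(errors), timeout)
    return set(r), set(w), set(x)


def demo():
    read_fd, write_fd = os.pipe()
    try:
        watched = {read_fd}
        # Nothing has been written yet, so a zero-timeout poll finds nothing.
        before, _, _ = select_sets(watched, timeout=0)
        os.write(write_fd, b"x")
        # Now the read end has data, and the write end still has buffer room.
        after, writable, _ = select_sets(watched, {write_fd}, timeout=0)
        return before, after, writable, watched, read_fd, write_fd
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Even a wrapper this small is no longer a pure mapping of the underlying call, which is exactly the sort of thing people will give you hassle over.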

(This goes with the constraints shaping kernel APIs in the first place, of course.)

Sidebar: why many implementors probably don't go for radical changes

Put simply, API design is hard and doing it well takes a bunch of extra work. As a language implementor you already have enough work simply creating a standard library for your new language, even if you don't change the underlying API except to fit it into your language. Plus, if you don't change the underlying API, any flaws in it are not really your fault since you're just providing a direct mapping. If you do design a new API, any flaws in that API are your fault and you'll be blamed for any insufficiencies or problems (and you may get people clamouring for access to the underlying API anyways).

(Plus, you may not be enough of an expert in the problem domain to design a good API in the first place. It's not a great idea to demand that language implementors also be experts in everything that the standard library and language environment will connect to.)

On the whole I think a language implementor has to have a lot of confidence both in their understanding of a problem area and in their ability to do API design in order to implement anything other than a straightforward mapping to the underlying API. The implementors of Go stand out precisely because they do have plenty of experience designing networking APIs in addition to implementing languages.

KernelAPIPreference written at 00:49:32

2013-07-28

The constraints shaping kernel APIs

I was recently reading API Design Matters, which uses the .NET socket select() function as an example of a sub-par API. As part of thinking about how it wound up with that API, one of the things I wound up mulling over is the constraints that generally shape kernel APIs (as this .NET API is ultimately descended from the Unix select() API).

The first big constraint on system calls is that the actual call itself cannot allocate memory in your process under essentially any circumstances. The direct result is that all data needs to be returned in preallocated buffers that are passed (directly or indirectly) to the kernel as part of the call. The indirect result is that kernel APIs are biased very strongly towards needing buffers that have predictable, constant sizes. A kernel API that needs a highly variable-sized output buffer is very awkward to work with; generally either you over-allocate for most cases in order to have room for the worst case or you iterate the system call at least twice in order to determine and then provide a right-sized buffer.
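
As an illustration of the 'call it twice' pattern, here is a sketch that uses Python's ctypes to call the C library's getgroups() directly. This is my choice of example, not something from the article; it assumes a Linux-like system where ctypes.CDLL(None) gives access to libc and where gid_t is a 32-bit unsigned integer. The function name supplementary_groups() is made up.

```python
import ctypes
import os

# Assumption: a Linux-like libc where gid_t is a 32-bit unsigned integer.
libc = ctypes.CDLL(None, use_errno=True)
libc.getgroups.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_uint32)]
libc.getgroups.restype = ctypes.c_int


def supplementary_groups():
    """Fetch the process's group IDs with the ask-then-fetch pattern."""
    # First call: a size of 0 means "just tell me how many entries there are".
    needed = libc.getgroups(0, None)
    if needed < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # We, not the kernel, allocate the buffer that the results go into.
    buf = (ctypes.c_uint32 * needed)()

    # Second call: hand over the right-sized buffer to be filled in.
    got = libc.getgroups(needed, buf)
    if got < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return list(buf[:got])
```

Python's own os.getgroups() hides all of this and just hands back a list, which is the kind of thing a library can do and a system call can't.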

The other big constraint is that historically, kernel implementors prefer to do as little writing to user space in general as they can get away with. From their perspective the best system call API is one that simply returns results in a register, then one that puts a result or two in a single memory location or two, and the worst is one that requires them to splat various things all over your user memory space. It's not hard to see why this is; unlike in a library, writing things to user space requires distinct and separate work for everything the code writes (even if this is usually wrapped up in function calls and macros). This has driven kernel APIs to return the minimal information required and leave it up to user space to work out everything from there (either in a library or in your code).

(Another constraint worth mentioning is that the general system call API often makes it much easier to do calls with a small number of arguments than calls with lots of arguments. Small numbers of arguments can often be passed directly in registers, while lots of arguments can require quite involved conventions and extra work for the kernel to dig out of user space.)

We can see these constraints at work in the select() API. Specifically, select() overwrites its inputs because that's clearly the simplest place to put the output data, never mind that this is massively inconvenient for the common case of repeated select() calls on the same set of file descriptors. Anything else would require extra buffers and extra arguments to the system call. If the system call returned convenient extra information (such as how many of each sort of file descriptor were active), that would require extra writes to user space (and probably extra arguments).
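
The inconvenience is easy to show with a small emulation. Python's real select.select() does not overwrite its arguments, so the sketch below fakes the C behaviour: select_in_place() is a made-up function that mutates the set it is given, the way the system call mutates its fd_set arguments. It assumes a Unix system.

```python
import os
import select


def select_in_place(read_fds, timeout=None):
    """Emulate the C convention: overwrite `read_fds` with the ready
    descriptors and return how many there are."""
    ready, _, _ = select.select(list(read_fds), [], [], timeout)
    read_fds.intersection_update(ready)
    return len(ready)


def demo():
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        interest = {r1, r2}
        os.write(w1, b"x")  # only the first pipe becomes readable

        # The careful caller copies the interest set before every call.
        working = set(interest)
        count = select_in_place(working, timeout=0)

        # The careless caller passes the original, and r2 silently drops
        # out of the set for every later call.
        select_in_place(interest, timeout=0)
        return count, working == {r1}, interest == {r1}
    finally:
        for fd in (r1, w1, r2, w2):
            os.close(fd)
```

Every loop around the real select() has to do the equivalent of the set(interest) copy on each iteration.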

Sidebar: argument counts and structures

One way to reduce the argument count to system calls is to pass some form of structure that aggregates things together instead of separate arguments. There are at least three strikes against this:

  • on an abstract level you haven't actually reduced the argument count, you've just hidden some of the arguments behind indirection.

  • kernel implementors traditionally prefer to do as little chasing of user space pointers as they can get away with. Every user space pointer that has to be dereferenced is more hassle (and one more thing to carefully check).

  • using structures (especially C structs) has historically been a land mine over the long term. The simpler the arguments are the easier it is to deal with things like 32 bit to 64 bit transitions, and you totally avoid compiler alignment and structure padding issues and so on.
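
The padding and word-size issues in the last point can be seen from Python with ctypes, which lays structures out the way the platform's C compiler would. The Request structure here is purely hypothetical, just a one-byte flag followed by a 64-bit value, and is not taken from any real system call.

```python
import ctypes


class Request(ctypes.Structure):
    # Hypothetical argument block: a one-byte flag, then a 64-bit value.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint64)]


def layout():
    # The fields themselves only need 1 + 8 = 9 bytes.
    payload = ctypes.sizeof(ctypes.c_uint8) + ctypes.sizeof(ctypes.c_uint64)
    return {
        "value_offset": Request.value.offset,     # where 'value' really starts
        "total_size": ctypes.sizeof(Request),     # size including padding
        "padding": ctypes.sizeof(Request) - payload,
        "long_size": ctypes.sizeof(ctypes.c_long),  # differs between ABIs
    }
```

On a typical 64-bit Linux system I would expect 'value' to land at offset 8 in a 16-byte structure, while a 32-bit x86 build can put it at offset 4 in a 12-byte structure. Similarly a C long is 4 bytes on some ABIs and 8 on others. A kernel that supports both sorts of programs has to cope with both layouts.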

KernelAPIConstraints written at 00:24:22

2013-07-15

Git's petty little irritation for me

Perhaps the most common thing for me to do with people's source code is to add my own purely local changes. When the source code is in a source repo, the simplest way of doing this is to just pull a copy of the source repo and then make my own changes on top. Then sooner or later I want to update my local repo by pulling in the latest central changes.

With CVS, SVN, and even Mercurial this more or less just works, however theoretically unclean and evil it is; all three detect that I have uncommitted local changes and attempt to re-apply them on top of the updates. Usually this succeeds and if it doesn't, the convention is to stick glaring markers in the files and leave it to me to fix them up. Git insists on doing the right thing, in that (at least by default) it utterly refuses to try to merge my uncommitted changes with the remote updates it's just fetched.

It turns out that this unusual strictness on git's part is irritatingly inconvenient for me. I definitely don't want to actually commit my changes because that would contaminate what is otherwise a pure upstream repo history with a tangle of back and forth merges (even if I did it on a branch). There is 'git stash', but it has two aspects that I don't like; I have to remember (or be reminded) to do it explicitly and it makes changes to the repo itself (not just the checked-out files). I really like my repos to be exact, unchanged duplicates of the upstream.

(Yes, the changes 'git stash' makes should be harmless and should get fixed up when I do 'git stash clear' or the like. That's two 'shoulds'.)

This is a completely petty irritation (especially given 'git stash') and I fully acknowledge that. But I've never claimed to be entirely rational about this stuff and yes, it irritates me.

Sidebar: When this doesn't work in Mercurial

There is one circumstance when Mercurial will not do this sort of update at all, namely when the main branch in the repo shifts. The Mercurial repo for official Firefox releases hops branches this way when a new major release comes out and as a result I get to save a patch of my changes, overwrite them all by forcing a clean checkout, and then reapply them. Fortunately this is rare.

(Also these days I've wound up switching to building my local Firefox from the main development repository, which doesn't branch this way.)

GitAndLocalChanges written at 00:43:30

2013-07-10

Knowing when to go your own way with open source programs

Dmenu has become one of the core parts of my custom environment. The other day I finally got around to adding a feature to it that I've been wanting for a while, one that opens the door for making my environment subtly nicer. Given that dmenu is an open source program, the responsible thing to do would be to go the extra distance to update the manpage and so on, then submit the change upstream.

I'm not going to do that, but not for the reason that you might expect.

A while back I made some other changes to dmenu and sent them to the upstream mailing list, where they created not so much as a ripple. I'd scanned the mailing list for a bit by then and so this lack of reaction didn't particularly surprise or annoy me; instead it cemented my quiet opinion that the dmenu developers and I had different interests and visions of dmenu. My changes got no reaction because the developers found them neither offensive nor interesting.

This sort of difference in views of a program is completely routine. When it happens there is no real point in trying to feed your changes upstream, because they aren't wanted (at the best they're simply uninteresting; at the worst they actively go against the developers' vision for the program). Trying to push your changes upstream anyways is a waste of time for all concerned and risks irritating the developers and ruining any good relations you might have with them. Instead the thing to do is to consciously go your own way with the code. Accept that your changes will never be merged upstream and do whatever you want to (including things that you want but that are violently against how the upstream developers like things).

This is why I'm not going to be trying to send my latest dmenu changes upstream; I don't think they're any more in line with the developers' vision for dmenu than my original changes were.

If you modify open source programs, developing a sense of when this is and isn't the case is going to be quite useful (if only to keep you out of flamewars on project mailing lists). I don't have any particularly strong suggestions except that you really need to know the project's culture before you start sending them email. Lurking on their mailing lists or what have you is highly recommended.

(Generally I think that the more hackish and brutal my modifications to the code are the more likely I am to be doing something that's against the developers' vision for the program.)

PS: it's my stereotype that the larger a project is, the less interested it's going to be in my modification. Firefox or the Linux kernel? I can forget it. A small one-person program? Highly likely to at least be willing to talk to me, although they may well still say no.

GoingMyOwnWay written at 01:43:39



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.