Wandering Thoughts archives


Why exposing only blocking APIs is ultimately a bad idea

I recently read Marek's Socket API thoughts, which mulls over a number of issues and ends with the remark:

But nonetheless, I very much like the idea of only blocking API's being exposed to the user.

This is definitely an attractive idea. All of the various attempts at select()-style APIs have generally not gone well, high-level callbacks give you 'callback hell', and it would be conceptually nice to combine cheap concurrency with purely blocking APIs to have our cake and eat it too. It's no wonder this idea comes up repeatedly, and I feel the tug of it myself.

Unfortunately, I've wound up feeling that it's fundamentally a mistake. While superficially attractive, attempting to do this in the real world is going to wind up with an increasingly ugly mess in practice. For the moment let's set aside the issue that cheap concurrency is fundamentally an illusion and assume that we can make the illusion work well enough here. This still leaves us with the select() problem: sooner or later the result of one IO will make you want to stop doing another waiting IO. Or more generally, sooner or later you'll want to stop doing some bit of blocking IO as the result of other events and processing inside your program.

When all IO is blocking, separate IO must be handled by separate threads and thus you need to support external (cross-thread) cancellation of in-flight blocked IO out from underneath a thread. The moment you have this sort of unsynchronized and forced cross-thread interaction, you have a whole collection of thorny concurrency issues that we have historically not been very good at dealing with. It's basically guaranteed that people will write IO handling code with subtle race conditions and unhandled (or mishandled) error conditions, because (as usual) they didn't realize that something was possible or that their code could be trying to do thing X right as thing Y was happening.
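(To make the cross-thread problem concrete, here is a minimal Python sketch. It assumes a Unix-like system where calling shutdown() on a socket wakes up a recv() that is blocked in another thread; the function names are made up for illustration. One thread blocks in recv() and a second thread cancels it from outside, with nothing synchronizing the two.)

```python
import socket
import threading

def blocked_reader(sock, results):
    """Block in recv() the way simple blocking-API code would."""
    try:
        data = sock.recv(1024)
        # An empty result is also what an ordinary end-of-stream looks like,
        # so this code cannot tell "peer closed" from "someone cancelled me".
        results.append(("data", data))
    except OSError as exc:
        # Depending on platform and timing, cancellation may show up here.
        results.append(("error", exc.errno))

def cancel_blocked_read():
    ours, peer = socket.socketpair()
    results = []
    reader = threading.Thread(
        target=blocked_reader, args=(ours, results), daemon=True
    )
    reader.start()
    # Cancel from outside. Nothing orders this against the reader: it may
    # not have reached recv() yet, or it may already be past it.
    ours.shutdown(socket.SHUT_RDWR)
    reader.join(timeout=5)
    ours.close()
    peer.close()
    return results

print(cancel_blocked_read())
```

Even in this tiny case the reader has to guess why its recv() came back, and the canceller has no idea what state the reader was in when it pulled the rug out.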

(I'm sure that there are API design mistakes that can and will be made here, too, just as there have been a series of API design mistakes around select() and its successors. Even APIs are hard to get completely right in the face of concurrency issues.)

There is no fix for this that I can see for purely blocking APIs. Either you allow external cancellation of blocked IO, which creates the cross-thread problems, or you disallow it and significantly limit your IO model, creating real complications as well as limiting what kind of systems your APIs can support.

(For the people who are about to say 'but Go makes it work', I'm afraid that Go doesn't. It chooses to limit what sort of systems you can build, and I'm not just talking about the memory issues.)

PS: I think it's possible to sort of square the circle here, but the solution must be deeply embedded into the language and its runtime. The basic idea is to create a CSP-like environment where waiting for IO to complete is a channel receive or send operation, and may be mixed with other channel operations in a select. Once you have this, you have a relatively clean way to cancel a blocked IO; the thread performing the IO simply uses a multi-select, where one channel is the IO operation and another is the 'abort the operation' channel. This doesn't guarantee that everyone will get it right, but it does at least reduce your problem down to the existing problem of properly handling channel operation ordering and so on. But this is not really an 'only blocking API' as we normally think of it and, as mentioned, it requires very deep support in the language and runtime (since under the hood this has to actually be asynchronous IO and possibly involve multiple threads).
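(A rough sketch of the shape of this, using Python's asyncio rather than a real CSP runtime. Here an asyncio.Queue stands in for the 'IO channel' and an asyncio.Event for the 'abort' channel; recv_or_abort and demo are hypothetical names, and real IO would need the runtime support described above.)

```python
import asyncio

async def recv_or_abort(io_chan, abort):
    """Wait on the 'IO channel' and the 'abort channel' at the same time."""
    io_task = asyncio.ensure_future(io_chan.get())
    abort_task = asyncio.ensure_future(abort.wait())
    done, pending = await asyncio.wait(
        {io_task, abort_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    # If both are ready at once, this sketch prefers the completed IO. That
    # ordering decision is the part you still have to get right yourself.
    if io_task in done:
        return ("data", io_task.result())
    return ("aborted", None)

async def demo():
    # Case 1: the IO completes before anyone asks us to stop.
    io_chan, abort = asyncio.Queue(), asyncio.Event()
    io_chan.put_nowait(b"hello")
    first = await recv_or_abort(io_chan, abort)

    # Case 2: nothing ever arrives and something else tells us to give up.
    io_chan, abort = asyncio.Queue(), asyncio.Event()
    asyncio.get_running_loop().call_later(0.05, abort.set)
    second = await recv_or_abort(io_chan, abort)
    return first, second

print(asyncio.run(demo()))
```

The waiting code decides for itself what 'abort' means and when it is checked, instead of having the IO yanked out from underneath it by another thread.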

This is also going to sometimes be somewhat of a lie, because on many systems there is a certain amount of IO that is genuinely synchronous and can't be interrupted at all, despite you putting it in a multi-channel select statement. Many Unixes don't really support asynchronous reads and writes from files on disk, for example.

programming/PureBlockingAPIsWhyBad written at 23:53:52

Some notes on ZFS per-user quotas and their interactions with NFS

In addition to quotas on filesystems themselves (refquota) and quotas on entire trees (plain quota), ZFS also supports per-filesystem quotas on how much space users (or groups) can use. We haven't previously used these for various reasons, but today we had a situation with an inaccessible runaway user process eating up all the free space in one pool on our fileservers and we decided to (try to) stop it by sticking a quota on the user. The result was reasonably educational and led to some additional educational experimentation, so now it's time for notes.

User quotas for a user on a filesystem are created by setting the userquota@<user> property of the filesystem to some appropriate value. Unlike overall filesystem and tree quotas, you can set a user quota that is below the user's current space usage. To see the user's current space usage, you look at userused@<user> (which will have its disk space number rounded unless you use 'zfs get -p userused@<user> ...'). To clear the user's quota limit after you don't need it any more, set it to none instead of a size.

(The current Illumos zfs manpage has an annoying mistake, where its section on the userquota@<user> property talks about finding out space by looking at the 'userspace@<user>' property, which is the wrong property name. I suppose I should file a bug report.)

Since user quotas are per-filesystem only (as mentioned), you need to know which filesystem or filesystems your errant user is using space on in your pool in order to block a runaway space consumer. In our case we already have some tools for this and had localized the space growth to a single filesystem; otherwise, you may want to write a script in advance so you can freeze someone's space usage at its current level on a collection of filesystems.

(The mechanics are pretty simple; you set the userquota@<user> value to the value of the userused@<user> property, if it exists. I'd use the precise value unless you're sure no user will ever use enough space on a filesystem to make the rounding errors significant.)
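(A minimal sketch of such a freeze script in Python, under some assumptions: it shells out to the zfs command with enough privileges to set properties, it reads the exact usage with 'zfs get -Hp -o value', and it skips any filesystem where the reported usage is zero or not a plain number. The function names are invented for illustration and this has not been run against a real fileserver.)

```python
import subprocess

def run_zfs(args):
    """Run a zfs subcommand and return its stdout."""
    proc = subprocess.run(
        ["zfs"] + list(args), check=True, capture_output=True, text=True
    )
    return proc.stdout

def parse_used(output):
    """Return the exact byte count, or None if zfs gave no numeric value."""
    value = output.strip()
    return int(value) if value.isdigit() else None

def freeze_user(user, filesystems, run=run_zfs):
    """Cap user at their current usage on each filesystem.

    Returns a dict of filesystem -> quota (in bytes) that was set.
    """
    frozen = {}
    for fs in filesystems:
        # -H: no header line, -p: exact unrounded numbers,
        # -o value: print only the value column.
        out = run(["get", "-Hp", "-o", "value", f"userused@{user}", fs])
        used = parse_used(out)
        if not used:
            # Assumption: nothing to freeze if there is no recorded usage.
            continue
        run(["set", f"userquota@{user}={used}", fs])
        frozen[fs] = used
    return frozen
```

The run parameter is only there so the logic can be exercised with a fake command runner before pointing it at real filesystems. Undoing the freeze is a matter of setting userquota@<user> back to none.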

Then we have the issue of how firmly and how fast quotas are enforced. The zfs manpage warns you explicitly:

Enforcement of user quotas may be delayed by several seconds. This delay means that a user might exceed their quota before the system notices that they are over quota and begins to refuse additional writes with the EDQUOT error message.

This is especially the case over NFS (at least NFS v3), where NFS clients may not start flushing writes to the NFS server for some time. In my testing, I saw the NFS client's kernel happily accept a couple of GB of writes before it started forcing them out to the fileserver.

The behavior of an OmniOS NFS server here is somewhat variable. On the one hand, we saw space usage for our quota'd user keep increasing over the quota for a certain amount of time after we applied the quota (unfortunately I was too busy to time it or carefully track it). On the other hand, in testing, if I started to write to an existing but empty file (on the NFS client) once I was over quota, the NFS server refused all writes and didn't put any data in the file. My conclusion is that at least for NFS servers, the user may be able to go over your quota limit by a few hundred megabytes under the right circumstances. However, once ZFS knows that you're over the quota limit a lot of things shut down immediately; you can't make new files, for example (and NFS clients helpfully get an immediate error about this).
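(If you want to put numbers on this, one approach is a small probe run on an NFS client as the quota'd user. The following Python sketch is only an illustration, with made-up function names. It appends chunks to a file and calls fsync() after each one, so that errors from the server surface promptly instead of sitting in the client's write buffers, and it reports how much was accepted before the first error.)

```python
import errno
import os

def write_until_error(path, chunk_size=1 << 20, max_bytes=1 << 30):
    """Append zero-filled chunks to path, fsync()ing after each one.

    Returns (bytes_accepted, errno_or_None); bytes_accepted only counts
    chunks whose write() and fsync() both succeeded.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    except OSError as exc:
        # For example, creating a new file while already over quota.
        return 0, exc.errno
    chunk = b"\0" * chunk_size
    accepted = 0
    try:
        while accepted < max_bytes:
            try:
                written = os.write(fd, chunk)
                os.fsync(fd)  # push the data to the server now
            except OSError as exc:
                return accepted, exc.errno
            accepted += written
        return accepted, None
    finally:
        try:
            os.close(fd)
        except OSError:
            pass  # a late error on close is ignored in this sketch

def hit_quota(err):
    """True if the errno from write_until_error() is the quota error."""
    return err == errno.EDQUOT
```

Because of the fsync() calls this measures how far the server lets you go, not how much the client is willing to buffer; dropping the fsync() would show the client-side buffering instead. The max_bytes cap is there so a probe on an un-quota'd filesystem stops on its own.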

(I took a quick look at the kernel code but I couldn't spot where ZFS updates the space usage information in order to see what sort of lag there is in the process.)

I haven't tested what happens to fileserver performance if an NFS client keeps trying to write data after it has hit the quota limit and has started getting EDQUOT errors. You'd think that the fileserver should be unaffected, but we've seen issues when pools hit overall quota size limits.

(It's not clear if this came up today when the user hit the quota limit and whatever process(es) they were running started to get those EDQUOT errors.)

solaris/ZFSUserQuotaNotes written at 01:01:22
