Operating system threads are always going to be (more) expensive

September 7, 2024

Recently I read "Asynchronous IO: the next billion-dollar mistake?". Among other things, it asks:

Now imagine a parallel universe where instead of focusing on making asynchronous IO work, we focused on improving the performance of OS threads [...]

I don't think this would have worked as well as you'd like, at least not with any conventional operating system. One of the core problems with making operating system threads really fast is the 'operating system' part.

A characteristic of all mainstream operating systems is that the operating system kernel operates in a different hardware security domain than regular user (program) code. This means that any time the operating system becomes involved, the CPU must make at least two transitions between these security domains (into kernel mode and then back out). Doing these transitions is always more costly than not doing them, and on top of that the CPU's ISA often requires the operating system to go through non-trivial work in order to be safe from user-level attacks.

(The whole family of speculative execution attacks has only made this worse.)

A great deal of the low level work of modern asynchronous IO is about not crossing between these security domains, or doing so as little as possible. This is summarized as 'reducing system calls because they're expensive', which is true as far as it goes, but even the cheapest system call possible still has to cross between the domains (if it is an actual system call; some operating systems have 'system calls' that manage to execute entirely in user space).

The less that doing things with threads crosses the CPU's security boundary into (and out of) the kernel, the faster the threads go, but the less we can really describe them as 'OS threads' and the harder it is to get things like forced thread preemption. This applies not just to the 'OS threads' themselves but also to their activities. If you want 'OS threads' that perform 'synchronous IO through simple system calls', those IO operations also transition into and out of the kernel. If you work to get around this purely through software, I suspect that what you wind up with is something that looks a lot like 'green' (user-space) threads with asynchronous IO once you peer behind the abstractions that programs see.

(You can do this today, as Go's runtime demonstrates. And you still benefit significantly from the operating system's high-efficiency asynchronous IO, even if you're opting to use a simpler programming model.)

(See also thinking about event loops versus threads.)

Written on 07 September 2024.

Last modified: Sat Sep 7 00:01:53 2024