2011-10-06
Thinking about event loops versus threads
The inimitable Ted Dziuba recently compared threads against event loops for throughput and ease of use and came down firmly on the side of threads; to summarize his writing, he argues (with math) that threads have the same throughput and are easier to use. While I sort of agree with this, I think that his math makes some assumptions or, alternatively, ignores some constant factors that are potentially important in real life.
(His math implicitly assumes that servicing activity either takes the same amount of CPU in threads as it does with event loops, or that the difference in CPU required is small enough to ignore.)
If you look at them from the right angle, both threads and event loops are switching contexts back and forth between various IO requests; the difference is that threads perform this context switching implicitly for you while in an event loop you perform it explicitly. This gives event loops three potential performance advantages over threads for CPU usage: lower context switch overheads, lower costs of coordination, and aggregation of IO wait requests.
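To make the implicit versus explicit distinction concrete, here is a minimal sketch in Python (the socket handling is simplified and the names are mine, not anyone's real server): the threaded version lets the kernel switch away inside the blocking recv(), while the event loop version does the switching itself with select() and keeps each connection's context as plain data instead of a thread's stack.

    import select
    import threading

    # Threaded style: the context switch is implicit; the kernel
    # suspends this thread inside the blocking recv() and runs some
    # other thread until data arrives.
    def handle_conn(sock):
        while True:
            data = sock.recv(4096)
            if not data:
                break
            sock.sendall(data)
    # (each connection started with, eg,
    #  threading.Thread(target=handle_conn, args=(conn,)).start())

    # Event loop style: we pick the next ready connection explicitly;
    # each connection's 'context' is just an entry in an ordinary list.
    def event_loop(socks):
        conns = list(socks)
        while conns:
            readable, _, _ = select.select(conns, [], [])
            for s in readable:
                data = s.recv(4096)
                if data:
                    s.sendall(data)
                else:
                    conns.remove(s)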
Event loops change context entirely in userspace and generally through very simple code (part of the simplicity is that event loops normally give up on scheduling), while threads have more involved and expensive context switches. However, looking only at the 'context switch' part overstates the event loop advantage; to be fair we need to include the work to manage all of the various callback contexts that event loops breed. My feeling is that most event loop implementations will still come out ahead, especially once multiple threads begin contending for CPU and force the scheduler to become involved.
(In many cases any work the scheduler does to 'fairly' or efficiently decide the next thread to run is almost entirely wasted effort; it just doesn't matter, and where it matters a bit the important thing is to avoid starvation.)
Coordination costs are what threads pay in locking and mutexes to talk to each other and exchange information; event loops mostly get this coordination for free. However, this is task dependent, since how much coordination you need depends on what you're doing. Some sorts of systems require very high coordination (eg ones where information frequently passes from connection to connection), while others need none (the classic 'serve static resources' case). High coordination systems pay a high price for being threaded, since hot locks and frequent locking are both very expensive.
(The counter argument is that a multi-core event loop system has the same coordination needs, although it may be able to get away with less locking if it is sufficiently clever and can partition the work between cores sufficiently carefully. You only get coordination for free if you only have one event loop process or at the least you don't need to coordinate across event loop process boundaries.)
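As a small illustration of the cost difference, here is a sketch in Python, assuming a hypothetical shared table that connections update for each other; only the threaded version has to pay for a lock.

    import threading

    shared_table = {}
    table_lock = threading.Lock()

    # Threaded: any cross-connection update must take the lock, and if
    # updates are frequent this lock becomes hot and expensive.
    def threaded_update(key, value):
        with table_lock:
            shared_table[key] = value

    # Single event loop: only one callback ever runs at a time, so the
    # identical update needs no locking at all.
    def event_loop_update(key, value):
        shared_table[key] = value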
Aggregation of IO wait requests is the question of whether it is more (CPU) efficient in your operating system to have one thread wait for activity on five hundred IO sources or to have five hundred threads wait for one IO source each. A certain amount depends on the details of the system calls available; for example, a potential factor in threads' favour is that many of their IO waits are implicit in other system calls that they are already doing. If an event loop has to do 'wait for read data to be available; read available data' (as two syscalls), threads have a potential advantage (since they're just doing a single blocking read). Things like asynchronous IO with completion notifications level the playing field again back to the basic question.
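In sketch form (Python again, deliberately simplified), the syscall difference looks like this; the event loop pays two system calls per chunk of data where a thread pays one.

    import select

    # Event loop: 'wait for read data to be available; read available
    # data' is two system calls.
    def loop_service(socks):
        readable, _, _ = select.select(socks, [], [])
        return [(s, s.recv(4096)) for s in readable]

    # Thread: the wait is implicit in the single blocking read, one
    # system call in total.
    def thread_service(sock):
        return sock.recv(4096)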
The classical thread performance breakdown under high concurrency is in fact exactly a failure of these assumptions and factors: you have so many (active) threads that the system is spending almost all of its time thrashing back and forth between them and almost none of it actually having the thread code do useful work. A corresponding pragmatic advantage of event loops is that you can build your management data structures for your expected load, using sophisticated high capacity ones where necessary.
(Operating systems have historically not designed their schedulers and other bits for huge numbers of active threads and have thus opted for relatively simple and maintainable data structures that work fine for typical, sane loads.)
One final note: the effect of all of these factors drops as the amount of real computation required to service a particular request rises, and vice versa. This neatly explains where event loops have been hugely popular, namely environments where the per-request computation is effectively nil and there is a large number of requests; this situation maximizes any thread overhead that exists.
(The absolute worst case has lots of requests, very low per-request computation, and high coordination requirements, thereby maximizing every possible thread disadvantage. I believe the classic demo case is simple chat rooms with large numbers of clients in each room.)
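As a sketch of why the chat room case is so bad (the structure here is mine, not any real implementation): every message from any client fans out to all of the other clients in the room, so a threaded server has to take the room's lock for every single message.

    import threading

    class Room:
        def __init__(self):
            self.lock = threading.Lock()
            self.clients = []

        # Threaded version: every message locks the room so that the
        # membership can't change mid-broadcast. With many clients and
        # many messages this lock is about as hot as a lock can get; a
        # single event loop could simply iterate over the list.
        def broadcast(self, msg, sender):
            with self.lock:
                for client in self.clients:
                    if client is not sender:
                        client.sendall(msg)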