Threads, asynchronous IO, and cancellation

September 13, 2024

Recently I read Asynchronous IO: the next billion-dollar mistake? (via), and had a reaction to one bit of it. Then yesterday on the Fediverse I said something about IO in Go:

I really wish you could (easily) cancel io Reads (and Writes) in Go. I don't think there's any particularly straightforward way to do it today, since the io package was designed way before contexts were a thing.

(The underlying runtime infrastructure can often actually do this because it decouples 'check for IO being possible' from 'perform the IO', but stuff related to this is not actually exposed.)

Today this sparked a belated realization in my mind, which is that a model of threads performing blocking IO in each thread is simply a harder environment to have some sort of cancellation in than an asynchronous or 'event loop' environment. The core problem is that in their natural state, threads are opaque and therefore difficult to interrupt or stop safely (which is part of why Go's goroutines can't be terminated from the outside). This is the natural inverse of how threads handle state for you.

(This is made worse if the thread is blocked in the operating system itself, for example in a 'read()' system call, because now you have to use operating system facilities to either interrupt the system call so the thread can return to user level to notice your user level cancellation, or terminate the thread outright.)

Asynchronous IO generally lets you do better in a relatively clean way, in one of two forms depending on the operating system facilities you're using. Either there is a distinction between the OS telling you that IO is possible and your program doing the IO, which gives you a chance to not actually do the IO, or, in an 'IO submission' environment, you can generally tell the OS to cancel a submitted but not yet completed IO request. The latter is racy, but in many situations the IO is unlikely to become possible right as you want to cancel it. Both of these let you implement a relatively clean model of cancelling a conceptual IO operation, especially if you're doing the cancellation as the result of another IO operation.

Or to put it another way, event loops may make you manage state explicitly, but that also means that that state is visible and can be manipulated in relatively natural ways. The implicit state held in threads is easy to write code with but hard to reason about and work with from the outside.

Sidebar: My particular Go case

I have a Go program that at its core involves two goroutines, one reading from standard input and writing to a network connection, one reading from the network connection and writing to standard output. Under some circumstances, the goroutine reading from the network will want to close down the network connection and return to a top level, where another two-way connection will be made. In the process, it needs to stop the 'read from stdin, write to the network' goroutine while it is parked in 'read from stdin', without closing stdin (because that will be reused for the next connection).

To deal with this cleanly, I think I would have to split the 'read from standard input, write to the network' goroutine into two that communicated through a channel. Then the 'write to the network' side could be replaced separately from the 'read from stdin' side, allowing me to cleanly substitute a new network connection.

(I could also use global variables to achieve the same substitution, but let's not.)


Comments on this page:

By reg at 2024-09-14 14:28:49:
threads are […] difficult to interrupt or stop safely

Yes, but in a pure I/O thread it's not terrible. In POSIX, the trick would be to have the thread disable cancellation generally, and enable it (in deferred mode) only around the I/O call. It might require a cleanup handler, and the API for that is not great; but with thread-local variables for state, and an enumerated set of cancellation points (preferably just one), it's manageable.

I understand that Go does not use the system C libraries. That might complicate things, or might make them easier. I believe POSIX cancellation is generally implemented via a signal sent to a thread rather than a process, whose handler does the equivalent of a pthread_exit(). I don't know what Windows does.

By Andrew at 2024-09-14 15:25:01:

For net.Conn there's a (somewhat non-obvious) way to cancel from outside: use SetDeadline (or SetReadDeadline or SetWriteDeadline if you only want to affect one kind of operation) to give the conn a deadline in the past. That tells the netpoller (which is the big event-loop that goroutines defer to when they block on network I/O) to unblock the I/O and deliver a timeout error.

But there's no similar method on os.File. If I had to guess why, I'd say it's probably because only some files are pollable (e.g. Linux won't deliver epoll events for disk files) and Go doesn't want to expose the inconsistency to the user. Even though it would be a really nice thing to have when your "file" is actually a FIFO or a tty or something.

By Joker_vD at 2024-09-18 14:49:56:

I believe that you can also use a channel to send a new dst to your tonet() goroutine. After the blocking src.Read() in the fromto() returns, you do

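DstUpdateLoop:
    for {
        select {
        case dst = <-dstChan:
            // Got a newer dst; loop again in case more are queued.
        default:
            // Nothing pending; carry on with the current dst.
            break DstUpdateLoop
        }
    }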

to update the dst which the upper-level coordinating function would somehow arrange to send you. At least, that is something that sometimes worked in my scenarios even though I would be the first to admit that it's quite a bizarre idiom.

By cks at 2024-09-18 20:25:59:

Belatedly: it's possible to arrange safe cancellation of your threads under specific circumstances and setups. But there is no general way to do this, so code has to either opt in to it or be specifically built for it. I feel that this is broadly different from asynchronous IO, where you can have a clean general model of canceling outstanding IO.

(At the same time, the code still has to be designed for IO to be cancelled, probably as distinct from the IO failing. If you ignore the possibility of cancellation or IO failure, you can wind up writing code where things go wrong (hangs, incorrect results, etc) when this happens.)

By mst at 2024-10-09 11:27:33:

dup stdin, goroutine reads from the dup'ed fd, close that to tell it to stop?

Written on 13 September 2024.
Last modified: Fri Sep 13 22:23:59 2024