Where (and how) you limit your concurrency in Go can matter
At the start of September, I wrote about how concurrency is still not easy even in Go, using a section of real code with a deadlock as the example. In that entry, I proposed three fixes to remove the deadlock. Since Hillel Wayne's Finding Goroutine Bugs with TLA+ has now formally demonstrated that all three of my proposed fixes work, I can talk about the practical differences between them.
For convenience, here's the original code from the first entry:
    func FindAll() []P {
        pss, err := ps.Processes()
        [...]
        found := make(chan P)
        limitCh := make(chan struct{}, concurrencyProcesses)

        for _, pr := range pss {
            // deadlocks here:
            limitCh <- struct{}{}
            pr := pr
            go func() {
                defer func() { <-limitCh }()
                [... get a P with some error checking ...]
                // and deadlocks here:
                found <- P
            }()
        }
        [...]

        var results []P
        for p := range found {
            results = append(results, p)
        }
        return results
    }
The buffered limitCh channel is used to implement a limited supply of tokens, to hold down the number of goroutines that are getting P's at once. The bug in this code is that the goroutines only receive from limitCh to release their token after sending their result to the unbuffered found channel, while the main code only starts receiving from found after running through the entire for loop, and the main code takes the token in the loop and blocks if no tokens are available. (For more, see the original entry.)
There are at least three fixes possible: the goroutines can send to limitCh instead of the main function doing it, the goroutines can receive from limitCh before sending to found, or the entire for loop can be in an additional goroutine so that it doesn't block the main function from starting to receive from found. All three of these fixes work, as proven by Hillel Wayne, but they have different effects on the number of goroutines that this code will run if pss is large and on what the state of those goroutines is.
If our goal is to minimize resource usage, the worst fix is for goroutines to receive from limitCh before sending to found. This fix will cause almost all goroutines to stall in the send to found, because all but a few of them must be started and run almost to completion before the main code can finish the for loop and start receiving from found to unblock all of those sends and let the goroutines exit. These waiting-to-send goroutines keep their fully expanded goroutine stacks in use, and possibly other resources that won't be released until they exit and things become unused so that the garbage collector can collect them (or until additional defer statements release them).
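As a rough sketch of what this version of the loop looks like (using a hypothetical getProcess() helper to stand in for the elided "get a P with some error checking" code, which isn't its real name):

    for _, pr := range pss {
        limitCh <- struct{}{} // the main loop still takes the token
        pr := pr
        go func() {
            p := getProcess(pr) // hypothetical stand-in for the elided work
            <-limitCh           // release the token before sending
            found <- p          // most goroutines wind up parked here
        }()
    }

Releasing the token before the send is what removes the deadlock, but it also means the token no longer bounds how many goroutines are alive and waiting to send.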
The middling fix is for goroutines to send to limitCh instead of the for loop doing it. We will probably immediately create and start almost all of the full pss worth of goroutines, which could be bad if pss is very large, but at least they all block immediately with almost no resources used and with very small goroutine stacks. Still, this is a bunch of memory and a bunch of (Go) scheduler churn to start all of those goroutines only to have most of them immediately block sending to limitCh. There's also going to be a lot of contention on internal runtime locks associated with limitCh, since a lot of goroutines are queueing up on it.
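A sketch of this version, again with the hypothetical getProcess() standing in for the elided work:

    for _, pr := range pss {
        pr := pr
        go func() {
            limitCh <- struct{}{} // each goroutine takes its own token
            defer func() { <-limitCh }()
            found <- getProcess(pr)
        }()
    }

The deadlock goes away because the main code now runs straight through the loop and reaches the receive from found without ever blocking on limitCh, but all of the goroutines get created up front.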
The best fix for resource usage is to push the for loop into its own goroutine but to otherwise keep things the same. Because the for loop is still sending to limitCh before it creates a new goroutine, the number of simultaneous goroutines we ever have will generally be limited to around our desired concurrency level (there will be some extra that have received from limitCh but not yet finished completely exiting).
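A sketch of this version: the loop body is unchanged, it's just wrapped in a goroutine of its own (still with the hypothetical getProcess(), and the original entry's elided code that waits for the workers and closes found is still needed somewhere):

    go func() {
        for _, pr := range pss {
            limitCh <- struct{}{} // still take the token before each worker
            pr := pr
            go func() {
                defer func() { <-limitCh }()
                found <- getProcess(pr)
            }()
        }
        // [... the elided wait-for-workers and close(found) logic ...]
    }()

    var results []P
    for p := range found {
        results = append(results, p)
    }

Because the main function falls straight through to receiving from found, the wrapper goroutine's sends to limitCh always eventually unblock, and taking the token before each worker keeps the worker count bounded.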
It's likely that none of this matters if the for loop only has to deal with a few hundred entries, and that's probably the case for this code (at least most of the time). But it makes for a useful illustration. When you're writing code with enforced limited concurrency, it's probably worthwhile to think about where you want to limit the concurrency and what effects that has on things overall. As we can see here, small implementation choices can have potentially large impacts.

(Also, sometimes I think too much about this sort of thing.)