Even in Go, concurrency is still not easy (with an example)
Go is famous for making concurrency easy, through good language support for goroutines. Except what Go makes easy is only one level of concurrency, the nuts and bolts level of making your code do things concurrently and communicating back and forth through channels. Making it do the right things concurrently is still up to you, and unfortunately Go doesn't currently provide a lot of standard library support for correctly implemented standard concurrency patterns.
For example, one common need is for a limited amount of concurrency;
you want to do several things at once, but only so many of them.
At the moment this is up to you to implement on top of goroutines,
channels, and things like the sync
package. This is not as easy as it looks, and quite competent people
can make mistakes here. As it happens, I have an example ready to
hand today.
Gops is a convenient command to
list (and diagnose) Go processes that are currently running on your
system. Among other things, it'll tell you which version of Go they
were compiled with, which is handy if you want to see if you have out
of date binaries that should be rebuilt and redeployed. One of the
things gops
needs to do is look at all of the Go processes on your
system, which it does concurrently. However, it doesn't want to look
at too many processes at once, because that can cause problems with
file descriptor limits. This
is a classic case of limited concurrency.
Gops implements this at the moment with code in goprocess.FindAll() that looks like this, in somewhat sketched and reduced form:
func FindAll() []P {
	pss, err := ps.Processes()
	[...]
	found := make(chan P)
	limitCh := make(chan struct{}, concurrencyProcesses)

	for _, pr := range pss {
		limitCh <- struct{}{}
		pr := pr
		go func() {
			defer func() { <-limitCh }()
			[... get a P with some error checking ...]
			found <- P
		}()
	}
	[...]
	var results []P
	for p := range found {
		results = append(results, p)
	}
	return results
}
(In the real code there's a WaitGroup for coordination, and the
found
channel gets closed appropriately.)
How this works is clear, and is a standard pattern (covered in eg Go 101's Channel Use Cases). We use a buffered channel to provide a limited number of tokens; sending a value into the channel implicitly takes a token (and blocks if the token supply is exhausted), while receiving a value from it puts a token back in. We take a token before we start a new goroutine, and the goroutine releases the token when it's done.
Except that this code has a bug if there are too many processes to examine. Even knowing that there is a bug in this code, it may not be obvious what it is.
The bug is that the goroutines only receive from limitCh
to release
their token after sending their result to the unbuffered found
channel, while the main code only starts receiving from found
after running through the entire loop, and the main code takes
the token in the loop and blocks if no tokens are available. So
if you have too many processes to go through, you start N goroutines,
they all block trying to write to found
and don't receive from
limitCh
, and the main for
loop blocks trying to send to limitCh
and never reaches the point where it starts receiving from found
.
At one level, this bug is a very fragile bug; it only exists because
of multiple circumstances. If the goroutines took the token by
sending to limitCh
instead of the main for
loop doing it, the
bug would not exist; the main for
loop would start them all, many
would stop, and then it would go on to receive from found
so that
they could receive from limitCh
and release their token so other
goroutines would run. If the goroutines received from limitCh
to
release their token before sending to found
, it wouldn't exist
(but because of error handling, it's simpler and more reliable to
do the receive in a defer
). And if the entire for
loop was in
an additional goroutine, the main code would go on to receive from
found
and unblock completed goroutines to release their tokens,
so the fact that the for
loop was blocked waiting to send to
limitCh
wouldn't matter.
At another level, this shows how concurrency is not as easy as it looks in Go. All you need is one mistake and things skid to a halt, and all of the code involved can look good to a casual examination. Getting concurrency correct is simply hard for people (we can debate about why, but I think it's very clear that it is).
(I'm sure that the people who wrote and approved the change that
added this concurrency limiting code to gops were good programmers.
A tricky case still tripped them up, passing all of their scrutiny.
Even when I knew that there was a concurrency problem in the code
and where it was (because my gops
was hanging all of a sudden,
and Delve told me where
everything was stuck), it still took me some time to see what the
exact problem was.)