Goroutines versus other concurrency handling options in Go

July 6, 2014

Go makes using goroutines and channels very attractive; they're consciously put forward as the language's primary way of doing concurrency and thus as the default solution to any concurrency-related issue you may have. However, I'm not sure that they're the right approach for everything I've run into, although I'm still mulling over what the balance is.

The sort of problem that channels and goroutines don't seem an entirely smooth fit for is querying shared state (or otherwise getting something from it). Suppose that you're keeping track of the set of SMTP client IPs that have tried to start TLS with you but have failed; if a client has failed TLS setup, you don't want to offer it TLS again (or at least not within a given time). Most of the channel-based solution is straightforward; you have a master goroutine that maintains the set of IPs privately and you add IPs to it by sending a message down the channel to the master. But how do you ask the master goroutine if an IP is in the set? The problem is that you can't get a reply from the master on a common shared channel, because there is no way for the master to reply specifically to you; any goroutine receiving on that channel could pick up the answer.

The channel-based solution for this that I've seen is to send a reply channel as part of your query to the master (which is sent over a shared query channel). The downside of this approach is the churn in channels; every request allocates, initializes, uses once, and then discards a channel (and I think they have to be garbage collected, instead of being stack allocated and quietly cleaned up). The other option is to have a shared data structure that is explicitly protected by locks or other facilities from the sync package. This is more low-level and requires more bookkeeping, but you avoid bouncing channels around.
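To make the reply channel approach concrete, here is a minimal sketch of it. The names (query, ipSet, newIPSet, Has) are all made up for illustration, and the set is simplified to a map of booleans; this is one plausible shape for the pattern, not the only one.

```go
package main

import "fmt"

// query asks whether ip is in the set. The answer comes back
// on the reply channel that travels with the query.
type query struct {
	ip    string
	reply chan bool
}

// ipSet (a hypothetical name) bundles the channels used to
// talk to the master goroutine.
type ipSet struct {
	adds    chan string
	queries chan query
}

func newIPSet() *ipSet {
	s := &ipSet{adds: make(chan string), queries: make(chan query)}
	go s.master()
	return s
}

// master owns the map; no other goroutine ever touches it.
func (s *ipSet) master() {
	ips := make(map[string]bool)
	for {
		select {
		case ip := <-s.adds:
			ips[ip] = true
		case q := <-s.queries:
			q.reply <- ips[q.ip]
		}
	}
}

func (s *ipSet) Add(ip string) { s.adds <- ip }

// Has allocates a fresh reply channel for every single query,
// which is the channel churn in question.
func (s *ipSet) Has(ip string) bool {
	reply := make(chan bool, 1)
	s.queries <- query{ip: ip, reply: reply}
	return <-reply
}

func main() {
	s := newIPSet()
	s.Add("192.0.2.1")
	fmt.Println(s.Has("192.0.2.1"), s.Has("192.0.2.2")) // prints "true false"
}
```

The reply channel is buffered with a capacity of one so that the master never blocks on a caller that is slow to collect its answer.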

But efficiency is probably not the right concern for most Go programs I'll ever write. The real question is which is easier to write and results in clearer code. I don't have a full conclusion but I do have a tentative one, and it's not entirely the one I expected: locks are easier if I'm dealing with more than one sort of query against the same shared state.

The problem with the channel approach in the face of multiple sorts of queries is that it requires a lot of what I'll call type bureaucracy. Because channels are typed, each different sort of reply needs a type (explicit or implicit) to define what is sent down the reply channel. Then basically each different query also needs its own type, because queries must contain their (typed) reply channel. A lock-based implementation doesn't make these types disappear, but it makes them less of a pain because they are just function arguments and return values and thus don't have to be formally defined as Go types and/or structs. In practice this winds up feeling more lightweight to me, even with the need to do explicit manual locking.
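As a rough illustration of the type bureaucracy, here is a sketch with two sorts of query ("is this IP present?" and "how many failures has it had?") done both ways. Everything here is hypothetical (chanStore, lockStore, the query and reply types), and the stored data is reduced to a simple failure count.

```go
package main

import (
	"fmt"
	"sync"
)

// Channel version: each query and each reply needs a declared type.
type hasQuery struct {
	ip    string
	reply chan bool
}

type countReply struct {
	count int
	ok    bool
}

type countQuery struct {
	ip    string
	reply chan countReply
}

type chanStore struct {
	adds   chan string
	has    chan hasQuery
	counts chan countQuery
}

func newChanStore() *chanStore {
	s := &chanStore{
		adds:   make(chan string),
		has:    make(chan hasQuery),
		counts: make(chan countQuery),
	}
	go s.run()
	return s
}

// run is the master goroutine; it alone touches the map.
func (s *chanStore) run() {
	ips := make(map[string]int)
	for {
		select {
		case ip := <-s.adds:
			ips[ip]++
		case q := <-s.has:
			_, ok := ips[q.ip]
			q.reply <- ok
		case q := <-s.counts:
			n, ok := ips[q.ip]
			q.reply <- countReply{count: n, ok: ok}
		}
	}
}

// Lock version: the same two queries are ordinary methods whose
// "types" are just their arguments and return values.
type lockStore struct {
	sync.RWMutex
	ips map[string]int
}

func (s *lockStore) Add(ip string) {
	s.Lock()
	s.ips[ip]++
	s.Unlock()
}

func (s *lockStore) Has(ip string) bool {
	s.RLock()
	_, ok := s.ips[ip]
	s.RUnlock()
	return ok
}

func (s *lockStore) Count(ip string) (int, bool) {
	s.RLock()
	n, ok := s.ips[ip]
	s.RUnlock()
	return n, ok
}

func main() {
	cs := newChanStore()
	cs.adds <- "192.0.2.1"
	reply := make(chan countReply, 1)
	cs.counts <- countQuery{ip: "192.0.2.1", reply: reply}
	fmt.Println(<-reply)

	ls := &lockStore{ips: make(map[string]int)}
	ls.Add("192.0.2.1")
	fmt.Println(ls.Count("192.0.2.1"))
}
```

The channel version needs three extra type declarations for two queries, and each further sort of query adds at least one more; the lock version just grows another method.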

(You can reduce the number of types needed in the channel case by merging them together in various ways, but then you start losing type safety, especially compile-time type safety. I like compile-time type safety in Go because it's a reliable way of telling me if I got something obvious wrong and it helps speed up refactoring.)
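One way of merging the types, sketched under the assumption that you use a single request type with an interface{} reply channel, shows where the compile-time checking goes. The request type, the kind strings, and the ask helper are all invented for this example, and the map contents are hard-coded.

```go
package main

import "fmt"

// request is a single merged type that covers every sort of query.
type request struct {
	kind  string // "has" or "count"
	ip    string
	reply chan interface{}
}

// serve plays the master goroutine over some fixed sample data.
func serve(reqs <-chan request) {
	ips := map[string]int{"192.0.2.1": 3}
	for r := range reqs {
		switch r.kind {
		case "has":
			_, ok := ips[r.ip]
			r.reply <- ok
		case "count":
			r.reply <- ips[r.ip]
		default:
			r.reply <- fmt.Errorf("unknown query kind %q", r.kind)
		}
	}
}

// ask sends one request and waits for whatever comes back.
func ask(reqs chan<- request, kind, ip string) interface{} {
	reply := make(chan interface{}, 1)
	reqs <- request{kind: kind, ip: ip, reply: reply}
	return <-reply
}

func main() {
	reqs := make(chan request)
	go serve(reqs)

	// The compiler can't check that a "count" query yields an int;
	// the caller has to assert it at run time.
	n, ok := ask(reqs, "count", "192.0.2.1").(int)
	fmt.Println(n, ok) // prints "3 true"

	// Asserting the wrong type compiles without complaint and
	// only shows up as a failed assertion when the code runs.
	_, ok = ask(reqs, "has", "192.0.2.1").(int)
	fmt.Println(ok) // prints "false"
}
```

A typo in the kind string has the same problem: it is only noticed when the query is actually made.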

In a way I think that channels and goroutines can be a form of Turing tarpit, in that they can be used to solve all of your problems if you're sufficiently clever and it's very tempting to work out how to be that clever.

(On the other hand, sometimes channels are a brilliant solution to a problem that might look like it had nothing to do with them. Before I saw Rob Pike's presentation on lexical scanning in Go I would never have thought of using goroutines and channels in a lexer.)

Sidebar: the Go locking pattern I've adopted

This isn't original to me; I believe I got it from the Go blog entry on Go maps in action. Presented in illustrated form:

import (
  "sync"
  "time"
)

// actual entries in our shared data structure
type ipEnt struct {
  when  time.Time
  count int
}

// the shared data structure and the lock
// protecting it, all wrapped up in one thing.
type ipMap struct {
  sync.RWMutex
  ips map[string]*ipEnt
}

var notls = &ipMap{ips: make(map[string]*ipEnt)}

// only method functions manipulate the shared
// data structure and they always take and release
// the lock. outside callers are oblivious to the
// actual implementation.
func (i *ipMap) Add(ip string) {
  i.Lock()
  // ... manipulate i.ips ...
  i.Unlock()
}

Using method functions feels like the most natural way to manipulate the data structure, partly because how you manipulate it is very tightly bound to what it is due to locking requirements. And I just plain like the syntax for doing things with it:

if res == TLSERROR {
  notls.Add(remoteip)
  // ....
}

The last bit is a personal thing, of course. Some people will prefer standalone functions that are passed the ipMap as an explicit argument.
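For completeness, here is a filled-in and runnable sketch of the pattern. The body of Add and the whole Recent query method are my guesses at what a real version might do, not anything the pattern requires, and I've used defer for the unlocks where the outline above unlocks explicitly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// actual entries in our shared data structure
type ipEnt struct {
	when  time.Time
	count int
}

// the shared data structure and the lock protecting it
type ipMap struct {
	sync.RWMutex
	ips map[string]*ipEnt
}

var notls = &ipMap{ips: make(map[string]*ipEnt)}

// Add records one more TLS failure for ip (assumed behaviour).
func (i *ipMap) Add(ip string) {
	i.Lock()
	defer i.Unlock()
	ent := i.ips[ip]
	if ent == nil {
		ent = &ipEnt{}
		i.ips[ip] = ent
	}
	ent.when = time.Now()
	ent.count++
}

// Recent is a hypothetical query method: it reports whether ip
// has failed within the last ttl. It only reads, so it takes
// the read lock.
func (i *ipMap) Recent(ip string, ttl time.Duration) bool {
	i.RLock()
	defer i.RUnlock()
	ent := i.ips[ip]
	return ent != nil && time.Since(ent.when) < ttl
}

func main() {
	notls.Add("192.0.2.1")
	fmt.Println(notls.Recent("192.0.2.1", time.Hour)) // prints "true"
	fmt.Println(notls.Recent("192.0.2.2", time.Hour)) // prints "false"
}
```

The query is just another method with ordinary arguments and an ordinary return value, which is the lightweight feeling I was talking about.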

Written on 06 July 2014.

Last modified: Sun Jul 6 22:51:07 2014