Go's sync.Pool has (undocumented) 'thread' locality

November 6, 2022

I was recently reading Andrei Pechkurov's Thread-Local State in Go, Huh? (via), which told me something surprising:

I'm talking of sync.Pool. If you're familiar with its source code, you already know that it uses thread-local pools under the hood. If we allocate a struct and place it in the pool, the next time we request it on the same thread (but not necessarily the same goroutine) we should get the same struct.

First, let's say the obvious thing: this sync.Pool behavior is undocumented and so may change at any time, whether because the Go developers decide it should be done a different way or just because they get annoyed at people building code around it. The second thing to say is that this locality doesn't necessarily mean what you want it to mean, and it's not necessarily predictable, although it's more predictable than I initially thought.
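To make that concrete, here's a minimal sketch (my own illustration, with a made-up buf type) that puts an object into a sync.Pool and then immediately asks for one back. On a quiet program this usually hands back the very same object, but nothing guarantees it:

    package main

    import (
        "fmt"
        "sync"
    )

    type buf struct {
        data [1024]byte
    }

    func main() {
        var pool sync.Pool

        b := &buf{}
        pool.Put(b)

        // On a quiet program this Get() usually runs on the same P
        // that just did the Put(), so we usually get b back. That's
        // an implementation detail, not a guarantee; Get() may also
        // return nil here, since this pool has no New function.
        got, _ := pool.Get().(*buf)
        fmt.Println("same object back:", got == b)
    }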

Go (currently) uses an M:N work stealing scheduler to multiplex goroutines on to OS threads. The scheduler has three important sorts of entities: a G is a goroutine, an M is an OS thread (a 'machine'), and a P is a 'processor', which at its core is a limited resource that an M must claim in order to run user-level Go code. What sync.Pool (currently) uses for its local pools are P-local pools, as far as I can tell.

There are always N Ps (where N is the number of 'CPUs' Go is allowed to use, i.e. GOMAXPROCS). In a steady state of computation there are more or less N active Ms, each of which has claimed a particular P and is scheduling and running goroutines (Gs), and generally a G will stay with an M and thus a P. However, this can get perturbed if, for example, you're making synchronous system calls. There are also no guarantees that your OS will keep running an OS-level thread (an M, holding a P) on the same actual CPU as before; it may get bumped off by other things that want the CPU and then be re-scheduled on to a different idle CPU. The association between Ps and system CPUs is only a loose one, which means that you may not get as much CPU cache locality from these 'local pools' in sync.Pool as you could hope for.
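You can ask the runtime how many Ps (and so, currently, how many local pool slots) your program has; a small sketch:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // GOMAXPROCS(0) queries the current setting without changing it.
        // This is the number of Ps, and so (at the moment) the number of
        // local pools a sync.Pool will spread itself across.
        fmt.Println("Ps:", runtime.GOMAXPROCS(0))
        // NumCPU is what GOMAXPROCS defaults to.
        fmt.Println("CPUs:", runtime.NumCPU())
    }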

What the P-local pools are good at is reducing contention. Only one goroutine can be running on a P at any one time, so that goroutine (generally) isn't contending with anything else when it adds something to a P-local part of the pool or gets an available object from it. And in fact sync.Pool has a second-level system (in poolqueue.go) to avoid as much locking as possible, where one P can take an item from another P's chunk of the pool if its own chunk is empty.
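As a conceptual sketch only, here's roughly the shape of that idea, with an explicit 'owner' index standing in for the P identity the runtime uses internally. The real code in poolqueue.go is built around a lock-free double-ended queue; the mutexes here are purely for clarity:

    package main

    import (
        "fmt"
        "sync"
    )

    // shardedPool is a toy sketch of the idea behind sync.Pool's
    // P-local pools: each shard is mostly touched by one owner, and
    // an owner with an empty shard steals from the others.
    type shardedPool struct {
        shards []shard
    }

    type shard struct {
        mu    sync.Mutex
        items []any
    }

    func newShardedPool(n int) *shardedPool {
        return &shardedPool{shards: make([]shard, n)}
    }

    // put stores x in the caller's own shard. In the real sync.Pool
    // the shard index is the current P's ID; here it's an argument.
    func (p *shardedPool) put(owner int, x any) {
        s := &p.shards[owner%len(p.shards)]
        s.mu.Lock()
        s.items = append(s.items, x)
        s.mu.Unlock()
    }

    // get tries the caller's own shard first, then steals from the
    // other shards, (very loosely) mirroring what sync.Pool's Get does.
    func (p *shardedPool) get(owner int) any {
        n := len(p.shards)
        for i := 0; i < n; i++ {
            s := &p.shards[(owner+i)%n]
            s.mu.Lock()
            if m := len(s.items); m > 0 {
                x := s.items[m-1]
                s.items = s.items[:m-1]
                s.mu.Unlock()
                return x
            }
            s.mu.Unlock()
        }
        return nil
    }

    func main() {
        p := newShardedPool(4)
        p.put(0, "hello")
        fmt.Println(p.get(1)) // owner 1's shard is empty, so it steals from shard 0
    }

Because each owner mostly touches only its own shard, the shard mutexes above are almost never contended; the real sync.Pool goes further and makes the common path lock-free.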

What this light exploration of sync.Pool has taught me is that sync.Pool has a much more sophisticated and optimized implementation than I would have expected. You could implement a version of sync.Pool with relatively simple mutexes (and maybe atomics), but the actual Go standard library goes to some effort to make it efficient in the face of significant concurrency. Perhaps this shouldn't be surprising, since sync.Pool is used in some hot spots in the rest of the standard library (the sync.Pool documentation uses fmt as an example).
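For reference, typical use looks like the bytes.Buffer example in the sync.Pool documentation; fmt keeps a similar pool of its internal printer state:

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    // bufPool is modeled on the example in the sync.Pool documentation.
    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    func format(name string, n int) string {
        b := bufPool.Get().(*bytes.Buffer)
        defer func() {
            b.Reset() // always reset before returning a buffer to the pool
            bufPool.Put(b)
        }()
        fmt.Fprintf(b, "%s: %d", name, n)
        return b.String()
    }

    func main() {
        fmt.Println(format("answer", 42))
    }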
