2023-05-20
Some notes on the cost of Go finalizers (in Go 1.20)
I recently read Daniel Lemire's "The absurd cost of finalizers in Go",
which reports on a remarkably high cost of using a finalizer to
ensure that C memory is freed. Lemire's numbers aren't atypical;
in my own testing in a different environment I found a rough factor
of ten difference between directly calling C malloc() and free()
and using a finalizer to call free().
The first reason for this increased overhead in Lemire's test case is perhaps somewhat surprising: using a finalizer forces heap allocation, while Lemire's non-finalizer version does not. Suppose that you have:
func Allocate() *C.char {
	return C.allocate()
}

func Free(c *C.char) {
	C.free_allocated(c)
}

// in a _test.go file
func BenchmarkAllocate(b *testing.B) {
	for j := 0; j < b.N; j++ {
		p := Allocate()
		Free(p)
	}
}
Go 1.20 is smart enough to allocate 'p' on the Go stack, so while
the C code is calling malloc() and free(), Go is not doing
anything with its own memory system. The moment you call
runtime.SetFinalizer() this changes; Go considers the object you're
trying to finalize to escape, so it allocates it in the heap. Probably
this often won't matter in real situations, because what you're
finalizing is already going to be heap allocated.
(In Lemire's test code, you can see this if you use
'go test -benchmem -bench=Benchmark -run -'; some of the benchmarks
will allocate nothing per invocation, and others will allocate
one thing.)
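The same effect can be seen without cgo or a benchmark harness. This is a minimal sketch of my own, not Lemire's code: the 'obj' type and the 'touch', 'withoutFinalizer' and 'withFinalizer' functions are hypothetical names, and it uses testing.AllocsPerRun() to count Go heap allocations per call. I would expect the first function to report no allocations and the second to report at least one, but the exact numbers depend on the Go version's escape analysis.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// obj is a hypothetical stand-in for a small wrapper around a C pointer.
type obj struct{ data *byte }

// touch reads the object without retaining it, so the pointer does not
// escape through this call. It is kept out of line so the compiler
// cannot simply optimize the object away.
//
//go:noinline
func touch(o *obj) bool { return o.data == nil }

// withoutFinalizer creates an object that never escapes, so Go is free
// to put it on the stack.
func withoutFinalizer() bool {
	o := &obj{}
	return touch(o)
}

// withFinalizer passes the object to runtime.SetFinalizer, which makes
// it escape and forces a heap allocation.
func withFinalizer() bool {
	o := &obj{}
	runtime.SetFinalizer(o, func(*obj) {})
	return touch(o)
}

func main() {
	fmt.Println("without finalizer:",
		testing.AllocsPerRun(1000, func() { withoutFinalizer() }))
	fmt.Println("with finalizer:   ",
		testing.AllocsPerRun(1000, func() { withFinalizer() }))
}
```

This only counts allocations; it says nothing about how long SetFinalizer() itself takes.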
Lemire tested with garbage collection (GC) turned off in the Go
runtime and got similar results, so he theorized that SetFinalizer()
was the expensive portion. I constructed a synthetic test function
that only set a finalizer without making any cgo calls, and this
does seem to be the case. With Go's GC on in its normal state, over
50% of the runtime of benchmarking this function is in SetFinalizer(),
mostly in an internal runtime function called runtime.addspecial().
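One way to separate the cost of SetFinalizer() from the cost of the heap allocation it forces is to benchmark against a baseline that heap allocates the same object without a finalizer. The following is a rough sketch under my own assumptions rather than the test I actually ran: the 'sink' variable and the two bench functions are hypothetical, 'cpointer' is a *byte instead of a *C.char so that no cgo is needed, and it uses testing.Benchmark() so it can run as an ordinary program. The numbers it prints will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// Cstr mirrors the shape of the type in Lemire's benchmarks, with a
// *byte standing in for the *C.char so this compiles without cgo.
type Cstr struct{ cpointer *byte }

// sink forces both variants to heap allocate, so that the only
// difference between them is the SetFinalizer call.
var sink *Cstr

// benchHeapOnly is the baseline: a heap allocation with no finalizer.
func benchHeapOnly(b *testing.B) {
	for j := 0; j < b.N; j++ {
		sink = &Cstr{}
	}
}

// benchEmptyFinalizer adds a finalizer that does nothing.
func benchEmptyFinalizer(b *testing.B) {
	for j := 0; j < b.N; j++ {
		c := &Cstr{}
		runtime.SetFinalizer(c, func(*Cstr) {})
		sink = c
	}
}

func main() {
	plain := testing.Benchmark(benchHeapOnly)
	fin := testing.Benchmark(benchEmptyFinalizer)
	fmt.Printf("heap only:       %d ns/op\n", plain.NsPerOp())
	fmt.Printf("empty finalizer: %d ns/op\n", fin.NsPerOp())
}
```

To see where the time goes inside the runtime, you would still want a CPU profile of the finalizer benchmark rather than just these totals.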
There are some other surprises, though. In total, GC activity seems
to be about 27% of the runtime, with about half of that being
directly triggered by allocations and half happening in the background.
Much of the GC time seems to be spent processing and running
finalizers, even though the test's finalizer does nothing (21% of
the total time). A surprisingly high percentage of the time is spent
locking and unlocking things, with the Go profiler attributing 10%
to 'runtime.lock2()' and 10% to 'runtime.unlock2()'.
What I take from this is that SetFinalizer() is probably not
considered something that you should use heavily, and as a result
it hasn't been heavily optimized. You can get a sense of this from
the extensive documentation around its limitations and issues in
the runtime.SetFinalizer() documentation; using it correctly is
tricky, and correctly using anything with a finalizer attached is
also tricky (see the discussion of the example with file descriptors).
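To illustrate the file descriptor issue, here is a minimal sketch loosely modeled on the example in the runtime documentation. Everything in it apart from runtime.SetFinalizer() and runtime.KeepAlive() is hypothetical: 'File', 'Open', 'Read', 'readFD' and the 'closed' counter are made-up names, and 'readFD' merely stands in for a system call that only sees the raw descriptor number.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// closed counts descriptors released by the finalizer. Finalizers run
// on a separate goroutine, so the counter is atomic.
var closed atomic.Int64

// File is a hypothetical wrapper around a raw descriptor number.
type File struct{ fd int }

// Open attaches a finalizer that would close the descriptor.
func Open(fd int) *File {
	f := &File{fd: fd}
	runtime.SetFinalizer(f, func(f *File) { closed.Add(1) })
	return f
}

// readFD stands in for a system call that is given only the number.
func readFD(fd int) int { return fd * 2 }

// Read uses the descriptor. Without the KeepAlive call, f could be
// treated as unreachable once f.fd has been loaded, and the finalizer
// could release the descriptor while readFD is still using it.
func Read(f *File) int {
	n := readFD(f.fd)
	runtime.KeepAlive(f)
	return n
}

func main() {
	f := Open(21)
	fmt.Println(Read(f)) // prints 42
}
```

The point is that every function that pulls the raw descriptor out of the object has to remember the KeepAlive(), which is part of why I call using such objects tricky.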
PS: One of the effects of putting finalizers on objects is that the objects will take longer to be garbage collected (an unused object with a finalizer takes two GC cycles to collect, instead of one). This may affect how you structure objects and where you attach finalizers; you probably don't want to put a finalizer on a big object or on an object that will be directly embedded in one (since Go doesn't free sub-objects by themselves).
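One way to structure this is to put the finalizer on a small, separately allocated object that the big object merely points to. This is only a sketch of the idea; the 'handle' and 'Big' types and the 'NewBig' function are hypothetical names of my own, and the finalizer body is left empty where real code would release the resource.

```go
package main

import (
	"fmt"
	"runtime"
)

// handle is a hypothetical small object that owns the external
// resource and carries the finalizer.
type handle struct{ id int }

// Big is a hypothetical large object. It holds a pointer to the handle
// instead of embedding it, so the finalizer is not attached to (or
// inside of) the large allocation.
type Big struct {
	buf [1 << 20]byte
	h   *handle
}

// NewBig sets the finalizer on the small handle only.
func NewBig(id int) *Big {
	h := &handle{id: id}
	runtime.SetFinalizer(h, func(h *handle) {
		// Release the resource identified by h.id here.
	})
	return &Big{h: h}
}

func main() {
	b := NewBig(7)
	fmt.Println(len(b.buf), b.h.id) // prints 1048576 7
}
```

When a Big becomes unreachable, only the small handle should have to wait for the extra GC cycle, since the handle does not point back at the Big.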
Sidebar: My finalizer-only test code
In case people want to run their own tests:
// Used by Lemire's other benchmarks
type Cstr struct {
	cpointer *C.char
}

// No C malloc, no finalizer code
func EmptyFinalizer() *Cstr {
	answer := &Cstr{}
	runtime.SetFinalizer(answer, func(c *Cstr) {})
	return answer
}

// in _test file
func BenchmarkEmptyFinalizer(b *testing.B) {
	for j := 0; j < b.N; j++ {
		EmptyFinalizer()
	}
}
I deliberately structured this to be as close to Lemire's other benchmark test functions as possible, hence its use of the Cstr type.