Some notes on the cost of Go finalizers (in Go 1.20)

May 20, 2023

I recently read Daniel Lemire's "The absurd cost of finalizers in Go", which reports a remarkably high cost for using a finalizer to ensure that C memory is freed. Lemire's numbers aren't atypical; in my own testing in a different environment, I found roughly a factor of ten difference between directly calling C malloc() and free() and using a finalizer to call free().

The first reason for this increased overhead in Lemire's test case is perhaps somewhat surprising: using a finalizer forces heap allocation, while Lemire's non-finalizer version does not. Suppose that you have:

func Allocate() *C.char {
  return C.allocate()
}

func Free(c *C.char) {
  C.free_allocated(c)
}

// in a _test.go file
func BenchmarkAllocate(b *testing.B) {
  for j := 0; j < b.N; j++ {
    p := Allocate()
    Free(p)
  }
}

Go 1.20 is smart enough to allocate 'p' on the Go stack, so while the C code is calling malloc() and free(), Go is not doing anything with its own memory system. The moment you call runtime.SetFinalizer() this changes; Go considers the object you're trying to finalize to escape, so it allocates it on the heap. In real situations this often won't matter, because what you're finalizing is already going to be heap allocated.

(In Lemire's test code, you can see this if you use 'go test -benchmem -bench=Benchmark -run -'; some of the benchmarks will allocate nothing per invocation, and others will allocate one thing.)

Lemire tested with garbage collection (GC) turned off in the Go runtime and got similar results, so he theorized that SetFinalizer() was the expensive part. I constructed a synthetic test function that only sets a finalizer, without making any cgo calls, and this does seem to be the case. With Go's GC on in its normal state, over 50% of the runtime of benchmarking this function is in SetFinalizer(), mostly in an internal runtime function called runtime.addspecial(). There are some other surprises, though. In total, GC activity seems to be about 27% of the runtime, with about half of that directly triggered by allocations and half happening in the background. Much of the GC time (21% of the total time) seems to be spent processing and running finalizers, even though the test's finalizer does nothing. A surprisingly high percentage of the time is spent locking and unlocking things, with the Go profiler attributing 10% to 'runtime.lock2()' and 10% to 'runtime.unlock2()'.

What I take from this is that SetFinalizer() is probably not considered something that you should use heavily, and as a result it hasn't been heavily optimized. You can get a sense of this from how much of the runtime.SetFinalizer() documentation is devoted to its limitations and issues; using it correctly is tricky, and correctly using anything with a finalizer attached is also tricky (see the discussion of the example with file descriptors).

PS: One of the effects of putting finalizers on objects is that the objects will take longer to be garbage collected (an unused object with a finalizer takes two GC cycles to collect, instead of one). This may affect how you structure objects and where you attach finalizers; you probably don't want to put a finalizer on a big object or on an object that will be directly embedded in one (since Go doesn't free sub-objects by themselves).

Sidebar: My finalizer-only test code

In case people want to run their own tests:

// Used by Lemire's other benchmarks
type Cstr struct {
  cpointer *C.char
}

// No C malloc, no finalizer code
func EmptyFinalizer() *Cstr {
  answer := &Cstr{}
  runtime.SetFinalizer(answer, func(c *Cstr) {})
  return answer
}

// in _test file
func BenchmarkEmptyFinalizer(b *testing.B) {
  for j := 0; j < b.N; j++ {
    EmptyFinalizer()
  }
}

I deliberately structured this to be as close to Lemire's other benchmark test functions as possible, hence its use of the Cstr type.


Comments on this page:

By sean at 2023-05-21 12:16:23:

Another point to consider is the size of the object the finalizer is set on: https://go.dev/play/p/4JilDUXBTBg

 cpu: 12th Gen Intel(R) Core(TM) i7-1260P
 BenchmarkAllocate1-16      166008548       7.476 ns/op       8 B/op       1 allocs/op
 BenchmarkFinalize1-16        1790385       736.0 ns/op       8 B/op       1 allocs/op
 BenchmarkAllocate2-16      101285217       11.08 ns/op      16 B/op       1 allocs/op
 BenchmarkFinalize2-16        3129697       446.9 ns/op      16 B/op       1 allocs/op
 BenchmarkAllocate4-16       85007026       13.11 ns/op      32 B/op       1 allocs/op
 BenchmarkFinalize4-16        3808947       333.8 ns/op      32 B/op       1 allocs/op
 BenchmarkAllocate8-16       56715451       18.43 ns/op      64 B/op       1 allocs/op
 BenchmarkFinalize8-16        4851697       258.2 ns/op      64 B/op       1 allocs/op
 BenchmarkAllocate16-16      40862126       29.68 ns/op     128 B/op       1 allocs/op
 BenchmarkFinalize16-16       5685782       205.8 ns/op     128 B/op       1 allocs/op
 BenchmarkAllocate32-16      24007569       52.06 ns/op     256 B/op       1 allocs/op
 BenchmarkFinalize32-16       5279581       223.4 ns/op     256 B/op       1 allocs/op
 BenchmarkAllocate64-16      12752737       92.86 ns/op     512 B/op       1 allocs/op
 BenchmarkFinalize64-16       4242610       279.9 ns/op     512 B/op       1 allocs/op
 BenchmarkAllocate128-16      6978190       177.9 ns/op    1024 B/op       1 allocs/op
 BenchmarkFinalize128-16      3096054       385.0 ns/op    1024 B/op       1 allocs/op
 BenchmarkAllocate256-16      3311721       355.3 ns/op    2048 B/op       1 allocs/op
 BenchmarkFinalize256-16      2066324       589.0 ns/op    2048 B/op       1 allocs/op
 BenchmarkAllocate512-16      1675885       706.9 ns/op    4096 B/op       1 allocs/op
 BenchmarkFinalize512-16      1218241       972.7 ns/op    4096 B/op       1 allocs/op
 PASS
Written on 20 May 2023.
Last modified: Sat May 20 22:45:40 2023