CGo's Go string functions explained

August 31, 2015

As plenty of its documentation will tell you, cgo provides four functions that convert between Go and C types by making copies of the data. They are explained tersely in the cgo documentation; too tersely, in my opinion, because the documentation covers some things only by implication and omits two important cautions. Because I made some mistakes here, I'm going to write out a longer explanation.

The four functions are:

func C.CString(string) *C.char
func C.GoString(*C.char) string
func C.GoStringN(*C.char, C.int) string
func C.GoBytes(unsafe.Pointer, C.int) []byte

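C.CString() is the equivalent of C's strdup(): it copies your Go string into a newly malloc()'d C char * that you can pass to C functions, just as documented. Go's garbage collector does not manage that memory, so you must free it yourself. The one annoying thing is that, because of how Go and cgo types are defined, calling C.free() requires a conversion to unsafe.Pointer (and your cgo preamble must #include <stdlib.h> for C.free to exist):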

cs := C.CString("a string")
C.free(unsafe.Pointer(cs))

Note that Go strings may contain embedded 0 bytes and C strings cannot. If your Go string contains one and you call C.CString(), C code will see your string truncated at that 0 byte. This is often not a concern, but sometimes text isn't guaranteed to be free of null bytes.

C.GoString() is also the equivalent of strdup(), but going the other way, from C strings to Go strings. You use it on struct fields and other things that are declared as C char *s, aka *C.char in Go, and (as we'll see) pretty much nothing else.

C.GoStringN() is the equivalent of C's memmove(), not of any normal C string function. It copies the entire length of the C buffer into a Go string and pays no attention to null bytes; more exactly, it copies them too. If you have a struct field that is declared as, say, 'char field[64]' and you call C.GoStringN(&s.field[0], 64) (where s is your struct), the Go string you get will always be 64 bytes long and will probably have a bunch of 0 bytes at the end.

(In my opinion this is a bug in cgo's documentation. It claims that GoStringN takes a C string as the argument, but it manifestly does not, as C strings are null-terminated and GoStringN does not stop at null bytes.)

C.GoBytes() is a version of C.GoStringN() that returns a []byte instead of a string. Since it doesn't claim to be taking a C string as the argument, it's clearer that it is simply a memory copy of the entire buffer.

If you are copying something that is not actually a null-terminated C string but a memory buffer with a size, C.GoStringN() is exactly what you want; it avoids the traditional C problem of dealing with 'strings' that aren't actually C strings. However, none of these functions are what you want if you are dealing with size-limited C strings in the form of struct fields declared as 'char field[N]'.

The traditional semantics of a fixed-size string field in a struct (a field declared as 'char field[N]' and described as holding a string) is that the string is null-terminated if and only if there is room, i.e. if the string is at most N-1 characters long. If the string is exactly N characters long, it is not null-terminated. This is a fruitful source of bugs even in C code and is not a good API, but it is an API that we are generally stuck with. Any time you see such a field and the documentation does not expressly say that its contents are always null-terminated, you have to assume that you have this sort of API.

Neither C.GoString() nor C.GoStringN() deals correctly with these fields. Using GoStringN() is the less wrong option; it will merely leave you with N-byte Go strings with plenty of trailing 0 bytes (which you may not notice for some time if you usually just print those fields out; yes, I've done this). Using the tempting GoString() is actively dangerous, because it internally does a strlen() on its argument; if the field lacks a terminating null byte, strlen() will run off the end of the field into whatever memory follows it. If you're lucky, you will just wind up with some trailing garbage in your Go string. If you're unlucky, your Go program will take a segmentation fault when strlen() hits unmapped memory.

(In general, trailing garbage in strings is the traditional sign that you have an unterminated C string somewhere.)

What you actually want is the Go equivalent of C's strndup(), which copies at most N bytes of memory but stops earlier if it finds a null byte. Here is my version of it, with no guarantees:

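// strndup copies at most len bytes from cs, stopping at the first
// null byte if there is one.
func strndup(cs *C.char, len int) string {
	// Copy the whole buffer; this never reads past len bytes.
	s := C.GoStringN(cs, C.int(len))
	i := strings.IndexByte(s, 0)
	if i == -1 {
		// No null byte: the string fills all len bytes.
		return s
	}
	// A null byte exists within len bytes, so the strlen() inside
	// C.GoString() is safe; this makes a right-sized copy.
	return C.GoString(cs)
}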

This code does some extra work in order to minimize memory usage: a Go substring shares its parent string's memory, so returning a short slice of the N-byte GoStringN() result would keep all N bytes alive. You may want to take the alternate approach of returning a slice of the GoStringN() string anyway. Really sophisticated code might decide which of the two options to use based on the difference between i and len.

Update: Ian Lance Taylor showed me the better version:

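// Needs #include <string.h> in the cgo preamble for strnlen().
func strndup(cs *C.char, len int) string {
	// strnlen() stops at a null byte or after len bytes, whichever is first.
	return C.GoStringN(cs, C.int(C.strnlen(cs, C.size_t(len))))
}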

Yes, that's a lot of casts. That's the combination of Go and CGo typing for you.

Written on 31 August 2015.

Last modified: Mon Aug 31 23:49:44 2015