CGo's Go string functions explained
As plenty of its documentation will tell you, cgo provides four functions to convert between Go and C types by making copies of the data. They are tersely explained in the CGo documentation; too tersely, in my opinion, because the documentation covers certain things only by implication and omits two important cautions. Because I made some mistakes here, I'm going to write out a longer explanation.
The four functions are:
    func C.CString(string) *C.char
    func C.GoString(*C.char) string
    func C.GoStringN(*C.char, C.int) string
    func C.GoBytes(unsafe.Pointer, C.int) []byte
C.CString() is the equivalent of C's strdup() and copies your Go string to a C char * that you can pass to C functions, just as documented. The copy is allocated with malloc(), so you have to free it yourself. The one annoying thing is that because of how Go and CGo types are defined, calling C.free will require a cast:
    cs := C.CString("a string")
    C.free(unsafe.Pointer(cs))
Note that Go strings may contain embedded 0 bytes and C strings may not. If your Go string contains one and you call C.CString(), C code will see your string truncated at that 0 byte. This is often not a concern, but sometimes you can't guarantee that your text is free of null bytes.
C.GoString() is also the equivalent of strdup(), but for going the other way, from C strings to Go strings. You use it on struct fields and other things that are declared as C char *'s, aka *C.char in Go, and (as we'll see) pretty much nothing else.
C.GoStringN() is the equivalent of C's memmove(), not of any normal C string function. It copies the entire length of the C buffer into a Go string, and it pays no attention to null bytes. More exactly, it copies them too. If you have a struct field that is declared as, say, 'char field[64]' and you call C.GoStringN(&field[0], 64), the Go string you get will always be 64 characters long and will probably have a bunch of 0 bytes at the end.
(In my opinion this is a bug in cgo's documentation. It claims that GoStringN takes a C string as the argument, but it manifestly does not, as C strings are null-terminated and GoStringN does not stop at null bytes.)
C.GoBytes() is a version of C.GoStringN() that returns a []byte instead of a string. Since it doesn't claim to be taking a C string as the argument, it's clearer that it is simply a memory copy of the entire buffer.
If you are copying something that is not actually a null terminated C string but is instead a memory buffer with a size, C.GoStringN() is exactly what you want; it avoids the traditional C problem of dealing with 'strings' that aren't actually C strings. However, none of these functions are what you want if you are dealing with size-limited C strings in the form of struct fields declared as 'char field[N]'.
The traditional semantics of a fixed size string field in structs, fields that are declared as 'char field[N]' and described as holding a string, is that the string is null terminated if and only if there is room, i.e. if the string is at most N-1 characters long. If the string is exactly N characters long, it is not null terminated. This is a fruitful source of bugs even in C code and is not a good API, but it is an API that we are generally stuck with. Any time you see such a field and the documentation does not expressly tell you that the field contents are always null terminated, you have to assume that you have this sort of API.
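These semantics can be sketched in pure Go on a byte slice, with no cgo involved. The function name fieldString is made up for this illustration; it treats its argument as the raw contents of a 'char field[N]'.

```go
package main

import (
	"bytes"
	"fmt"
)

// fieldString (a hypothetical helper) interprets b as a fixed-size C
// string field: the string ends at the first 0 byte if there is one,
// and otherwise it fills the entire field.
func fieldString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		return string(b[:i])
	}
	return string(b)
}

func main() {
	// Three characters in an 8-byte field: there is room for a terminator.
	short := [8]byte{'a', 'b', 'c'}
	// Exactly 8 characters: the field is full and has no terminator.
	full := [8]byte{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'}
	fmt.Println(fieldString(short[:]), fieldString(full[:])) // prints "abc abcdefgh"
}
```

The important part is that the scan is bounded by the size of the field, so a full field is handled without ever looking past its end.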
Neither C.GoString() nor C.GoStringN() deals correctly with these fields. Using GoStringN() is the less wrong option; it will merely leave you with N-byte Go strings with plenty of trailing 0 bytes (which you may not notice for some time if you usually just print those fields out; yes, I've done this). Using the tempting GoString() is actively dangerous, because it internally does a strlen() on the argument; if the field lacks a terminating null byte, the strlen() will run away into memory beyond it. If you're lucky you will just wind up with some amount of trailing garbage in your Go string. If you're unlucky, your Go program will take a segmentation fault as strlen() hits unmapped memory.
(In general, trailing garbage in strings is the traditional sign that you have an unterminated C string somewhere.)
What you actually want is the Go equivalent of C's strndup(), which guarantees to copy no more than N bytes of memory but will stop before then if it finds a null byte. Here is my version of it, with no guarantees:
    func strndup(cs *C.char, len int) string {
        s := C.GoStringN(cs, C.int(len))
        i := strings.IndexByte(s, 0)
        if i == -1 {
            return s
        }
        return C.GoString(cs)
    }
This code does some extra work (a second copy through GoString()) in order to minimize memory usage, because a substring of a Go string keeps the whole original string's memory alive. You may want to take the alternate approach of returning a slice of the GoStringN() string. Really sophisticated code might decide which of the two options to use based on the difference between i and len.
Update: Ian Lance Taylor showed me the better version:
    func strndup(cs *C.char, len int) string {
        return C.GoStringN(cs, C.int(C.strnlen(cs, C.size_t(len))))
    }
Yes, that's a lot of casts. That's the combination of Go and CGo typing for you.