2015-08-31
CGo's Go string functions explained
As plenty of its documentation will tell you, cgo provides four functions to convert between Go and C types by making copies of the data. They are tersely explained in the CGo documentation; too tersely, in my opinion, because the documentation only covers certain things by implication and omits two very important cautions. Because I made some mistakes here, I'm going to write out a longer explanation.
The four functions are:

```go
func C.CString(string) *C.char
func C.GoString(*C.char) string
func C.GoStringN(*C.char, C.int) string
func C.GoBytes(unsafe.Pointer, C.int) []byte
```
C.CString() is the equivalent of C's strdup() and copies your Go string to a C char * that you can pass to C functions, just as documented. As with strdup(), the copy is allocated with malloc() and it is up to you to free it. The one annoying thing is that because of how Go and CGo types are defined, calling C.free will require a cast:
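```go
cs := C.CString("a string")
// C.free takes a void *, so the *C.char must be converted first.
// (C.free needs '#include <stdlib.h>' in the cgo preamble.)
C.free(unsafe.Pointer(cs))
```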
Note that Go strings may contain embedded 0 bytes and C strings may not. If your Go string contains one and you call C.CString(), C code will see your string truncated at that 0 byte. This is often not a concern, but sometimes you can't be sure that your text is free of null bytes.
C.GoString() is also the equivalent of strdup(), but for going the other way, from C strings to Go strings. You use it on struct fields and other things that are declared as C char *'s, aka *C.char in Go, and (as we'll see) pretty much nothing else.
C.GoStringN() is the equivalent of C's memmove(), not of any normal C string function. It copies the entire length of the C buffer into a Go string, and it pays no attention to null bytes; more exactly, it copies them too. If you have a struct field that is declared as, say, 'char field[64]' and you call C.GoStringN(&field[0], 64), the Go string you get will always be 64 bytes long and will probably have a bunch of 0 bytes at the end. (You need &field[0] rather than &field because cgo turns the field into a Go [64]C.char array, and a pointer to the whole array is not a *C.char.)
(In my opinion this is a bug in cgo's documentation. It claims that GoStringN takes a C string as the argument, but it manifestly does not, as C strings are null-terminated and GoStringN does not stop at null bytes.)
C.GoBytes() is a version of C.GoStringN() that returns a []byte instead of a string. Since it doesn't claim to be taking a C string as the argument, it's clearer that it is simply a memory copy of the entire buffer.
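Because both GoStringN() and GoBytes() are plain length-based copies, their effect on a fixed-size field can be modelled in pure Go by treating a 'char field[64]' as a [64]byte and copying all of it. This is an analogy under that assumption, not cgo code, and the function names are hypothetical:

```go
package main

import "fmt"

// wholeFieldString models C.GoStringN(&field[0], 64): a length-based
// copy of every byte in the field, 0 bytes included.
func wholeFieldString(field *[64]byte) string {
	return string(field[:])
}

// wholeFieldBytes models C.GoBytes(unsafe.Pointer(&field[0]), 64).
func wholeFieldBytes(field *[64]byte) []byte {
	return append([]byte(nil), field[:]...)
}

func main() {
	// Model a C 'char field[64]' holding the short string "sda".
	var field [64]byte
	copy(field[:], "sda")

	s := wholeFieldString(&field)
	b := wholeFieldBytes(&field)
	fmt.Println(len(s), len(b)) // 64 64
	fmt.Printf("%q\n", s[:5])   // "sda\x00\x00"
}
```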
If you are copying something that is not actually a null terminated C string but is instead a memory buffer with a size, C.GoStringN() is exactly what you want; it avoids the traditional C problem of dealing with 'strings' that aren't actually C strings. However, none of these functions are what you want if you are dealing with size-limited C strings in the form of struct fields declared as 'char field[N]'.
The traditional semantics of a fixed size string field in structs, fields that are declared as 'char field[N]' and described as holding a string, is that the string is null terminated if and only if there is room, i.e. if the string is at most N-1 characters long. If the string is exactly N characters long, it is not null terminated. This is a fruitful source of bugs even in C code and is not a good API, but it is an API that we are generally stuck with. Any time you see such a field and the documentation does not expressly tell you that the field contents are always null terminated, you have to assume that you have this sort of API.
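In pure Go terms, decoding such a field correctly means stopping at the first 0 byte if there is one and otherwise using all N bytes, without ever looking past N. A minimal sketch, assuming the field's bytes have already been copied into a Go byte slice (for example with C.GoBytes()); the name fieldString is hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
)

// fieldString (hypothetical) decodes a 'char field[N]' that follows the
// traditional convention: null terminated only if the string is at most
// N-1 bytes long. It never reads past len(field).
func fieldString(field []byte) string {
	if i := bytes.IndexByte(field, 0); i != -1 {
		return string(field[:i]) // terminated: stop at the 0 byte
	}
	return string(field) // exactly N bytes, no terminator
}

func main() {
	short := [8]byte{'s', 'd', 'a'}
	full := [8]byte{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'}
	fmt.Println(fieldString(short[:])) // sda
	fmt.Println(fieldString(full[:]))  // abcdefgh
}
```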
Neither C.GoString() nor C.GoStringN() deals correctly with these fields. Using GoStringN() is the less wrong option; it will merely leave you with N-byte Go strings with plenty of trailing 0 bytes (which you may not notice for some time if you usually just print those fields out; yes, I've done this). Using the tempting GoString() is actively dangerous, because it internally does a strlen() on the argument; if the field lacks a terminating null byte, the strlen() will run away into memory beyond it. If you're lucky you will just wind up with some amount of trailing garbage in your Go string. If you're unlucky, your Go program will take a segmentation fault as strlen() hits unmapped memory.
(In general, trailing garbage in strings is the traditional sign that you have an unterminated C string somewhere.)
What you actually want is the Go equivalent of C's strndup(), which guarantees to copy no more than N bytes of memory but will stop before then if it finds a null byte. Here is my version of it, with no guarantees:
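```go
// Part of a cgo file that imports "C" and "strings".
func strndup(cs *C.char, len int) string {
	// Copy the whole buffer; this never reads more than len bytes.
	s := C.GoStringN(cs, C.int(len))
	i := strings.IndexByte(s, 0)
	if i == -1 {
		// No 0 byte within len bytes: the field is unterminated.
		return s
	}
	// A terminator exists inside the buffer, so GoString's strlen()
	// is safe and the result is exactly i bytes long.
	return C.GoString(cs)
}
```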
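This code does some extra work in order to minimize memory usage. A Go substring shares the memory of the string it was sliced from, so returning part of the N-byte GoStringN() result would keep all N bytes alive, while the second copy made by C.GoString() is only as long as it needs to be. You may want to take the alternate approach of returning a slice of the GoStringN() string. Really sophisticated code might decide which of the two options to use based on the difference between i and len.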
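That 'sophisticated' choice can be sketched in pure Go, assuming s is the full N-byte result of GoStringN() and Go 1.18 or later for strings.Clone(). The helper name trimNUL and the half-the-length threshold are arbitrary assumptions of this sketch, not a recommendation:

```go
package main

import (
	"fmt"
	"strings"
)

// trimNUL (hypothetical) cuts s at its first 0 byte. If the kept part is
// most of s, it returns a substring, which shares (and keeps alive) all
// of s's memory. If the kept part is small, it copies it with
// strings.Clone so the larger original can be garbage collected.
// The "half of len(s)" threshold is an arbitrary choice.
func trimNUL(s string) string {
	i := strings.IndexByte(s, 0)
	if i == -1 {
		return s
	}
	if i >= len(s)/2 {
		return s[:i]
	}
	return strings.Clone(s[:i])
}

func main() {
	fmt.Printf("%q\n", trimNUL("abc\x00\x00\x00\x00\x00")) // "abc" (copied)
	fmt.Printf("%q\n", trimNUL("abcdef\x00\x00"))          // "abcdef" (sliced)
}
```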
Update: Ian Lance Taylor showed me the better version:
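```go
// Needs '#include <string.h>' in the cgo preamble for C.strnlen.
func strndup(cs *C.char, len int) string {
	// strnlen() looks at no more than len bytes, and GoStringN then
	// copies exactly the bytes before the first 0 (or all len of them).
	return C.GoStringN(cs, C.int(C.strnlen(cs, C.size_t(len))))
}
```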
Yes, that's a lot of casts. That's the combination of Go and CGo typing for you.
Turning, well, copying blobs of memory into Go structures
As before, suppose (not entirely hypothetically) that you're writing a package to connect Go up to something that will provide it with blobs of memory that are actually C structs; these might be mmap()'d files, information from a library, or whatever. Once you have a compatible Go struct, you still have to get the data from a C struct (or raw memory) to the Go struct.
One way to do this is to manually write your own struct copy function that does it field by field (e.g. 'io.Field = ks_io.field' for each field). As with defining the Go structs by hand, this is tedious and potentially error prone. You can do it and you'll probably have to if the C struct contains unions or other hard to deal with things, but we'd like an easier approach. Fortunately there are two good ones for two different cases. In both cases we will wind up copying the C struct or the raw memory to a Go struct variable that is an exact equivalent of the C struct (or at least we hope it is).
The easy case is when we're dealing with a fixed struct that we have a known Go type for. Assuming that we have a C void * pointer to the original memory area called ks.ks_data, we can adopt the C programmer approach and write:
```go
var io IO
// Treat ks.ks_data as a *IO and copy the struct it points to.
io = *((*IO)(ks.ks_data))
return &io
```
This casts ks.ks_data to a pointer to an IO struct and then dereferences it to copy the struct itself into the Go variable we made for this. Depending on the C type of ks_data, you may need to use the hammer of unsafe.Pointer() here:
```go
io = *((*IO)(unsafe.Pointer(ks.ks_data)))
```
At this point, some people will be tempted to skip the copying and just return the 'casted-to-*IO' ks.ks_data pointer. You don't want to do this, because if you return a Go pointer to C data, you're coupling Go and C memory management lifetimes. The C memory must not be freed or reused for something else for as long as Go retains at least one pointer to it, and there is no way for you to find out when the last Go reference goes away so that you can free the C memory. It's much simpler to treat 'C memory' as completely disjoint from 'Go memory'; any time you want to move some information across the boundary, you must copy it. With copying we know we can free ks.ks_data safely the moment the copy is done and the Go runtime will handle the lifetime of the io variable for us.
The more difficult case is when we don't know what structs we're dealing with; we're providing the access package, but it's the callers who actually know what the structs are. This situation might come up in a package for accessing kernel stats, where drivers or other kernel systems can export custom stats structs. Our access package can provide specific support for known structs, but we need an escape hatch for when the caller knows that some specific kernel system is providing a 'struct whatever' and wants to retrieve that (probably into an identical Go struct created through cgo).
The C programmer approach to this problem is memmove(). You can write memmove() in Go with sufficiently perverse use of the unsafe package, but you don't want to. Instead we can use the reflect package to create a generic version of the specific 'cast and copy' code we used above. How to do this wasn't obvious to me until I did a significant amount of flailing around with the package, so I'm going to go through the logic of what we're doing in detail.
We'll start with our call signature:
```go
func (k *KStat) CopyTo(ptri interface{}) error {
	...
}
```
CopyTo takes a pointer to a Go struct and copies our C memory in ks.ks_data into the struct. In the interests of space I'm going to omit the reflect-based code to check ptri to make sure it's actually a pointer to a suitable struct, but you shouldn't omit it in real code. Also, there are a whole raft of qualifications you're going to want to impose on what types of fields that struct can contain if you want to at least pretend that your package is somewhat memory safe.
To actually do the copy, we first need to turn this ptri interface value into a reflect.Value that is the destination struct itself:
```go
ptr := reflect.ValueOf(ptri) // a Value holding the pointer
dst := ptr.Elem()            // the struct it points to
```
We now need to cast ks.ks_data to a Value with the type 'pointer to dst's type'. This is most easily done by creating a new pointer of the right type with the address taken from ks.ks_data:
```go
src := reflect.NewAt(dst.Type(), unsafe.Pointer(ks.ks_data))
```
This is the equivalent of 'src := ((*IO)(ks.ks_data))' in the type-specific version. reflect.NewAt() is there for doing just this; its purpose is to create pointers for 'type X at address Y', which is exactly the operation we need.
Having created this pointer, we then dereference it to copy the data into dst:
```go
dst.Set(reflect.Indirect(src))
```
This is the equivalent of 'io = *src' in the type-specific version. We're done.
In my testing, this approach is surprisingly robust; it will even deal with structs that I didn't expect it to (such as ones with unexported fields). But you probably don't want to count on that; it's safest to give CopyTo() straightforward structs with only exported fields.
On the whole I'm both happy and pleasantly surprised by how easy it turned out to be to use the reflect package here; I expected it to require a much more involved and bureaucratic process. Getting to this final form involved a lot of missteps and unnecessarily complicated approaches, but the final form itself is about as minimal as I could expect. A lot of this is due to the existence of reflect.NewAt(), but there's also that Value.Set() works fine even on complex and nested types.
(Note that while you could use the reflect-based version even for the first, fixed struct type case, my understanding is that the reflect package has not insignificant overheads. By contrast the hard coded fixed struct type code is about as minimal and low overhead as you can get; it should normally compile down to basically a memory copy.)
Sidebar: preserving Go memory safety here
I'm not fully confident that I have this right, but I think that to preserve memory safety in the face of this memory copying you must ensure that the target struct type does not contain any embedded pointers, either explicit ones or ones implicitly embedded into types like maps, chans, interfaces, strings, slices, and so on. Fixed-size arrays are safe (as long as their elements are also pointer-free) because in Go those are just fixed size blocks of memory.
If you copy a C struct containing pointers into a Go struct containing pointers, what you're doing is the equivalent of directly returning the 'casted-to-*IO' ks.ks_data pointer. You've allowed the creation of a Go object that points to C memory and you now have the same C and Go memory lifetime issues. And if some of the pointers are invalid or point to garbage memory, not only is normal Go code at risk of bad things but it's possible that the Go garbage collector will wind up trying to dereference them and take a fault.
(This makes it impossible to easily copy certain sorts of C structures into Go structures. Fortunately such structures rarely appear in this sort of C API because they often raise awkward memory lifetime issues even in C.)