Wandering Thoughts archives

2015-08-31

CGo's Go string functions explained

As plenty of its documentation will tell you, cgo provides four functions to convert between Go and C types by making copies of the data. They are tersely explained in the CGo documentation. In my opinion they are too tersely explained, because the documentation covers certain things only by implication and omits two glaring and very important cautions. Because I made some mistakes here, I'm going to write out a longer explanation.

The four functions are:

func C.CString(string) *C.char
func C.GoString(*C.char) string
func C.GoStringN(*C.char, C.int) string
func C.GoBytes(unsafe.Pointer, C.int) []byte

C.CString() is the equivalent of C's strdup(), just as documented. It copies your Go string into C memory allocated with malloc() and gives you a C char * that you can pass to C functions. Freeing that memory is your responsibility. The one annoying thing is that, because of how Go and CGo types are defined, calling C.free requires a cast:

// C.free needs '#include <stdlib.h>' in the cgo preamble
cs := C.CString("a string")
C.free(unsafe.Pointer(cs))

Note that Go strings may contain embedded 0 bytes and C strings may not. If your Go string contains one and you call C.CString(), C code will see your string truncated at that 0 byte. This is often not a concern, but sometimes text isn't guaranteed to be free of null bytes.
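If you can't rule out embedded 0 bytes, one option is to check before calling C.CString(). This is a minimal pure-Go sketch with no cgo involved; the helper names checkCString and cVisible and the error value are hypothetical. In real code you would only call C.CString() after the check passes.

```go
package main

import (
	"errors"
	"strings"
)

// errEmbeddedNUL is a hypothetical error for strings that C would see truncated.
var errEmbeddedNUL = errors.New("string contains a 0 byte; C would see it truncated")

// checkCString reports whether s would survive C.CString() intact.
// C code stops at the first 0 byte, so any embedded 0 byte would
// silently truncate the string on the C side.
func checkCString(s string) error {
	if strings.IndexByte(s, 0) != -1 {
		return errEmbeddedNUL
	}
	return nil
}

// cVisible returns the part of s that C code would actually see after
// C.CString(): everything before the first 0 byte.
func cVisible(s string) string {
	if i := strings.IndexByte(s, 0); i != -1 {
		return s[:i]
	}
	return s
}
```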

C.GoString() is also the equivalent of strdup(), but for going the other way, from C strings to Go strings. You use it on struct fields and other things that are declared as C char *'s, aka *C.char in Go, and (as we'll see) pretty much nothing else.

C.GoStringN() is the equivalent of C's memmove(), not of any normal C string function. It copies the entire length of the C buffer into a Go string and pays no attention to null bytes; more exactly, it copies them too. If you have a struct field that is declared as, say, 'char field[64]' and you call C.GoStringN(&field[0], 64), the Go string you get will always be 64 bytes long and will probably have a bunch of 0 bytes at the end.

(In my opinion this is a bug in cgo's documentation. It claims that GoStringN takes a C string as the argument, but it manifestly does not, as C strings are null-terminated and GoStringN does not stop at null bytes.)
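To make the 'always 64 bytes' behaviour concrete, here is a pure-Go simulation that doesn't use cgo. A [64]byte array stands in for 'char field[64]' (with cgo the Go type would be [64]C.char), and copyWholeBuffer is a hypothetical stand-in that mimics what C.GoStringN() does.

```go
package main

import "strings"

// field64 stands in for a C struct field declared as 'char field[64]'.
type field64 [64]byte

// copyWholeBuffer mimics C.GoStringN(&field[0], n): it copies all n
// bytes, 0 bytes included, into a Go string.
func copyWholeBuffer(f *field64, n int) string {
	return string(f[:n])
}

// newField fills a field the way C code holding a short string would:
// the text, followed by 0 bytes for the rest of the array.
func newField(s string) *field64 {
	var f field64
	copy(f[:], s)
	return &f
}

// trailingZeros counts the 0 bytes at the end of s.
func trailingZeros(s string) int {
	return len(s) - len(strings.TrimRight(s, "\x00"))
}
```

Copying a field that holds "sda" this way gives a 64-byte Go string with 61 trailing 0 bytes, which prints as 'sda' and is easy to miss.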

C.GoBytes() is a version of C.GoStringN() that returns a []byte instead of a string. Since it doesn't claim to be taking a C string as the argument, it's clearer that it is simply a memory copy of the entire buffer.

If you are copying something that is not actually a null terminated C string but is instead a memory buffer with a size, C.GoStringN() is exactly what you want; it avoids the traditional C problem of dealing with 'strings' that aren't actually C strings. However, none of these functions are what you want if you are dealing with size-limited C strings in the form of struct fields declared as 'char field[N]'.

The traditional semantics of a fixed size string field in structs, fields that are declared as 'char field[N]' and described as holding a string, is that the string is null terminated if and only if there is room, ie if the string is at most N-1 characters long. If the string is exactly N characters long, it is not null terminated. This is a fruitful source of bugs even in C code and is not a good API, but it is an API that we are generally stuck with. Any time you see such a field and the documentation does not expressly tell you that the field contents are always null terminated, you have to assume that you have this sort of API.
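The 'terminated if and only if there is room' rule is what you get from C code that fills such fields with strncpy(field, s, N). Here is a hedged pure-Go sketch of those semantics; fieldN (kept small at 8) and both helpers are hypothetical.

```go
package main

// fieldN stands in for N in 'char field[N]'; it is small so that the
// exactly-full case is easy to see.
const fieldN = 8

// setField stores s the way strncpy(field, s, N) would: it copies at most
// N bytes and zero-fills any remaining room. If s is N or more bytes
// long, no terminating 0 byte is written at all.
func setField(field *[fieldN]byte, s string) {
	n := copy(field[:], s)
	for i := n; i < fieldN; i++ {
		field[i] = 0
	}
}

// isTerminated reports whether the field contains a 0 byte anywhere,
// ie whether it is safe to treat it as an ordinary C string.
func isTerminated(field *[fieldN]byte) bool {
	for _, b := range field {
		if b == 0 {
			return true
		}
	}
	return false
}
```

A 7-byte string leaves room for the terminator; an 8-byte string fills the field completely and leaves it unterminated.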

Neither C.GoString() nor C.GoStringN() deals correctly with these fields. Using GoStringN() is the less wrong option; it will merely leave you with N-byte Go strings with plenty of trailing 0 bytes (which you may not notice for some time if you usually just print those fields out; yes, I've done this). Using the tempting GoString() is actively dangerous, because it internally does a strlen() on the argument; if the field lacks a terminating null byte, the strlen() will run away into memory beyond it. If you're lucky you will just wind up with some amount of trailing garbage in your Go string. If you're unlucky, your Go program will take a segmentation fault as strlen() hits unmapped memory.
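The 'trailing garbage' case can be simulated safely in pure Go. In this sketch a hypothetical struct with two back-to-back 8-byte fields is modelled as one 16-byte array; an unbounded strlen()-style scan (what C.GoString() effectively does) runs from the first field into the second, while a strnlen()-style scan stops at the field boundary. In real C memory nothing bounds the first scan at 16 bytes.

```go
package main

import "bytes"

// adjacent models a C struct 'char name[8]; char model[8];' laid out
// back to back in memory. This is an illustration, not cgo.
type adjacent struct {
	mem [16]byte // name is mem[0:8], model is mem[8:16]
}

// strlenFrom mimics C.GoString(&name[0]): scan forward until a 0 byte,
// with no idea where the name field ends.
func (a *adjacent) strlenFrom() string {
	i := bytes.IndexByte(a.mem[:], 0)
	if i == -1 {
		i = len(a.mem)
	}
	return string(a.mem[:i])
}

// strnlenFrom is the bounded version: it never looks past the 8-byte field.
func (a *adjacent) strnlenFrom() string {
	name := a.mem[:8]
	i := bytes.IndexByte(name, 0)
	if i == -1 {
		i = len(name)
	}
	return string(name[:i])
}
```

With name holding the unterminated "abcdefgh" and model holding "xyz", the unbounded scan returns "abcdefghxyz" and the bounded one returns "abcdefgh".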

(In general, trailing garbage in strings is the traditional sign that you have an unterminated C string somewhere.)

What you actually want is the Go equivalent of C's strndup(), which guarantees to copy no more than N bytes of memory but will stop before then if it finds a null byte. Here is my version of it, with no guarantees:

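// needs import "C" (with its cgo preamble) and import "strings"
func strndup(cs *C.char, len int) string {
   // copy all len bytes, 0 bytes included
   s := C.GoStringN(cs, C.int(len))
   i := strings.IndexByte(s, 0)
   if i == -1 {
      // no terminator inside the field: all len bytes are the string
      return s
   }
   // a 0 byte exists within len bytes, so GoString's strlen() stays in bounds
   return C.GoString(cs)
}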

This code does some extra work in order to minimize memory usage, because a substring of a Go string shares (and so keeps alive) the entire original string's memory. You may want to take the alternate approach of returning a slice of the GoStringN() string. Really sophisticated code might decide which of the two options to use based on the difference between i and len.
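Here is a minimal sketch of that 'really sophisticated' choice, written in pure Go against a []byte standing in for the C buffer so it runs without cgo. The wasteThreshold value of 32 bytes is an arbitrary assumption, and strings.Clone needs Go 1.18 or later.

```go
package main

import "strings"

// wasteThreshold is a hypothetical tuning knob: if slicing would pin more
// than this many unused bytes, copy the prefix instead.
const wasteThreshold = 32

// strndupBytes is a pure-Go analogue of strndup() above, working on a
// []byte that stands in for a 'char field[N]' buffer.
func strndupBytes(buf []byte) string {
	s := string(buf) // like C.GoStringN(): copies all N bytes
	i := strings.IndexByte(s, 0)
	if i == -1 {
		return s // unterminated: the whole buffer is the string
	}
	if len(s)-i > wasteThreshold {
		// Most of s would be dead weight; copy just the prefix so that
		// s's backing memory can be garbage collected.
		return strings.Clone(s[:i])
	}
	// Little waste: a substring that shares s's memory is fine here.
	return s[:i]
}
```

Both branches return the same string contents; they differ only in how much memory the result keeps alive.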

Update: Ian Lance Taylor showed me the better version:

// needs '#include <string.h>' in the cgo preamble for strnlen()
func strndup(cs *C.char, len int) string {
   return C.GoStringN(cs, C.int(C.strnlen(cs, C.size_t(len))))
}

Yes, that's a lot of casts. That's the combination of Go and CGo typing for you.

programming/GoCGoStringFunctions written at 23:49:44

Turning, well, copying blobs of memory into Go structures

As before, suppose (not entirely hypothetically) that you're writing a package to connect Go up to something that will provide it with blobs of memory that are actually C structs; these might be mmap()'d files, information from a library, or whatever. Once you have a compatible Go struct, you still have to get the data from a C struct (or raw memory) to the Go struct.

One way to do this is to manually write your own struct copy function that does it field by field (eg 'io.Field = ks_io.field' for each field). As with defining the Go structs by hand, this is tedious and potentially error-prone. You can do it, and you'll probably have to if the C struct contains unions or other hard-to-handle things, but we'd like an easier approach. Fortunately there are two good ones for two different cases. In both cases we wind up copying the C struct or the raw memory to a Go struct variable that is an exact equivalent of the C struct (or at least we hope it is).

The easy case is when we're dealing with a fixed struct that we have a known Go type for. Assuming that we have a C void * pointer to the original memory area called ks.ks_data, we can adopt the C programmer approach and write:

var io IO
io = *((*IO)(ks.ks_data))
return &io

This casts ks.ks_data to a pointer to an IO struct and then dereferences it to copy the struct itself into the Go variable we made for this. Depending on the C type of ks_data, you may need to use the hammer of unsafe.Pointer() here:

io = *((*IO)(unsafe.Pointer(ks.ks_data)))

At this point, some people will be tempted to skip the copying and just return the 'casted-to-*IO' ks.ks_data pointer. You don't want to do this, because if you return a Go pointer to C data, you're coupling Go and C memory management lifetimes. The C memory must not be freed or reused for something else for as long as Go retains at least one pointer to it, and there is no way for you to find out when the last Go reference goes away so that you can free the C memory. It's much simpler to treat 'C memory' as completely disjoint from 'Go memory'; any time you want to move some information across the boundary, you must copy it. With copying we know we can free ks.ks_data safely the moment the copy is done and the Go runtime will handle the lifetime of the io variable for us.

The more difficult case is when we don't know what structs we're dealing with; we're providing the access package, but it's the callers who actually know what the structs are. This situation might come up in a package for accessing kernel stats, where drivers or other kernel systems can export custom stats structs. Our access package can provide specific support for known structs, but we need an escape hatch for when the caller knows that some specific kernel system is providing a 'struct whatever' and wants to retrieve it (probably into an identical Go struct created through cgo).

The C programmer approach to this problem is memmove(). You can write memmove() in Go with sufficiently perverse use of the unsafe package, but you don't want to. Instead we can use the reflect package to create a generic version of the specific 'cast and copy' code we used above. How to do this wasn't obvious to me until I did a significant amount of flailing around with the package, so I'm going to go through the logic of what we're doing in detail.

We'll start with our call signature:

func (k *KStat) CopyTo(ptri interface{}) error { ... }

CopyTo takes a pointer to a Go struct and copies our C memory in ks.ks_data into the struct. I'm going to omit the reflect-based code to check ptri to make sure it's actually a pointer to a suitable struct in the interests of space, but you shouldn't in real code. Also, there are a whole raft of qualifications you're going to want to impose on what types of fields that struct can contain if you want to at least pretend that your package is somewhat memory safe.
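As a hedged sketch of the omitted checks, the following verifies that ptri is a non-nil pointer to a struct whose size matches the source data. The checkDest name and its dataSize parameter are assumptions; a real package would get the size from its own metadata about the C data.

```go
package main

import (
	"errors"
	"reflect"
)

// checkDest returns the destination struct if ptri is a non-nil pointer
// to a struct of exactly dataSize bytes, and an error otherwise.
func checkDest(ptri interface{}, dataSize uintptr) (reflect.Value, error) {
	ptr := reflect.ValueOf(ptri) // a nil interface gives Kind() == Invalid
	if ptr.Kind() != reflect.Ptr || ptr.IsNil() {
		return reflect.Value{}, errors.New("CopyTo: need a non-nil pointer")
	}
	dst := ptr.Elem()
	if dst.Kind() != reflect.Struct {
		return reflect.Value{}, errors.New("CopyTo: need a pointer to a struct")
	}
	if dst.Type().Size() != dataSize {
		return reflect.Value{}, errors.New("CopyTo: struct size does not match data size")
	}
	return dst, nil
}
```

A size check doesn't prove that the layouts match, but it catches the most obvious mistakes before any memory is read.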

To actually do the copy, we first need to turn this ptri interface value into a reflect.Value that is the destination struct itself:

ptr := reflect.ValueOf(ptri)
dst := ptr.Elem()

We now need to cast ks.ks_data to a Value with the type 'pointer to dst's type'. This is most easily done by creating a new pointer of the right type with the address taken from ks.ks_data:

src := reflect.NewAt(dst.Type(), unsafe.Pointer(ks.ks_data))

This is the equivalent of 'src := ((*IO)(ks.ks_data))' in the type-specific version. reflect.NewAt() is there for doing just this; its purpose is to create a pointer for 'type X at address Y', which is exactly the operation we need.

Having created this pointer, we then dereference it to copy the data into dst:

dst.Set(reflect.Indirect(src))

This is the equivalent of 'io = *src' in the type-specific version. We're done.

In my testing, this approach is surprisingly robust; it even deals with structs that I didn't expect it to handle (such as ones with unexported fields). But you probably don't want to count on that; it's safest to give CopyTo() straightforward structs with only exported fields.

On the whole I'm both happy and pleasantly surprised by how easy it turned out to be to use the reflect package here; I expected it to require a much more involved and bureaucratic process. Getting to this final form involved a lot of missteps and unnecessarily complicated approaches, but the final form itself is about as minimal as I could expect. A lot of this is due to the existence of reflect.NewAt(), but it also helps that Value.Set() works fine even on complex and nested types.

(Note that while you could use the reflect-based version even for the first, fixed struct type case, my understanding is that the reflect package has not insignificant overheads. By contrast the hard coded fixed struct type code is about as minimal and low overhead as you can get; it should normally compile down to basically a memory copy.)

Sidebar: preserving Go memory safety here

I'm not fully confident that I have this right, but I think that to preserve memory safety in the face of this memory copying you must ensure that the target struct type does not contain any embedded pointers, either explicit ones or ones implicitly embedded in types like maps, chans, interfaces, strings, slices, and so on. Fixed-size arrays of pointer-free element types are safe, because in Go those are just fixed size blocks of memory.

If you copy a C struct containing pointers into a Go struct containing pointers, what you're doing is the equivalent of directly returning the 'casted-to-*IO' ks.ks_data pointer. You've allowed the creation of a Go object that points to C memory and you now have the same C and Go memory lifetime issues. And if some of the pointers are invalid or point to garbage memory, not only is normal Go code at risk of bad things but it's possible that the Go garbage collector will wind up trying to dereference them and take a fault.

(This makes it impossible to easily copy certain sorts of C structures into Go structures. Fortunately such structures rarely appear in this sort of C API because they often raise awkward memory lifetime issues even in C.)

programming/GoMemoryToStructures written at 02:50:28



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.