2018-02-05
I should remember that sometimes C is a perfectly good option
Recently I found myself needing a Linux command that reported how
many CPUs are available for you to use. On Linux, the official way
to do this is to call sched_getaffinity
and
count how many 1 bits are set in the CPU mask that you get back.
My default tool for this sort of thing these days is Go and I found
some convenient support for this (in the golang.org/x/sys/unix
package), so I wrote the
obvious Go program:

    package main

    import (
        "fmt"
        "os"

        "golang.org/x/sys/unix"
    )

    func main() {
        var cpuset unix.CPUSet
        err := unix.SchedGetaffinity(0, &cpuset)
        if err != nil {
            fmt.Printf("numcpus: cannot get affinity: %s\n", err)
            os.Exit(1)
        }
        fmt.Printf("%d\n", cpuset.Count())
    }

This compiled, ran on most of our machines, and then reported an
'invalid argument' (EINVAL) error on some of them. After staring at strace
output for a while, I decided that I needed to write a C version
of this so I understood exactly what it was doing and what I was
seeing. I was expecting this to be annoying (because it would involve
writing code to count bits), but it turns out that there's a set
of macros for this
so the code is just:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAXCPUS 0x400

    int main(int argc, char **argv)
    {
        cpu_set_t *cpuset;
        cpuset = CPU_ALLOC(MAXCPUS);

        if (sched_getaffinity(0, CPU_ALLOC_SIZE(MAXCPUS), cpuset) < 0) {
            fprintf(stderr, "numcpus: sched_getaffinity: %m\n");
            exit(1);
        }
        printf("%d\n", CPU_COUNT(cpuset));
    }

(I think I have an unnecessary include file in there but I don't
care. I spray standard include files into my C programs until the
compiler stops complaining. Also, I'm using a convenient glibc
printf() extension, the %m conversion that prints the error message
for the current errno, since I'm writing for Linux.)
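
For comparison, here is a minimal sketch of the sort of bit counting code that the macros make unnecessary. It assumes the mask is available as a plain array of unsigned long words. The helper name count_mask_bits and that representation are made up for illustration; neither comes from the program above or from glibc.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration): count the 1 bits
 * in a mask stored as an array of nwords unsigned long words. */
int count_mask_bits(const unsigned long *mask, size_t nwords)
{
    int count = 0;

    for (size_t i = 0; i < nwords; i++) {
        unsigned long word = mask[i];

        /* Each pass clears the lowest set bit, so the loop runs once
         * per 1 bit in the word. */
        while (word != 0) {
            word &= word - 1;
            count++;
        }
    }
    return count;
}
```

It isn't much code, but it is exactly the kind of thing that is easy to get subtly wrong, which is why having CPU_COUNT available is nice.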
This compiled, worked, and demonstrated that what I was seeing was indeed a bug in the x/sys/unix package. I don't blame Go for this, by the way. Bugs can happen anywhere, and they're generally more likely to happen in my code than in library code (that's one reason I like to use library code whenever possible).
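
This entry doesn't pin down what the bug was, so the following is only a sketch of one documented way to get EINVAL from this call. The sched_getaffinity(2) manual page says it fails with EINVAL when the mask you pass is smaller than the affinity mask the kernel uses, and suggests probing with increasing mask sizes until the error goes away. The helper name count_available_cpus, the 1024 CPU starting size, and the 1 << 20 upper limit are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper (not from the original program): return the number
 * of CPUs in the calling process's affinity mask, or -1 with errno set. */
int count_available_cpus(void)
{
    /* Start with room for 1024 CPUs and double on EINVAL. The 1 << 20
     * cap is an arbitrary safety limit chosen for this sketch. */
    for (int ncpus = 1024; ncpus <= (1 << 20); ncpus *= 2) {
        cpu_set_t *set = CPU_ALLOC(ncpus);
        if (set == NULL) {
            errno = ENOMEM;
            return -1;
        }

        size_t size = CPU_ALLOC_SIZE(ncpus);
        CPU_ZERO_S(size, set);

        if (sched_getaffinity(0, size, set) == 0) {
            /* The _S macros take the allocated size explicitly, so they
             * work for masks of any size. */
            int count = CPU_COUNT_S(size, set);
            CPU_FREE(set);
            return count;
        }

        int saved = errno;
        CPU_FREE(set);
        if (saved != EINVAL) {
            errno = saved;
            return -1;
        }
        /* EINVAL: assume the mask was too small and retry with a larger one. */
    }
    errno = EINVAL;
    return -1;
}
```

A main() that prints the result of count_available_cpus() with "%d\n" would behave like the program above on ordinary machines, while also coping with a kernel mask larger than 1024 CPUs and with a failed allocation.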
The Go version and the C version are roughly the same number of lines and wound up being roughly as complicated to write (although the C version fails to check for an out of memory condition that's extremely unlikely to ever happen).
The Go version builds to a 64-bit Linux binary that is 1.1 Mbytes on disk. The C version builds to a 64-bit Linux binary that is 5 Kbytes on disk.
(This is not particularly Go's fault, lest people think that I'm picking on it. The Go binary is statically linked, for example, while the C version is dynamically linked; statically linking the C version results in an 892 Kbyte binary. Of course, in practice it's a lot easier to dynamically link and run a program written in C than in anything else because glibc is so pervasive.)
When I started writing this entry, I was going to say that what I
took from this is that sometimes C is the right answer. Perhaps it
is, but that's too strong a conclusion for this example. Yes, the
C version is the same size in source code and much smaller as a
binary (and that large Go binary does sort of offend my old-time
Unix soul). But if the Go program had worked I wouldn't have cared
enough about its size to write a C version, and if the CPU_SET(3)
macros didn't exist with exactly what I needed, the C version would
certainly have been more annoying to write. And there is merit in
focusing on a small set of tools that you like and know pretty well,
even if they're not the ideal fit for every situation.
But still. There is merit in remembering that C exists and is perfectly useful, and that many things, especially low-level operating system things, are probably quite direct to do in C. I could probably write more C than I do, and sometimes it might be no more work than doing it in another language. And I'd get small binaries, which a part of me cares about.
(At the same time, these days I generally find C to be annoying. It forces me to care about things that I mostly don't want to care about any more, like memory handling and making sure that I'm not going to blow my foot off.)
PS: I'm a little bit surprised and depressed that the statically linked C program is so close to the Go program in size, because the Go program includes a lot of complex runtime support in that 1.1 Mbytes (including an entire garbage collector). The C program has no such excuses.