I should remember that sometimes C is a perfectly good option

February 5, 2018

Recently I found myself needing a Linux command that reported how many CPUs are available for you to use. On Linux, the official way to do this is to call sched_getaffinity and count how many 1 bits are set in the CPU mask that you get back. My default tool for this sort of thing these days is Go and I found some convenient support for this (in the golang.org/x/sys/unix package), so I wrote the obvious Go program:

package main
import (
    "fmt"
    "os"
    "golang.org/x/sys/unix"
)

func main() {
    var cpuset unix.CPUSet
    err := unix.SchedGetaffinity(0, &cpuset)
    if err != nil {
        fmt.Printf("numcpus: cannot get affinity: %s\n", err)
        os.Exit(1)
    }
    fmt.Printf("%d\n", cpuset.Count())
}

This compiled, ran on most of our machines, and then reported an 'invalid argument' error on some of them. After staring at strace output for a while, I decided that I needed to write a C version of this so I understood exactly what it was doing and what I was seeing. I was expecting this to be annoying (because it would involve writing code to count bits), but it turns out that there's a set of macros for this so the code is just:

#define _GNU_SOURCE
#include    <sched.h>
#include    <unistd.h>
#include    <stdio.h>
#include    <stdlib.h>

#define MAXCPUS 0x400

int main(int argc, char **argv) {
    cpu_set_t *cpuset;
    cpuset = CPU_ALLOC(MAXCPUS);

    if (sched_getaffinity(0, CPU_ALLOC_SIZE(MAXCPUS), cpuset) < 0) {
        fprintf(stderr, "numcpus: sched_getaffinity: %m\n");
        exit(1);
    }
    printf("%d\n", CPU_COUNT(cpuset));
}

(I think I have an unnecessary include file in there but I don't care. I spray standard include files into my C programs until the compiler stops complaining. Also, I'm using a convenient glibc printf() extension since I'm writing for Linux.)

This compiled, worked, and demonstrated that what I was seeing was indeed a bug in the x/sys/unix package. I don't blame Go for this, by the way. Bugs can happen anywhere, and they're generally more likely to happen in my code than in library code (that's one reason I like to use library code whenever possible).

The Go version and the C version are roughly the same number of lines and wound up being roughly as complicated to write (although the C version fails to check for an out of memory condition that's extremely unlikely to ever happen).

The Go version builds to a 64-bit Linux binary that is 1.1 Mbytes on disk. The C version builds to a 64-bit Linux binary that is 5 Kbytes on disk.

(This is not particularly Go's fault, lest people think that I'm picking on it. The Go binary is statically linked, for example, while the C version is dynamically linked; statically linking the C version results in an 892 Kbyte binary. Of course, in practice it's a lot easier to dynamically link and run a program written in C than in anything else because glibc is so pervasive.)

When I started writing this entry, I was going to say that what I took from this is that sometimes C is the right answer. Perhaps it is, but that's too strong a conclusion for this example. Yes, the C version is the same size in source code and much smaller as a binary (and that large Go binary does sort of offend my old time Unix soul). But if the Go program had worked I wouldn't have cared enough about its size to write a C version, and if the CPU_SET macros didn't exist with exactly what I needed, the C version would certainly have been more annoying to write. And there is merit in focusing on a small set of tools that you like and know pretty well, even if they're not the ideal fit for every situation.

But still. There is merit in remembering that C exists and is perfectly useful and many things, especially low level operating system things, are probably quite direct to do in C. I could probably write more C than I do, and sometimes it might be no more work than doing it in another language. And I'd get small binaries, which a part of me cares about.

(At the same time, these days I generally find C to be annoying. It forces me to care about things that I mostly don't want to care about any more, like memory handling and making sure that I'm not going to blow my foot off.)

PS: I'm a little bit surprised and depressed that the statically linked C program is so close to the Go program in size, because the Go program includes a lot of complex runtime support in that 1.1 Mbytes (including an entire garbage collector). The C program has no such excuses.

Written on 05 February 2018.
« A surprise in how ZFS grows a file's record size (at least for me)
Consumer SSDs and their nominal physical block sizes, now and in the future »

Page tools: View Source, Add Comment.
Search:
Login: Password:
Atom Syndication: Recent Comments.

Last modified: Mon Feb 5 23:34:16 2018
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.