2019-12-25
Some reasons for Go to not make system calls through the standard C library
One of the recent pieces of news in the Unix world is that as part
of its general security work, OpenBSD is moving towards only allowing
system calls to be made from the C library, not from any other code
(you can read about this in OpenBSD system call origin verification). Right now OpenBSD has an
exemption for the code of programs themselves, primarily because
Go generally makes system calls directly instead of by calling the
C library, but they would like to get rid of that. Other people are
not happy about Go making direct system calls; for example, on
Solaris and Illumos, the only officially supported method of making
system calls is also through the C library (although
Go does it itself on those operating systems).
(Update: On Illumos and Solaris, Go actually uses the platform C library to make system calls; I was wrong here.)
On the surface this makes Go sound unreasonable, and you might ask why it can't just make Unix system calls through the system's C library the way pretty much every other Unix language does. Although I don't know exactly why the Go developers chose to do it this way, there are reasons why you might want to avoid the C library in a language like Go, because the standard C library's Unix system call API is under-specified and awkward.
The obvious way that the C library API is under-specified for things like Go is the question of how much free stack space you need. C code (even threaded C code) traditionally allocates large or very large stacks, but Go wants to use very small stacks when it can, on the order of a few KB, in order to keep goroutines lightweight. The C library API makes no promises here, so if you want to be safe you need to call into it with larger stacks and even then you're guessing. The issue of how much stack space you need to call C library system calls has already been a problem for Go. Go solved this for now by increasing the stack size, but since the required stack size is not a documented part of the C library API, it may break in the future (on any Unix, not just Linux for calls to vDSOs).
(Unixes that strongly insist you go through the C library to make system calls generally reserve the right to have those 'system call' library functions do any amount of work behind your back, because the apparent API to system calls may not be the real kernel API. Indeed one reason for Unixes to force this is exactly so they can make changes in the kernel API without changing the 'system call' API that programs use. Such a change in internal implementation can of course cause unpredictable and undocumented changes in how much stack space the C library will demand and use during such function calls.)
For system calls, the most obvious awkward area of the C library
API is how the specifics of errors are returned in errno, which
is nominally a global variable. Using a global variable was sort
of okay in the days before multi-threaded programs and wanting to
make system calls from multiple threads, but it's clearly a problem
now. Making errno work in the modern world requires behind the
scenes magic in the C library, which generally means that you must
use the entire C runtime (and yes, C has a runtime) to do things
like set up thread local storage, create OS level threads so that
they have this thread local storage, and retrieve your thread's
errno from its TLS. In the extreme, this may require you to use
the C library pthreads API to create any threads that will make
system calls, then carefully schedule goroutines that want to make
system calls onto those pthreads (likely with large stacks, because
of the C library API issues there). All of this is completely
unnecessary in the underlying kernel API, which already directly
provides the error code to you.
The C global errno exists for historical compatibility and because
C has no easy way to return multiple values; the natural modern API
is Go's approach of returning the result and the errno, which is
intrinsically thread safe and has no pseudo-global variables. Requiring
all languages to go through the C library's normal Unix API for system
calls means constraining all languages to live with C's historical
baggage and limits.
(You could invent a new C library API for all of the system calls
that directly wrote the error number to a spot the caller provided,
which would make life much simpler, but no major Unix or C library
is so far proposing to do this. Everyone wants (or requires) you
to go through the traditional Unix API, errno warts and all.)
Why udev may be trying to rename your VLAN interfaces to bad names
When I updated my office workstation to Fedora 30 back in August, I ran into a little issue:
It has been '0' days since systemd/udev blew up my networking. Fedora 30 systemd/udev attempts to rename VLAN devices to the interface's base name and fails spectacularly, causing the sys-subsystem*.device units to not be present. We hope you didn't depend on them! (I did.)
I filed this as Fedora bug #1741678, and just today I got a clue so that now I think I know why this happens.
The symptom of this problem is that during boot, your system will log things like:
systemd-udevd[914]: em-net5: Failed to rename network interface 4 from 'em-net5' to 'em0': Device or resource busy
As you might guess from the name I've given it here, em-net5 is a VLAN on em0. The name 'em0' itself is one that I assigned, because I don't like the network names that systemd-udevd would assign if left on its own (they are what I would call ugly, or at least tangled and long). The failure here prevents systemd from creating the sys-subsystem-net-devices-em-net5.device unit that it normally would (and then this had further consequences because of systemd's lack of good support for networks being ready).
I use networkd with static networking, so I set up the em0 name through a networkd .link file (as covered here). This looks like:
[Match]
MACAddress=60:45:cb:a0:e8:dd

[Link]
Description=Onboard port
MACAddressPolicy=persistent
Name=em0
Based on what 'udevadm test' reports, it appears that when udevd is configuring the em-net5 VLAN, it (still) matches this .link file for the underlying device and applies things from it. My guess is that this is happening because VLANs and their underlying physical interfaces normally share MACs, and so the VLAN MAC matches the MAC here.
This appears to be a behavior change in the version of udev shipped
in Fedora 30. Before Fedora 30, systemd-udevd and networkd did not
match VLAN MACs against .link files; from Fedora 30 onward, they
appear to do so. To stop this, presumably you need to limit your
.link files to only matching on physical interfaces, not VLANs, but
unfortunately this seems difficult to do. The systemd.link manpage
documents a 'Type=' match, but while VLANs have a type that can
be used for this, native interfaces do not appear to (and there
doesn't seem to be a way to negate the match). There are various
hacks that could be committed here, but all of them are somewhat
unpleasant to me (such as specifying the kernel driver; if the
kernel's opinion of what driver to use for this hardware changes,
I am up a creek again).