Wandering Thoughts

2024-12-10

My wish for VFS or filesystem level cgroup (v2) IO limits

Over on the Fediverse, I wished for better IO limits than cgroup (v2) has today:

I wish Linux cgroups (v2 of course) had an option/interface that limited *filesystem* IO that you could do, read and/or write. They have 'block IO' limits but these are often ineffective for an assortment of reasons, including that you're not doing block IO (hi, NFS) or that the underlying filesystem and storage stack doesn't support them (... many). A VFS level limit would be nominally simple and very useful.

Cgroup(s) v2 have an IO controller that has both priorities (which only work in limited circumstances) and absolute limits, which are applied on a per-disk-device basis and so appear to have some serious limitations. Based on what appears in the io.stat file, you might be able to limit bandwidth to a software RAID device, but you can't do it to ZFS filesystems (which is important to us) or to NFS filesystems, since the latter don't have 'block IO'.
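
To make the current interface concrete, here's a minimal sketch of setting an absolute limit through the IO controller's io.max file; the cgroup path and the '8:16' device numbers are made-up examples. The point is that the limit has to be keyed by a block device's major:minor numbers, which is exactly what ZFS filesystems and NFS mounts don't give you.

    // A minimal sketch (made-up cgroup path and device numbers): cap write
    // bandwidth for this cgroup to 1 MiB/s on block device 8:16, using the
    // cgroup v2 io.max line format "MAJOR:MINOR rbps=... wbps=... riops=... wiops=...".
    package main

    import (
        "log"
        "os"
    )

    func main() {
        limit := []byte("8:16 wbps=1048576\n")
        if err := os.WriteFile("/sys/fs/cgroup/example/io.max", limit, 0644); err != nil {
            log.Fatal(err)
        }
    }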

In theory, this could be worked around if cgroup(s) v2 also had a controller that operated at the higher level of Linux's VFS, or perhaps at the level of individual filesystems, where it would apply to things like read(), write(), and IO performed through mmap()'d files. Since all filesystems go through the VFS, these limits would naturally apply no matter what the underlying filesystem was or the storage stack it was operating on top of.

As covered in /proc/pid/mountinfo(5), filesystems, well, mounts, do have identifying numbers that could be used in the same way as the block device numbers used by the IO controller, in order to implement filesystem specific limits. But I'd be happy with an overall limit, and in fact I'd like one even if you could set per-filesystem limits too.
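
As a small illustration of those identifying numbers, here's a Go sketch (mine, purely illustrative) that lists them: the mount ID is the first field of each /proc/self/mountinfo line and the mount point is the fifth, so a hypothetical per-filesystem limit could be keyed on mount IDs much the way io.max is keyed on major:minor device numbers.

    // List mount IDs and mount points from /proc/self/mountinfo; a
    // hypothetical per-filesystem limit could be keyed on these IDs.
    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "strings"
    )

    func main() {
        f, err := os.Open("/proc/self/mountinfo")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        sc := bufio.NewScanner(f)
        for sc.Scan() {
            fields := strings.Fields(sc.Text())
            if len(fields) >= 5 {
                // fields[0] is the mount ID, fields[4] is the mount point.
                fmt.Printf("mount ID %s -> %s\n", fields[0], fields[4])
            }
        }
        if err := sc.Err(); err != nil {
            log.Fatal(err)
        }
    }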

(The current memory and IO controllers cooperate to create writeback limits, but a VFS level write bandwidth limit might be easier and more direct.)

However, I suspect that even a general VFS-wide set of limits will never be implemented in cgroup v2, for two reasons. First, such a limit only cleanly applies to direct read and write IO involving files; it's at best awkward to extend it to, for example, reading directories, or worse, doing 'metadata' operations like creating, renaming, and deleting files (all of which can trigger various amounts of IO), or stat()'ing things. Second, I suspect that there would be implementation complexities in applying this to memory-mapped files, although maybe you could put the relevant process (thread) to sleep for a while in page fault handling in order to implement rate limits.

(The cgroup v2 people might also consider VFS level limits to be the wrong way to go about things, but then I don't know how they expect people to rate limit networked filesystems. As far as I can tell, there is currently no network controller to rate limit overall network traffic, and it would likely need cooperation from the NFS client implementation anyway.)

linux/CgroupVFSIORatelimitWish written at 23:02:50;

2024-12-09

Maybe we should explicitly schedule rebooting our fleet every so often

We just got through a downtime where we rebooted basically everything in our fleet, including things like firewalls. Doing such a reboot around this time of year is somewhat of a tradition for us, since we have the university's winter break coming up and in the past some of our machines have had problems that seem to have been related to being up for 'too long'.

Generally we don't like to reboot our machines, because it's disruptive to our users. We're in an unusual sysadmin environment where people directly log in to or use many of the individual servers, so when one of them goes through a reboot, it's definitely noticed. There are behind the scenes machines that we can reboot without users particularly noticing, and some of our machines are sort of generic and could be rebooted on a rolling basis, but not our login servers, our general compute servers, our IMAP server, our heavily used general purpose web server, and so on. So our default is to not reboot our machines unless we have to.

The problem with defaults is that it's very easy to go with them. When the default is to not reboot your machines, this can result in machines that haven't been rebooted in a significant amount of time (and with it, haven't had work done on them that would require a reboot). When we were considering this December's round of precautionary, pre-break rebooting, we realized that this had happened to us. I'm not going to say just how long many of our machines had gone without a reboot, but it was rather long, long enough to feel alarming for various reasons.

We're not going to change our default of not rebooting things, but one way we could work within it is to decide in advance on a schedule for reboots. For example, we could decide that we'll reboot all of our fleet at least three times a year, since that conveniently fits into the university's teaching schedules (we're on the research side, but professors teaching courses may have various things on our machines). We probably wouldn't schedule the precise timing of this mass reboot in advance, but at least having it broadly scheduled (for example, 'we're rebooting everything around the start of May') might get us to do it reliably, rather than just drifting with our default.

sysadmin/SchedulingRebootsThought written at 23:08:53;

2024-12-08

Unix's buffered IO in assembly and in C

Recently on the Fediverse, I said something related to Unix's pre-V7 situation with buffered IO:

[...]

(I think the V1 approach is right for an assembly based minimal OS, while the stdio approach kind of wants malloc() and friends.)

The V1 approach, as documented in its putc.3 and getw.3 manual pages, is that the caller to the buffered IO routines supplies the data area used for buffering, and the library functions merely initialize it and later use it. How you get the data area is up to you and your program; you might, for example, simply have a static block of memory in your BSS segment. You can dynamically allocate this area if you want to, but you don't have to. The putchar of V2 and later takes a similar approach, but this time it contains a static buffer area and you just have to do a bit of initialization (possibly putchar was in V1 too, I don't know for sure).

Stdio of course has a completely different API. In stdio, you don't provide the data area; instead, stdio provides you an opaque reference (a 'FILE *') to the information and buffers it maintains internally. This is an interface that definitely wants some degree of dynamic memory allocation, for example for the actual buffers themselves, and in modern usage most of the FILE objects will be dynamically allocated too.

(The V7 stdio implementation had a fixed set of FILE structs and so would error out if you used too many of them. However, it did use malloc() for the buffer associated with them, in filbuf.c and flsbuf.c.)

You can certainly do dynamic memory allocation in assembly, but I think it's much more natural in C, and certainly the C standard library is more heavyweight than the relatively small and minimal assembly language stuff early Unix programs (written in assembly) seem to have required. So I think it makes a lot of sense that Unix started with a buffering approach where the caller supplies the buffer (and probably doesn't dynamically allocate it), then moved to one where the library does at least some allocation and supplies the buffer (and other data) itself.

unix/BufferedIOBeforeMalloc written at 21:44:13;

2024-12-07

PCIe cards we use and have used in our servers

In a comment on my entry on how common (desktop) motherboards are supporting more M.2 NVMe slots but fewer PCIe cards, jmassey was curious about what PCIe cards we needed and used. This is a good and interesting question, especially since some number of our 'servers' are actually built using desktop motherboards for various reasons (for example, a certain number of the GPU nodes in our SLURM cluster, and some of our older compute servers, which we put together ourselves using early generation AMD Threadrippers and desktop motherboards for them).

Today, we have three dominant patterns of PCIe cards. Our SLURM GPU nodes obviously have a GPU card (x16 PCIe lanes), and we've added a single-port 10G-T card (these are all PCIe x4, I believe) so they can pull data from our fileservers as fast as possible. Most of our firewalls have an extra dual-port 10G card (mostly 10G-T but a few use SFPs). And a number of machines have dual-port 1G cards because they need to be on more networks; our current stock of these cards is physically x4 PCIe, although I haven't looked to see if they use all the lanes.

(We also have single-port 1G cards lying around that sometimes get used in various machines; these are x1 cards. The dual-port 10G cards are probably some mix of x4 and x8, since online checks say they come in both varieties. We have and use a few quad-port 1G cards for semi-exotic situations, but I'm not sure how many PCIe lanes they want, physically or otherwise. In theory they could reasonably be x4, since a single 1G is fine at x1.)

In the past, one generation of our fileserver setup had some machines that needed a PCIe SAS controller card in order to be able to talk to all of the drives in their chassis, and I believe these cards were PCIe x8; these machines also used a dual 10G-T card. The current generation handles all of its drives through motherboard controllers, but we might need to move back to cards in future hardware configurations (depending on what the available server motherboards support onboard). The good news, for fileservers, is that modern server motherboards increasingly have at least one onboard 10G port. But in a worst-case situation, a large fileserver might need two SAS controller cards and a 10G card.

It's possible that we'll want to add NVMe drives to some servers (parts of our backup system may be limited by SATA write and read speeds today). Since I don't believe any of our current servers support PCIe bifurcation, this would require one or two PCIe x4 cards and slots (two if we want to mirror this fast storage, one if we decide we don't care). Such a server would likely also want 10G; if it didn't have a motherboard 10G port, that would require another x4 card (or possibly a dual-port 10G card at x8).

The good news for us is that servers tend to make all of their available slots physically large (generally large enough for x8 cards, and maybe even x16 these days), so you can fit in all these cards even if some of them don't get all the PCIe lanes they'd like. And modern server CPUs are also coming with more and more PCIe lanes, so we can probably drive many of those slots at their full width.

(I was going to say that modern server motherboards mostly don't design in M.2 slots that reduce the available PCIe lanes, but that seems to depend on what vendor you look at. A random sampling of Supermicro server motherboards suggests that two M.2 slots are not uncommon, while our Dell R350s have none.)

sysadmin/ServerPCIeCardsWeUse written at 22:00:00;

2024-12-06

Common motherboards are supporting more and more M.2 NVMe drive slots

Back at the start of 2020, I wondered if common (x86 desktop) motherboards would ever have very many M.2 NVMe drive slots, where by 'very many' I meant four or so, which even back then was a common number of SATA ports for desktop motherboards to provide. At the time I thought the answer was probably no. As I recently discovered from investigating a related issue, I was wrong, and it's now fairly straightforward to find x86 desktop motherboards that have as many as four M.2 NVMe slots (although not all four may be able to run at x4 PCIe lanes, especially if you have things like a GPU).

For example, right now it's relatively easy to find a page full of AMD AM5-based motherboards that have four M.2 NVMe slots. Most of these seem to be based on the high end X series AMD chipsets (such as the X670 or the X870), but I found a few that were based on the B650 chipset. On the Intel side, should you still be interested in an Intel CPU in your desktop at this point, there are also a number of them based primarily on the Z790 chipset (and some on the older Z690). There's even a B760-based motherboard with four M.2 NVMe slots (although two of them are only x1 lanes and PCIe 3.0), and an H770-based one that manages to (theoretically) support all four M.2 slots at x4 lanes.

One of the things that I think has happened on the way to this large supply of M.2 slots is that these desktop motherboards have dropped most of their PCIe slots. These days, you seem to commonly get three slots in total on the kind of motherboard that has four M.2 slots. There's always one x16 slot, often two, and sometimes three (although that's physical x16; don't count on getting all 16 PCIe lanes in every slot). It's not uncommon to see the third PCIe slot be physically x4, or a little x1 slot tucked away at the bottom of the motherboard. It also isn't necessarily the case that lower end desktops have more PCIe slots to go with their fewer M.2 slots; they too seem to have mostly gone with two or three PCIe slots, generally with a limited number of lanes even if they're physically x16.

(I appreciate having physical x16 slots even if they're only PCIe x1, because that means you can use any card that doesn't require PCIe bifurcation and it should work, although slowly.)

As noted by commentators on my entry on PCIe bifurcation and its uses for NVMe drives, a certain amount of what we used to need PCIe slots for can now be provided through high speed USB-C and similar things. And of course there are only so many PCIe lanes to go around from the CPU and the chipset, so those USB-C ports and other high-speed motherboard devices consume a certain amount of them; the more onboard devices the motherboard has the fewer PCIe lanes there are left for PCIe slots, whether or not you have any use for those onboard devices and connectors.

(Having four M.2 NVMe slots is useful for me because I use my drives in mirrored pairs, so four M.2 slots means I can run my full old pair in parallel with a full new pair, either in a four way mirror or doing some form of migration from one mirrored pair to the other. Three slots is okay, since that lets me add a new drive to a mirrored pair for gradual migration to a new pair of drives.)

tech/MotherboardNVMeMultiSlotsII written at 23:27:41;

2024-12-05

Buffered IO in Unix before V7 introduced stdio

I recently read Julia Evans' Why pipes sometimes get "stuck": buffering. Part of the reason is that almost every Unix program does some amount of buffering for what it prints (or writes) to standard output and standard error. For C programs, this buffering is built into the standard library, specifically into stdio, which includes familiar functions like printf(). Stdio is one of the many things that appeared first in Research Unix V7. This might leave you wondering if this sort of IO was buffered in earlier versions of Research Unix and if it was, how it was done.

The very earliest version of Research Unix is V1, and in V1 there is putc.3 (at that point entirely about assembly, since C was yet to come). This set of routines allows you to set up and then use a 'struct' to implement IO buffering for output. There is a similar set of buffered functions for input, in getw.3, and I believe the memory blocks the two sets of functions use are compatible with each other. The V1 manual pages note it as a bug that the buffer wasn't 512 bytes, but also note that several programs would break if the size was changed; the buffer size would be increased to 512 bytes by V3.

In V2, I believe we still have putc and getw, but we see the first appearance of another approach, in putchr.s. This implements putchar(), which is used by printf() and which (from later evidence) uses an internal buffer (under some circumstances) that has to be explicitly flush()'d by programs. In V3, there are manual pages for putc.3 and getc.3 that are very similar to the V1 versions, which is why I expect these were there in V2 as well. In V4, we have manual pages for both putc.3 (plus getc.3) and putch[a]r.3, and there is also a getch[a]r.3 that's the input version of putchar(). Since we have a V4 manual page for putchar(), we can finally see the somewhat tangled way it works, rather than having to read the PDP-11 assembly. I don't have links to V5 manuals, but the V5 library source says that we still have both approaches to buffered IO.

(If you want to see how the putchar() approach was used, you can look at, for example, the V6 grep.c, which starts out with the 'fout = dup(1);' that the manual page suggests for buffered putchar() usage, and then periodically calls flush().)

In V6, a third approach was added, in /usr/source/iolib, although I don't know if any programs used it. Iolib has a global array of structs that are statically associated with a limited number of low-numbered file descriptors; an iolib function such as cflush() would be passed a file descriptor and use that to look up the corresponding struct. One innovation iolib implicitly adds is that its copen() effectively 'allocates' the struct for you, in contrast to putc() and getc(), where you supply the memory area and fopen()/fcreate() merely initialize it with the correct information.

Finally V7 introduces stdio and sorts all of this out, at the cost of some code changes. There's still getc() and putc(), but now they take a FILE *, instead of their own structure, and you get the FILE * from things like fopen() instead of supplying it yourself and having a stdio function initialize it. Putchar() (and getchar()) still exist but are now redone to work with stdio buffering instead of their own, and 'flush()' has become fflush(), which takes an explicit FILE * argument instead of implicitly flushing putchar()'s buffer and generally isn't necessary any more. The V7 grep.c still uses printf(), but now it doesn't explicitly flush anything by calling fflush(); it just trusts in stdio.

unix/BufferedIOBeforeStdio written at 23:16:54;

2024-12-04

Sorting out 'PCIe bifurcation' and how it interacts with NVMe drives

Suppose, not hypothetically, that you're switching from one mirrored set of M.2 NVMe drives to another mirrored set of M.2 NVMe drives, and so would like to have three or four NVMe drives in your desktop at the same time. Sadly, you already have one of your two NVMe drives on a PCIe card, so you'd like to get a single PCIe card that handles two or more NVMe drives. If you look around today, you'll find two sorts of cards for this: ones that are very expensive, and ones that are relatively inexpensive but require that your system support a feature that is generally called PCIe bifurcation.

NVMe drives are PCIe devices, so a PCIe card that supports a single NVMe drive is a simple, more or less passive thing that wires four PCIe lanes and some other stuff through to the M.2 slot. I believe that in theory, a card could be built that only required x2 or even x1 PCIe lanes, but in practice I think all such single drive cards are physically PCIe x4 and so require a physical x4 or better PCIe slot, even if you'd be willing to (temporarily) run the drive much slower.

A PCIe card that supports more than one M.2 NVMe drive has two options. The expensive option is to put a PCIe bridge on the card, with the bridge (probably) providing a full set of PCIe lanes to the M.2 NVMe drives locally on one side and doing x4, x8, or x16 PCIe with the motherboard on the other. In theory, such a card will work even at x4 or x2 PCIe lanes, because PCIe cards are supposed to do that if the system says 'actually you only get this many lanes' (although obviously you can't drive four x4 NVMe drives at full speed through a single x4 or x2 PCIe connection).

The cheap option is to require that the system be able to split a single PCIe slot into multiple independent groups of PCIe lanes (I believe these are usually called links); this is PCIe bifurcation. In PCIe bifurcation, the system takes what is physically and PCIe-wise an x16 slot (for example) and splits it into four separate x4 links (I've seen this sometimes labeled as 'x4/x4/x4/x4'). This is cheap for the card because it can basically be four single M.2 NVMe PCIe cards jammed together, with each set of x4 lanes wired through to a single M.2 NVMe slot. A PCIe card for two M.2 NVMe drives will require an x8 PCIe slot bifurcated to two x4 links; if you stick this card in an x16 slot, the upper 8 PCIe lanes just get ignored (which means that you can still set your BIOS to x4/x4/x4/x4).

As covered in, for example, this Synopsys page, PCIe bifurcation isn't something that's negotiated as part of bringing up PCIe connections; a PCIe device can't ask for bifurcation and can't be asked whether or not it supports it. Instead, the decision is made as part of configuring the PCIe root device or bridge, which in practice means it's a firmware ('BIOS') decision. However, I believe that bifurcation may also require hardware support in the 'chipset' and perhaps the physical motherboard.

I put chipset into quotes because for quite some time now, some PCIe lanes have come directly from the CPU and only the rest come through the chipset as such. For example, in desktop motherboards, the x16 GPU slot is almost always driven directly by CPU PCIe lanes, so it's up to the CPU to have support (or not have support) for PCIe bifurcation of that slot. I don't know if common desktop chipsets support bifurcation on the chipset PCIe slots and PCIe lanes, and of course you need chipset-driven PCIe slots that have enough lanes to be bifurcated in the first place. If the PCIe slots driven by the chipset are a mix of x4 and x1 slots, there's no really useful bifurcation that can be done (at least for NVMe drives).

If you have a limited number of PCIe slots that can actually support x16 or x8 and you need a GPU card, you may not be able to use PCIe bifurcation in practice even if it's available for your system. If you have only one PCIe slot your GPU card can go in and it's the only slot that supports bifurcation, you're stuck; you can't have both a bifurcated set of NVMe drives and a GPU (at least not without a bifurcated PCIe riser card that you can use).

(This is where I would start exploring USB NVMe drive enclosures, although on old desktops you'll probably need one that doesn't require USB-C, and I don't know if a NVMe drive set up in a USB enclosure can later be smoothly moved to a direct M.2 connection without partitioning-related problems or other issues.)

(This is one of the entries I write to get this straight in my head.)

Sidebar: Generic PCIe riser cards and other weird things

The traditional 'riser card' I'm used to is a special proprietary server 'card' (ie, a chunk of PCB with connectors and other bits) that plugs into a likely custom server motherboard connector and makes a right angle turn that lets it provide one or two horizontal PCIe slots (often half-height ones) in a 1U or 2U server case, which aren't tall enough to handle PCIe cards vertically. However, the existence of PCIe bifurcation opens up an exciting world of general, generic PCIe riser cards that bifurcate a single x16 GPU slot to, say, two x8 PCIe slots. These will work (in some sense) in any x16 PCIe slot that supports bifurcation, and of course you don't have to restrict yourself to x16 slots. I believe there are also PCIe riser cards that bifurcate an x8 slot into two x4 slots.

Now, you are perhaps thinking that such a riser card puts those bifurcated PCIe slots at right angles to the slots in your case, and probably leaves any cards inserted into them with at least their tops unsupported. If you have light PCIe cards, maybe this works out. If you don't have light PCIe cards, one option is another terrifying thing, a PCIe ribbon cable with a little PCB that is just a PCIe slot on one end (the other end plugs into your real PCIe slot, such as one of the slots on the riser card). Sometimes these are even called 'riser card extenders' (or perhaps those are a sub-type of the general PCIe extender ribbon cables).

Another PCIe adapter device you can get is an x1 to x16 slot extension adapter, which plugs into an x1 slot on your motherboard and has an x16 slot (with only one PCIe lane wired through, of course). This is less crazy than it sounds; you might only have an x1 slot available, want to plug in an x4, x8, or x16 card that's short enough, and be willing to settle for x1 speeds. In theory PCIe cards are supposed to still work when their lanes are choked down this way.

tech/PCIeBifurcationAndNVMe written at 22:01:40;

2024-12-03

The modern world of server serial ports, BMCs, and IPMI Serial over LAN

Once upon a time, life was relatively simple in the x86 world. Most x86 compatible PCs theoretically had one or two UARTs, which were called COM1 and COM2 by MS-DOS and Windows, ttyS0 and ttyS1 by Linux, 'ttyu0' and 'ttyu1' by FreeBSD, and so on, based on standard x86 IO port addresses for them. Servers had a physical serial port on the back and wired the connector to COM1 (some servers might have two connectors). Then life became more complicated when servers implemented BMCs (Baseboard management controllers) and the IPMI specification added Serial over LAN, to let you talk to your server through what the server believed was a serial port but was actually a connection through the BMC, coming over your management network.

Early BMCs could take very brute force approaches to making this work. The circa 2008 era Sunfire X2200s we used in our first ZFS fileservers wired the motherboard serial port to the BMC and connected the BMC to the physical serial port on the back of the server. When you talked to the serial port after the machine powered on, you were actually talking to the BMC; to get to the server serial port, you had to log in to the BMC and do an arcane sequence to 'connect' to the server serial port. The BMC didn't save or buffer up server serial output from before you connected; such output was just lost.

(Given our long standing console server, we had feelings about having to manually do things to get the real server serial console to show up so we could start logging kernel console output.)

Modern servers and their BMCs are quite intertwined, so I suspect that often both server serial ports are basically implemented by the BMC (cf), or at least are wired to it. The BMC passes one serial port through to the physical connector (if your server has one) and handles the other itself to implement Serial over LAN. Variants on this design are possible; for example, we have one set of Supermicro hardware with no external physical serial connector, just one serial header on the motherboard and a BMC Serial over LAN port. To be unhelpful, the motherboard serial header is ttyS0 and the BMC SOL port is ttyS1.

When the BMC handles both server serial ports and passes one of them through to the physical serial port, it can decide which one to pass through and which one to use as the Serial over LAN port. Being able to change this in the BMC is convenient if you want to have a common server operating system configuration but use a physical serial port on some machines and use Serial over LAN on others. With the BMC switching which server serial port comes out on the external serial connector, you can tell all of the server OS installs to use 'ttyS0' as their serial console, then connect ttyS0 to either Serial over LAN or the physical serial port as you need.

Some BMCs (I'm looking at you, Dell) go to an extra level of indirection. In these, the BMC has an idea of 'serial device 1' and 'serial device 2', with you controlling which of the server's ttyS0 and ttyS1 maps to which 'serial device', and then it has a separate setting for which 'serial device' is mapped to the physical serial connector on the back. This helpfully requires you to look at two separate settings to know if your ttyS0 will be appearing on the physical connector or as a Serial over LAN console (and gives you two settings that can be wrong).

In theory a BMC could share a single server serial port between the physical serial connector and an IPMI Serial over LAN connection, sending output to both and accepting input from each. In practice I don't think most BMCs do this and there are obvious issues of two people interfering with each other that BMCs may not want to get involved in.

PS: I expect more and more servers to drop external serial ports over time, retaining at most an internal serial header on the motherboard. That might simplify BMC and BIOS settings.

sysadmin/ServerSerialPortsAndBMCs written at 23:30:36;

2024-12-02

Good union types in Go would probably need types without a zero value

One of the classic big reasons to want union types in Go is so that one can implement the general pattern of an option type, in order to force people to deal explicitly with null values. Except this is not quite true on both sides. The compiler can already enforce null value checks before use, and union and option types by themselves don't fully protect you against null values. Much like people ignore error returns (and the Go compiler allows this), people can skip over the fact that they can't extract an underlying value from their Result value and just return a zero value from their 'get a result' function.

My view is that the power of option types comes from what they let you do in the rest of the language, but they can only do this if you can express their guarantees in the type system. The important thing you need for this is non-nullable types. This is what lets you guarantee that something is a proper value extracted from an error-free Result or whatever. If you can't express this in your types, everyone has to check, one way or another, or you risk a null sneaking in.
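
As a minimal sketch (with a made-up Result type, not any real library), here's how a zero value sneaks in under today's Go: the zero value of the Result type itself is immediately usable, and a caller who ignores the 'ok' flag quietly gets the zero value of T.

    // A made-up Result[T]; every Go type has a zero value, including this one.
    package main

    import "fmt"

    type Result[T any] struct {
        value T
        err   error
        ok    bool
    }

    // Get returns the stored value and whether the Result actually held one.
    func (r Result[T]) Get() (T, bool) {
        return r.value, r.ok
    }

    func main() {
        var r Result[int] // the zero value: no value, no error, ok == false
        v, _ := r.Get()   // a caller can simply ignore the 'ok' flag...
        fmt.Println(v)    // ...and quietly gets 0, the zero value of int
    }

A type without a zero value would let the compiler reject 'var r Result[int]' outright, which is the sort of change the rest of this entry is about.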

Go doesn't currently have a type concept for 'something that can't be null', or for that matter a concept that is exactly 'null'. The closest Go equivalent is the general idea of zero values, of which nil pointers (and nil interfaces) are a special case (but you can also have zero value maps and channels, which also have special semantics; the zero value of slices is more normal). If you want to make Result and similar types particularly useful in Go, I believe that you need to change this, somehow introducing types that don't have a zero value.

(Such types would likely be a variation of existing types with zero values, and presumably you could only use values or assign to variables of that type if the compiler could prove that what you were using or assigning wasn't a zero value.)

As noted in a comment by loreb on my entry on how union types would be complicated, these 'union' or 'sum' types in Go also run into issues with their zero value, and as Ian Lance Taylor's issue comment says, zero values are built quite deeply into Go. You can define semantics for union types that allow zero values, but I don't think they're really particularly useful for anything except cramming some data structures into a few less bytes in a somewhat opaque way, and I'm not sure that's something Go should be caring about.

Given that zero values are a deep part of Go and the Go developers don't seem particularly interested in trying to change this, I doubt that we're ever going to get the powerful form of union types in Go. If anything like union types appears, it will probably be merely to save memory, and even then union types are complicated in Go's runtime.

Sidebar: the simple semantics for union types that allow a zero value

If you allow union types to have a zero value, the obvious meaning of a zero value is something that can't have a value of any type successfully extracted from it. If you try the union type equivalent of a type assertion you get a zero value and 'false' for all possible options. Of course this completely gives up on the 'no zero value' type side of things, but at least you have a meaning.

This makes a zero value union very similar to a nil interface, which will also fail all type assertions. At this point my feeling is that Go might as well stick with interfaces and not attempt to provide union types.
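
The nil interface comparison is easy to see in today's Go, since a nil interface value already behaves the way a zero value union presumably would: every type assertion on it reports false and hands back a zero value.

    // A nil interface value fails all type assertions, which is roughly how
    // a zero-value union type would have to behave.
    package main

    import "fmt"

    func main() {
        var x any // a nil interface value

        i, ok := x.(int)
        fmt.Println(i, ok) // 0 false
        e, ok := x.(error)
        fmt.Println(e, ok) // <nil> false
    }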

programming/GoUnionTypesAndZeroValues written at 23:00:36;

2024-12-01

Union types ('enum types') would be complicated in Go

Every so often, people wish that Go had enough features to build some equivalent of Rust's Result type or Option type, often so that Go programmers could have more ergonomic error handling. One core requirement for this is what Rust calls an Enum and what is broadly known as a Union type. Unfortunately, doing a real enum or union type in Go is not particularly simple, and it definitely requires significant support by the Go compiler and the runtime.

At one level we can easily do something that looks like a Result type in Go, especially now that we have generics. You make a generic struct that has private fields for an error, a value of type T, and a flag that says which is valid, and then give it some methods to set and get values and ask it which it currently contains. If you ask for the sort of value that's not currently valid, it panics. However, this struct necessarily has space for three fields, whereas Rust enums (and union types generally) act more like C unions, only needing space for the largest type possible in them and sometimes a marker of what type is in the union right now.

(The Rust compiler plays all sorts of clever tricks to elide the enum marker if it can store this information in some other way.)
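
For concreteness, here's a minimal sketch of the three-field approach described above; the names are invented for illustration, not taken from any real library.

    // A made-up generic Result type: a value, an error, and a flag saying
    // which of the two is currently valid. Asking for the wrong one panics.
    package main

    import "fmt"

    type Result[T any] struct {
        val   T
        err   error
        isVal bool
    }

    func Ok[T any](v T) Result[T]      { return Result[T]{val: v, isVal: true} }
    func Err[T any](e error) Result[T] { return Result[T]{err: e} }

    func (r Result[T]) IsValue() bool { return r.isVal }

    func (r Result[T]) Value() T {
        if !r.isVal {
            panic("Result holds an error, not a value")
        }
        return r.val
    }

    func (r Result[T]) Err() error {
        if r.isVal {
            panic("Result holds a value, not an error")
        }
        return r.err
    }

    func main() {
        r := Ok(42) // inferred as Result[int]
        if r.IsValue() {
            fmt.Println(r.Value())
        }
    }

Even Ok(42) here reserves space for the unused err field, which is the extra cost the comparison to C unions and Rust enums is pointing at.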

To understand why we need deep compiler and runtime support, let's ask why we can't implement such a union type today using Go's unsafe package to perform suitable manipulation of a suitable memory region. Because it will make the discussion easier, let's say that we're on a 64-bit platform and our made-up Result type will contain either an error (which is an interface value) or a [2]int64 array. On a 64-bit platform, both of these types occupy 16 bytes, since an interface value is two pointers in a trenchcoat, so it looks like we should be able to use the same suitably-aligned 16-byte memory area for each of them.

However, now imagine that Go is performing garbage collection. How does the Go runtime know whether our 16-byte memory area contains two live pointers, which it must follow as part of garbage collection, or two 64-bit integers, which it definitely cannot treat as pointers and follow? If we've implemented our Result type outside of the compiler and runtime, the answer is that garbage collection has no idea which it currently is. In the Go garbage collector, it's not values that have types but storage locations, and Go doesn't provide an API for changing the type of a storage location.

(Internally the runtime can set and change information about what pieces of memory contain pointers, but this is not exposed to the outside world; it's part of the deep integration of runtime memory allocation and the runtime garbage collector.)

In Go, without support from the runtime and the compiler, the best you can do is store an interface value or perhaps an unsafe.Pointer to the actual value involved. However, this probably forces a separate heap allocation for the value, which is less efficient in several ways than the compiler-supported version that Rust has. On the positive side, if you store an interface value you don't need any marker for what's stored in your Result type, since you can always extract that from the interface with a suitable type assertion.
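
Here's a hedged sketch of that interface-value variant: the single field's dynamic type is the only marker needed, recovered with a type switch, at the likely cost of a heap allocation when a non-pointer value is stored in it.

    // A Result that stores whatever it holds as an interface value; the
    // dynamic type of the 'any' field tells you what is inside.
    package main

    import (
        "errors"
        "fmt"
    )

    type Result struct {
        v any
    }

    func main() {
        results := []Result{{v: 42}, {v: errors.New("something failed")}}
        for _, r := range results {
            // A type switch (or type assertion) recovers what was stored.
            switch v := r.v.(type) {
            case error:
                fmt.Println("error:", v)
            default:
                fmt.Println("value:", v)
            }
        }
    }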

The corollary to all of this is that adding union types to Go as a language feature wouldn't be merely a modest change in the compiler. It would also require a bunch of work on how such types interact with garbage collection, Go's memory allocation systems (which in the normal Go toolchain allocate things with pointers in separate memory arenas from things without them), and likely other places in the runtime.

(I suspect that Go is pretty unlikely to add union types given this, since you can have much of the API that union types present with interface types and generics. And in my view, union types like Result wouldn't be really useful without other changes to Go's type system, although that's another entry.)

PS: Something like this has come up before in generic type sets.

programming/GoUnionTypesComplexities written at 23:31:34;
