2025-03-20
Go's choice of multiple return values was the simpler option
Yesterday I wrote about Go's use of multiple return values and Go types, in reaction to Mond's Were multiple return values Go's biggest mistake? One of the things that I forgot to mention in that entry is that I think Go's choice to have multiple values for function returns (and a few other things) was the simpler and more conservative approach in its overall language design.
In a statically typed language that expects to routinely use multiple return values, as Go was designed to with the 'result, error' pattern, returning multiple values as a typed tuple means that tuple-based types are pervasive. This creates pressures on both the language design and the API of the standard library, especially if you start out (as Go did) being a fairly strongly nominally typed language, where different names for the same concrete type can't be casually interchanged. Or to put it another way, having a frequently used tuple container (meta-)type significantly interacts with and affects the rest of the language.
(For example, if Go had handled multiple values through tuples as explicit typed entities, it might have had to start out with something like type aliases (added only in Go 1.9) and it might have been pushed toward some degree of structural typing, because that probably makes it easier to interact with all of the return value tuples flying around.)
Having multiple values as a special case for function returns, range, and so on doesn't create anywhere near this additional influence and pressure on the rest of the language. There are a whole bunch of questions and issues you don't face because multiple values aren't types and can't be stored or manipulated as single entities. Of course you have to be careful in the language specification and it's not trivial, but it's simpler and more contained than going the tuple type route. I also feel it's the more conservative approach, since it doesn't affect the rest of the language as much as a widely used tuple container type would.
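To make the special case concrete, here's a minimal sketch (the names are just for illustration). Multiple results can be destructured or passed straight into another call, but there's no single value of some 'tuple type' that you can store or manipulate:

    package main

    import (
        "fmt"
        "strconv"
    )

    // parseNum returns two values; the pair exists only in the
    // function signature, not as a storable tuple type.
    func parseNum(s string) (int, error) {
        return strconv.Atoi(s)
    }

    func main() {
        // Multiple results must be destructured on the spot...
        n, err := parseNum("42")
        fmt.Println(n, err)

        // ...because there is no type that holds the pair:
        //   var pair ??? = parseNum("42")   // no such type exists

        // The same multi-value special case appears in range clauses.
        for i, v := range []string{"a", "b"} {
            fmt.Println(i, v)
        }
    }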
(As Mond criticizes, it does create special cases. But Go is a pragmatic language that's willing to live with special cases.)
2025-03-19
Go's multiple return values and (Go) types
Recently I read Were multiple return values Go's biggest mistake? (via), which wishes that Go had full blown tuple types (to put my spin on it). One of the things that struck me about Go's situation when I read the article is exactly the inverse of what the article is complaining about, which is that because Go allows multiple values for function return types (and in a few other places), it doesn't have to have tuple types.
One problem with tuple types in a statically typed language is that they must exist as types, whether declared explicitly or implicitly. In a language like Go, where type definitions create new distinct types even if the structure is the same, it isn't particularly difficult to wind up with an ergonomics problem. Suppose that you want to return a tuple that is a net.Conn and an error, a common pair of return values in the net package today. If that tuple is given a named type, everyone must use that type in various places; merely returning or storing an implicitly declared type that's structurally the same is not acceptable under Go's current type rules. Conversely, if that tuple is not given a type name in the net package, everyone is forced to stick to an anonymous tuple type. In addition, this up front choice is now an API; it's not API compatible to give your previously anonymous tuple type a name or vice versa, even if the types are structurally compatible.
(Since returning something and error is so common an idiom in Go, we're also looking at either a lot of anonymous types or a lot more named types. Consider how many different combinations of multiple return values you find in the net package alone.)
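To illustrate the nominal typing pressure with a sketch (using structs as stand-ins for the hypothetical tuple types, with invented names): two named types that are structurally identical still can't be casually interchanged in today's Go, only explicitly converted:

    package main

    import "net"

    // Two hypothetical named "tuple" types with identical structure.
    type DialResult struct {
        Conn net.Conn
        Err  error
    }

    type AcceptResult struct {
        Conn net.Conn
        Err  error
    }

    func main() {
        var d DialResult
        // var a AcceptResult = d  // compile error: cannot use d
        a := AcceptResult(d) // an explicit conversion is required
        _ = a
    }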
One advantage of multiple return values (and the other forms of tuple assignment, and for range clauses) is that they don't require actual formal types. Functions have a 'result type', which doesn't exist as an actual type, but the language already needed to handle the same sort of 'not an actual type' thing for their 'parameter type'. My guess is that this let Go's designers skip a certain amount of complexity in Go's type system, because they didn't have to define an actual tuple (meta-)type or alternately expand how structs worked to cover the tuple usage case.
(Looked at from the right angle, structs are tuples with named fields, although then you get into questions of how nested structs act in tuple-like contexts.)
A dynamically typed language like Python doesn't have this problem because there are no explicit static types, so there's no need to have different types for different combinations of (return) values. There's simply a general tuple container type that can be any shape you want or need, and can be created and destructured on demand.
(I assume that some statically typed languages have worked out how to handle tuples as a data type within their type system. Rust has tuples, for example; I haven't looked into how they work in Rust's type system, for reasons.)
2025-03-17
I don't think error handling is a solved problem in language design
There are certain things about programming language design that are more or less solved problems, where we generally know what the good and bad approaches are. For example, over time we've wound up agreeing on various common control structures like for and while loops, if statements, and multi-option switch/case/etc statements. The syntax may vary (sometimes very much, as for example in Lisp), but the approach is more or less the same because we've come up with good approaches.
I don't believe this is the case with handling errors. One way to see this is to look at the wide variety of approaches and patterns that languages today take to error handling. There are at least 'errors as exceptions' (for example, Python), 'errors as values' (Go and C), and 'errors instead of results, which you have to check for' combined with 'if errors happen, panic' (both in Rust). Even within Rust there are multiple idioms for dealing with errors; some Rust code will explicitly check its Result types, while other Rust code sprinkles '?' around and accepts that if the program sails off the happy path, it simply dies.
If you were creating a new programming language from scratch, there's no clear agreed answer to what error handling approach you should pick, not the way we have more or less agreed on how for, while, and so on should work. You'd be left to evaluate trade offs in language design and language ergonomics and to make (and justify) your choices, and there probably would always be people who think you should have chosen differently. The same is true of changing or evolving existing languages, where there's no generally agreed on 'good error handling' to move toward.
(The obvious corollary of this is that there's no generally agreed on keywords or other syntax for error handling, the way 'for' and 'while' are widely accepted as keywords as well as concepts. The closest we've come is that some forms of error handling have generally accepted keywords, such as try/catch for exception handling.)
I like to think that this will change at some point in the future. Surely there actually is a good pattern for error handling out there and at some point we will find it (if it hasn't already been found) and then converge on it, as we've converged on programming language things before. But I feel it's clear that we're not there yet today.
2025-03-02
Updating local commits with more changes in Git (the harder way)
One of the things I do with Git is maintain personal changes locally on top of the upstream version, with my changes updated via rebasing every time I pull upstream to update it. In the simple case, I have only a single local change and commit, but in more complex cases I split my changes into multiple local commits; my local version of Firefox currently carries 12 separate personal commits. Every so often, upstream changes something that causes one of those personal changes to need an update, without actually breaking the rebase of that change. When this happens I need to update my local commit with more changes, and often it's not the 'top' local commit (which can be updated simply).
In theory, the third party tool git-absorb should be ideal for this, and I believe I've used it successfully for this purpose in the past. In my most recent instance, though, git-absorb frustratingly refused to do anything in a situation where it felt it should work fine. I had an additional change to a file that was changed in exactly one of my local commits, which feels like an easy case.
(Reading the git-absorb readme carefully suggests that I may be running into a situation where my new change doesn't clash with any existing change. This makes git-absorb more limited than I'd like, but so it goes.)
In Git, what I want is called a 'fixup commit', and how to use it is covered in this Stackoverflow answer. The sequence of commands is basically:
    # modify some/file with new changes, then
    git add some/file
    # Use this to find your existing commit ID
    git log some/file
    # with the existing commit ID
    git commit --fixup=<commit ID>
    git rebase --interactive --autosquash <commit ID>^
This will open an editor buffer with what 'git rebase' is about to do, which I can immediately exit out of because the defaults are exactly what I want (assuming I don't want to shuffle around the order of my local commits, which I probably don't, especially as part of a fixup).
I can probably also use 'origin/main' instead of '<commit ID>^', but that will rebase more things than is strictly necessary. And I need the commit ID for the 'git commit --fixup' invocation anyway.
(Sufficiently experienced Git people can probably put together a script that would do this automatically. It would get all of the files staged in the index, find the most recent commit that modified each of them, abort if they're not all the same commit, make a fixup commit to that most recent commit, and then potentially run the 'git rebase' for you.)
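Just to sketch the idea, here's roughly what such a script could look like as a small Go program (this is a hypothetical, minimally error-checked sketch that shells out to standard git commands; it stops short of running the final rebase):

    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    // git runs a git command and returns its trimmed stdout,
    // exiting on any error.
    func git(args ...string) string {
        out, err := exec.Command("git", args...).Output()
        if err != nil {
            fmt.Fprintln(os.Stderr, "git", strings.Join(args, " "), "failed:", err)
            os.Exit(1)
        }
        return strings.TrimSpace(string(out))
    }

    func main() {
        // All files currently staged in the index.
        staged := strings.Split(git("diff", "--cached", "--name-only"), "\n")
        if len(staged) == 0 || staged[0] == "" {
            fmt.Fprintln(os.Stderr, "nothing staged")
            os.Exit(1)
        }

        // Find the most recent commit that modified each staged file,
        // aborting unless they all agree on a single commit.
        target := ""
        for _, f := range staged {
            id := git("log", "-n", "1", "--format=%H", "--", f)
            if target == "" {
                target = id
            } else if id != target {
                fmt.Fprintln(os.Stderr, "staged files were last touched by different commits")
                os.Exit(1)
            }
        }

        git("commit", "--fixup="+target)
        fmt.Println("created fixup commit for", target)
        // Left for you (or the script) to run afterward:
        //   git rebase --interactive --autosquash <target>^
    }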
2025-02-24
Go's behavior for zero value channels and maps is partly a choice
How Go behaves if you have a zero value channel or map (a 'nil' channel or map) is somewhat confusing (cf, via). When we talk about it, it's worth remembering that this behavior is a somewhat arbitrary choice on Go's part, not a fundamental set of requirements that stems from, for example, other language semantics. Go has reasons to have channels and maps behave as they do, but some of those reasons have to do with how channel and map values are implemented and some are about what's convenient for programming.
As hinted at by how their zero value is called a 'nil' value, channel and map values are both implemented as pointers to runtime data structures. A nil channel or map has no such runtime data structure allocated for it (and the pointer value is nil); these structures are allocated by make(). However, this doesn't entirely allow us to predict what happens when you use nil values of either type. It's not unreasonable for an attempt to assign an element to a nil map to panic, since the nil map has no runtime data structure allocated to hold anything we try to put in it. But you don't have to say that a nil map is empty and looking up elements in it gives you a zero value; I think you could have this panic instead, just as assigning an element does. However, that would probably result in less safe code that panicked more (and probably had more checks for nil maps, too).
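As a concrete illustration of the choice Go actually made, lookups on a nil map quietly act as if the map were empty, while assignment panics:

    package main

    import "fmt"

    func main() {
        var m map[string]int // nil map: no runtime structure allocated

        // Lookups are defined to behave as if the map were empty.
        fmt.Println(m["missing"]) // 0, the zero value
        v, ok := m["missing"]
        fmt.Println(v, ok) // 0 false

        // Assignment is where Go chose to panic instead.
        m["key"] = 1 // panic: assignment to entry in nil map
    }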
Then there's nil channels, which don't behave like nil maps. It would make sense for receiving from a nil channel to yield the zero value, much like looking up an element in a nil map, and for sending to a nil channel to panic, again like assigning to an element in a nil map (although in the channel case it would be because there's no runtime data structure where your goroutine could metaphorically hang its hat waiting for a receiver). Instead Go chooses to make both operations (permanently) block your goroutine, with panicking on send reserved for sending to a non-nil but closed channel.
The current semantics of sending on a closed channel combined with select statements (and to a lesser extent receiving from a closed channel) means that Go needs a channel zero value that is never ready to send or receive. However, I believe that Go could readily make actual sends or receives on nil channels panic without any language problems. As a practical matter, sending or receiving on a nil channel is a bug that will leak your goroutine even if your program doesn't deadlock.
Similarly, Go could choose to allocate an empty map runtime data structure for zero value maps, and then let you assign to elements in the resulting map rather than panicking. If desired, I think you could preserve a distinction between empty maps and nil maps. There would be some drawbacks to this that cut against Go's general philosophy of being relatively explicit about (heap) allocations, and you'd want a clever compiler that didn't bother creating those zero value runtime map data structures when they'd just be overwritten by 'make()' or a return value from a function call or the like.
(I can certainly imagine a quite Go-like language where maps don't have to be explicitly set up any more than slices do, although you might still use 'make()' if you wanted to provide size hints to the runtime.)
Sidebar: why you need something like nil channels
We all know that sometimes you want to stop sending or receiving on a channel in a select statement. On first impression it looks like closing a channel (instead of setting the channel to nil) could be made to work for this (it doesn't currently). The problem is that closing a channel is a global thing, while you may only want a local effect; you want to remove the channel from your select, but not close down other uses of it by other goroutines.
This need for a local effect pretty much requires a special, distinct channel value that is never ready for sending or receiving, so you can overwrite the old channel value with this special value, which we might as well call a 'nil channel'. Without a channel value that serves this purpose you'd have to complicate select statements with some other way to disable specific channels.
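Here's a small sketch of the resulting standard pattern (merge is an illustrative function, not any standard API). Once a channel is closed we overwrite our local copy of it with nil, so that its case in the select can never fire again, without affecting anyone else using the channel:

    package main

    import "fmt"

    // merge reads from two channels until both are exhausted,
    // disabling each one locally by setting it to nil once closed.
    func merge(a, b <-chan int, out chan<- int) {
        for a != nil || b != nil {
            select {
            case v, ok := <-a:
                if !ok {
                    a = nil // this case is now never ready
                    continue
                }
                out <- v
            case v, ok := <-b:
                if !ok {
                    b = nil
                    continue
                }
                out <- v
            }
        }
        close(out)
    }

    func main() {
        a, b, out := make(chan int), make(chan int), make(chan int)
        go func() { a <- 1; close(a) }()
        go func() { b <- 2; close(b) }()
        go merge(a, b, out)
        for v := range out {
            fmt.Println(v)
        }
    }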
(I had to work this out in my head as part of writing this entry so I might as well write it down for my future self.)
2025-02-02
Build systems and their effects on versioning and API changes
In a comment on my entry on modern languages and bad packaging outcomes at scale, sapphirepaw said something (about backward and forward compatibility within language ecologies) that I'm going to quote from because it's good (but go read the whole comment):
I think there’s a social contract that has broken down somewhere.
[...]
If a library version did break things, it was generally considered a bug, and developers assumed it would be fixed in short order. Then, for the most part, only distributions had to worry about specific package/library-version incompatibilities.
This all falls apart if a developer, or the ecosystem of libraries/language they depend on, ends up discarding that compatibility-across-time. That was the part that made it feasible to build a distribution from a collection of projects that were, themselves, released across time.
I have a somewhat different view. I think that the way it was in the old days was less a social contract and more an effect of the environment that software was released into and built in, and now that the environment has changed, the effects have too.
C famously has a terrible story around its (lack of a) build system and dependency management, and for much of its life you couldn't assume pervasive and inexpensive Internet connectivity (well, you still can't assume the latter globally, but people have stopped caring about such places). This gave authors of open source software a strong incentive to be both backward and forward compatible. If you released a program that required the features of a very recent version of a library, you reduced your audience to people who already had the recent version (or better) or who were willing to go through the significant manual effort to get and build that version of the library, and then perhaps make all of their other programs work with it, since C environments often more or less forced global installation of libraries. If you were a library author releasing a new minor version or patch level that had incompatibilities, people would be very slow to actually install and adopt that version because of those incompatibilities; most of their programs using your libraries wouldn't update on the spot, and there was no good mechanism to use the old version of the library for some programs.
(Technically you could make this work with static linking, but static linking was out of favour for a long time.)
All of this creates a quite strong practical and social push toward stability. If you wanted your program or its new version to be used widely (and you usually did), it had better work with the old versions of libraries that people already had; requiring new APIs or new library behavior was dangerous. If you wanted the new version of your library to be used widely, it had better be compatible with old programs using the old API, and if you wanted a brand new library to be used by people in programs, it had better demonstrate that it was going to be stable.
Much of this spilled over into other languages like Perl and Python. Although both of these developed central package repositories and dependency management schemes, for a long time these mostly worked globally, just like the C library and header ecology, and so they faced similar pressures. Python only added fully supported virtual environments in 2012, for example (in Python 3.3).
Modern languages like Go and Rust (and the Node.js/NPM ecosystem, and modern Python venv based operation) don't work like that. Modern languages mostly use static linking instead of shared libraries (or the equivalent of static linking for dynamic languages, such as Python venvs), and they have build systems that explicitly support automatically fetching and using specific versions of dependencies (or version ranges; most build systems are optimistic about forward compatibility). This has created an ecology where it's much easier to use a recent version of something than it was in C, and where API changes in dependencies often have much less effect because it's much easier (and sometimes even the default) to build old programs with old dependency versions.
(In some languages this has resulted in a lot of programs and packages implicitly requiring relatively recent versions of their dependencies, even if they don't say so and claim wide backward compatibility. This happens because people would have to take explicit steps to test with their stated minimum version requirements and often people don't, with predictable results. Go is an exception here because of its choice of 'minimum version selection' for dependencies over 'maximum version selection', but even then it's easy to drift into using new language features or new standard library APIs without specifically requiring that version of Go.)
One of the things about technology is that it absolutely affects social issues, so different technology creates different social expectations. I think that's what's happened with social expectations around modern languages. Because they have standard build systems that make it easy to do so, people feel free to have their programs require specific version ranges of dependencies (modern as well as old), and package authors feel free to break things and then maybe fix them later, because programs can opt in or not and aren't stuck with the package's choices for a particular version. There are still forces pushing towards compatibility, but they're weaker than they used to be and more often violated.
Or to put it another way, there was a social contract of sorts for C libraries in the old days but the social contract was a consequence of the restrictions of the technology. When the technology changed, the 'social contract' also changed, with unfortunate effects at scale, which most developers don't care about (most developers aren't operating at scale, they're scratching their own itch). The new technology and the new social expectations are probably better for the developers of programs, who can now easily use new features of dependencies (or alternately not have to update their code to the latest upstream whims), and for the developers of libraries and packages, who can change things more easily and who generally see their new work being used faster than before.
(In one perspective, the entire 'semantic versioning' movement is a reaction to developers not following the expected compatibility that semver people want. If developers were already doing semver, there would be no need for a movement for it; the semver movement exists precisely because people weren't. We didn't have a 'semver' movement for C libraries in the 1990s because no one needed to ask for it, it simply happened.)
2025-01-19
Sometimes print-based debugging is your only choice
Recently I had to investigate a mysterious issue in our Django based Python web application. This issue happened only when the application was actually running as part of the web server (using mod_wsgi, which effectively runs as an Apache process). The only particularly feasible way to dig into what was going on was everyone's stand-by, print based debugging (because I could print into Apache's error log; I could have used any form of logging that would surface the information). Even if I might have somehow been able to attach a debugger to things to debug an HTTP request in flight, using print based debugging was a lot easier and faster in practice.
I'm a long time fan of print based debugging. Sometimes this is because print based debugging is easier if you only dip into a language every so often, but that points to a deeper issue, which is that almost every environment can print or log. Print or log based 'debugging' is an almost universal way to extract information from a system, and sometimes you have no other practical way to do that.
(The low level programming people sometimes can't even print things out, but there are other very basic ways to communicate things.)
As in my example, one of the general cases where you have very little access other than logs is when your issue only shows up in some sort of isolated or encapsulated environment (a 'production' environment). We have a lot of ways of isolating things these days, things like daemon processes, containers, 'cattle' (virtual) servers, and so on, but they all share the common trait that they deliberately detach themselves away from you. There are good reasons for this (which often can be boiled down to wanting to run in a controlled and repeatable environment), but it has its downsides.
Should print based debugging be the first thing you reach for? Maybe not; some sorts of bugs cause me to reach for a debugger, and in general if you're a regular user of your chosen debugger you can probably get a lot of information with it quite easily, easier than sprinkling print statements all over. But I think that you probably should build up some print debugging capabilities, because sooner or later you'll probably need them.
2025-01-09
Realizing why Go reflection restricts what struct fields can be modified
Recently I read Rust, reflection and access rules. Among other things, it describes how a hypothetical Rust reflection system couldn't safely allow access to private fields of things, and especially how it couldn't allow code to set them through reflection. My short paraphrase of the article's discussion is that in Rust, private fields can be in use as part of invariants that allow unsafe operations to be done safely through suitable public APIs. This brought into clarity what had previously been a somewhat odd seeming restriction in Go's reflect package.
Famously (for people who've dabbled in reflect), you can only set exported struct fields. This is covered in both the Value.CanSet() package documentation and The Laws of Reflection (in passing). Since one of the uses of reflection is for going between JSON and structs, encoding/json only works on exported struct fields and you'll find a lot of such fields in lots of code. This requirement can be a bit annoying. Wouldn't it be nice if you didn't have to make your fields public just to serialize them easily?
(You can use encoding/json and still serialize non-exported struct fields, but you have to write some custom methods instead of just marking struct fields the way you could if they were exported.)
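As a minimal sketch of the custom method approach (the point type is invented for illustration; json.Marshaler is the real encoding/json mechanism):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // point keeps its fields unexported, so encoding/json's
    // reflection can't serialize them directly.
    type point struct {
        x, y int
    }

    // MarshalJSON routes the unexported fields through an anonymous
    // struct with exported fields.
    func (p point) MarshalJSON() ([]byte, error) {
        return json.Marshal(struct {
            X int `json:"x"`
            Y int `json:"y"`
        }{p.x, p.y})
    }

    func main() {
        out, err := json.Marshal(point{1, 2})
        fmt.Println(string(out), err) // {"x":1,"y":2} <nil>
    }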
Go has this reflect restriction, presumably, for the same reason that reflection in Rust wouldn't be able to modify private fields. Since private fields in a Go struct may be used by functions and methods in the package to properly manage the struct, modifying those fields yourself is unsafe (in the general sense). The reflect package will let you see the fields (and their values) but not change their values. You're allowed to change exported fields because (in theory) arbitrary Go code can already change the value of those fields, and so code in the struct's package can't count on them having any particular value. It can at least sort of count on private fields having approved values (or the zero value, I believe).
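A small demonstration of the asymmetry (the thing type is invented for illustration):

    package main

    import (
        "fmt"
        "reflect"
    )

    type thing struct {
        Exported   int
        unexported int
    }

    func main() {
        t := thing{1, 2}
        v := reflect.ValueOf(&t).Elem()

        // Both fields are visible through reflection...
        fmt.Println(v.Field(0).Int(), v.Field(1).Int()) // 1 2

        // ...but only the exported one can be changed.
        fmt.Println(v.Field(0).CanSet()) // true
        fmt.Println(v.Field(1).CanSet()) // false

        v.Field(0).SetInt(10)
        // v.Field(1).SetInt(20) // would panic: unexported field
        fmt.Println(t.Exported) // 10
    }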
(I understand why the reflect documentation doesn't explain the logic of not being able to modify private fields, since package documentation isn't necessarily the right place for a rationale. Also, perhaps it was considered obvious.)
2024-12-16
Some notes on "closed interfaces" in Go
One reaction to basic proposals for union types in Go is to note that "closed interfaces" provide a lot of these features (cf). When I saw this I had to refresh myself about what such a closed interface is, and then think about some of the potential issues and limitations involved, leaving me with some things I want to note down for my later reference.
What I've seen called a closed interface is an interface that requires an unexported method:
    type Closed interface {
        isClosed()
        NormalMethod1(...) ...
        NormalMethod2(...) ...
        [...]
    }
Although it's not spelled out exactly in the Go language specification, an interface with an unexported method like this can't be implemented by any type outside of its package, because such an external type can't have the right 'isClosed()' method on it (since the method isn't exported, its identifier is unique to the package, or at least I believe that's how the logic flows). The 'isClosed()' method doesn't do anything and need never be called; it just has to be declared as an (empty) method on everything that is going to be part of the Closed interface.
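Here's a minimal sketch of the whole arrangement (the concrete type names are invented for illustration):

    package main

    import "fmt"

    // No type outside this package can implement Closed, because no
    // outside type can have this package's isClosed method.
    type Closed interface {
        isClosed()
    }

    type IntVal struct{ N int }
    type StrVal struct{ S string }

    // The marker method does nothing and is never called.
    func (IntVal) isClosed() {}
    func (StrVal) isClosed() {}

    // Because the set of implementations is finite and known, tooling
    // could in principle check this type switch for exhaustiveness.
    func describe(c Closed) string {
        switch v := c.(type) {
        case IntVal:
            return fmt.Sprintf("int %d", v.N)
        case StrVal:
            return fmt.Sprintf("string %q", v.S)
        }
        return "impossible for in-package types"
    }

    func main() {
        fmt.Println(describe(IntVal{N: 42}))
        fmt.Println(describe(StrVal{S: "hi"}))
    }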
This means that there is a finite and known list of types that implement the Closed interface, instead of the potentially unknown list of them that could exist for an open interface. Go tooling can use this knowledge to see, for example, that a type switch on Closed is exhaustive, or that a given type assertion will never succeed. However, in the current state of godoc, I don't believe that generated package documentation will automatically tell you which types implement Closed; you'll have to remember to document that explicitly.
(I don't know if there's any Go tooling that does this today, especially when run in the context of other packages instead of the package that defines the Closed interface.)
Code in other packages can still construct a nil Closed interface value, or zero values of any exported types that implement Closed (and then turn such a zero value into a non-nil Closed interface value). If you want to be extra tricky, you can make all the types that implement Closed be unexported; at that point the only way people outside your package can easily create or manipulate those types is through an instance of Closed that you give them, and implicitly only through its methods. The Closed interface is not merely a closed interface, it has become an opaque interface (short of people getting out the chainsaw of reflect and perhaps unsafe).
However, this is also a limitation of the Closed interface, and of closed interfaces in general. For good reason, you can't add methods to types declared outside your package, so you can't make an outside type be a member of the Closed interface, even though you control the interface's definition in your package. In order to induct a useful outside type into the world of Closed, you must wrap it in a type from your package, and this type must be a concrete type, even if what you want to pull in is another interface. I believe that under at least some circumstances, this will cost you some extra memory. More broadly, I don't think you can really have two separate packages that cooperate so each defines some types that are part of Closed. One way or another you have to put all the types in one package.
In my view, this means that a closed interface isn't really a practical way to document, inside your package, that a particular interface will only ever contain a limited number of outside types (or a mixture of outside types and inside types), including outside types from a related package. You can use it for this, but there's a chunk of bureaucracy involved for each outside type you want to pull in. If you go to this effort, presumably you have tooling that can deduce what's going on and take advantage of this knowledge.
(These days you could define a generic type that wraps another type and implements your Closed interface for it, making this sort of bureaucracy easier, at least.)
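A minimal sketch of that generic wrapper (Wrap is an invented name; you're still defining exactly one concrete wrapper type in your package):

    package main

    type Closed interface{ isClosed() }

    // Wrap turns a value of any outside type T into a member of
    // Closed, at the cost of a layer of wrapping.
    type Wrap[T any] struct {
        Value T
    }

    func (Wrap[T]) isClosed() {}

    func main() {
        var c Closed = Wrap[error]{Value: nil}
        _ = c
    }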
2024-12-15
I think Go union type proposals should start with their objectives
At this point I've skimmed a number of relatively serious union type proposals for Go (which is to say, people who were serious enough to write something substantial in the Go issue tracker). One of the feelings I've wound up with as a result of this is that any such union type proposal should probably start out by describing what its objectives are, not what its proposed syntax is.
There are a number of different things union types in Go could do, ranging from creating an interface type that only allows a limited range of concrete types, to (potentially) reducing the amount of space needed to store one of several types of values, to requiring people to do specific things to extract an interior value from a union value, implicitly forcing them to check for errors (or panic). Some of these objectives will be more or less complete by themselves, but some others will really want (or at least benefit from) additional changes to things like 'go vet', or in some cases be incomplete without other language changes.
Proposals that bury their objectives (or don't clearly spell them out) invite people to guess at what they really want, and then argue about it. Such proposals are also harder to evaluate, since a reader can't judge whether the proposal would have problems achieving its objectives, or even to what extent the objectives are reasonable for Go. Without objectives, people are left to discuss what the proposal does and doesn't achieve without understanding which of those things are important (and need to be kept in any changes) and which aren't (and could be altered or removed).
Presumably the proposer had an objective in mind and so can write it down. If they haven't actually considered their objectives, they should definitely start with that, not with the syntax or language semantics. I won't say that objectives are the most important thing about a change to Go, because the syntax and semantics matter too, but a clearly expressed objective is (in my view) necessary. And if you convince people that the objective is a good one, you have a much better chance of having that objective achieved somehow, even if it happens in a different way than you thought of.
(One reason that people may be reluctant to write down their objective is for fear that others won't like it and will be against the proposal as a result. One of my views is that if you can't persuade people of the objectives, your proposal should fail, and trying to sneak it in is a bad look.)
PS: One of the reasons I'm writing this down is for myself, because I was briefly tempted to write up a straw-person union types idea here on Wandering Thoughts. Had I done so, it definitely wouldn't have started from objectives, but from semantics and some effects. It's easy to have a clever idea (or at least a clever looking idea, one that sounds good for long enough for me to write a blog entry about it), but mere clever ideas are at best a distraction.