Wandering Thoughts archives

2024-01-25

In Go, I'm going to avoid using 'any' as an actual type

As modern Go programmers know, when Go introduced generics it also introduced a new 'any' type. This is officially documented as:

For convenience, the predeclared type any is an alias for the empty interface.

The 'any' type (alias) exists because it's extremely common in code that's specifying generic types to want to be able to say 'any type', and the way this is done in generics is 'interface{}', the empty interface. This makes generic code clearly easier to read and follow. Consider these two versions of the signature of reflect.TypeFor:

func TypeFor[T any]() Type
func TypeFor[T interface{}]() Type

These are semantically equivalent but the first is clearer, because you don't have to remember this special case of what 'interface{}' means. Instead, it's right in the name 'any' (and there's less syntactic noise).

But after Go generics became a thing, there's been a trend of using this new 'any' alias outside of generic types, instead of writing out 'interface{}'. I don't think this is a good idea. To show why, consider the following two function signatures, both of which use 'any':

func One[T any](v T) bool
func Two(v any) bool

These two function signatures look almost the same, but they have wildly different meanings, even if (or when) they're invoked with the same argument. The effects of 'One(10)' are rather different from 'Two(10)', since 'One' is a generic function while 'Two' is a regular one. Now consider them written this way:

func One[T any](v T) bool
func Two(v interface{}) bool

Now we see clearly what Two() is doing differently from One(); it's obvious that it isn't taking 'any type' as such, but instead it's taking an empty interface as the argument type. This makes it obvious that a non-interface value will be converted to an interface value (and will tell some people that an interface value will lose its interface type).

This increased immediate clarity, without needing to remember what 'any' means, is why I'm planning to use 'interface{}' in my code in the future, and why I think you should too. Yes, 'any' is shorter and it has a well defined meaning in the specification and we can probably remember the special meaning all of the time. But why give ourselves that extra cognitive burden when we can be explicit?

(In generics, the argument goes the other way; 'any' really does mean 'any type', and the 'any' name is clearer than writing 'interface{}' and then needing to remember that that's how generics do it.)

In a sense the 'any' name is a misnomer when used as a type. It's true that 'interface{}' will accept any type, but used as a type, it doesn't mean 'any type'; it means specifically the type 'an empty interface', which is to say an interface that has no methods, which implies interface type conversion (unless you already have an 'interface{}' value). Since 'any' does mean 'any type' in the context of generics, I think it's better to use a different name for each thing, even if Go formally makes the names equivalent. The names of things are fundamentally for people.

GoAvoidingAnyAsAType written at 23:03:50;

2024-01-14

Git branches as a social construct

Over on the Fediverse, I had a half-baked thesis:

A half-baked thesis: branches in Git are a social construct, somewhat enabled by technical features. We talk about things having been done on a branch or existing on a branch, or what branches are what on an intertwined tree of them, even when this is not something you can find in the Git repository.

(This is because commits aren't permanently associated with a branch; they are merely currently reachable from one or more branches. What branch a multi-head-reachable commit is on is up to us.)

The background on this is more or less Julia Evans' git branches: intuition & reality, or more exactly a Fediverse discussion between Julia Evans and Mark Dominus (and Mark Dominus's I wish people would stop insisting that Git branches are nothing but refs).

This ties into my long standing view that modern version control systems are a user interface to their underlying mathematics. Git has an internal, mathematical view of what 'branches' are, but very few people actually use this mathematical view; instead we use a variety of 'user interface' views of what branches are and how we think of them. Git supports these user views of branches with various 'porcelain' features.

Some projects using Git actively work to create branches that have a more concrete and durable existence. For instance, commits on a Go release branch have the branch name in the commit's title, which is something the Go project does for a lot of branches that are used in Go development and release. For development branches specifically, this durably marks commits as having been done on the branch even after the branch is merged to the 'main' development branch.

Certainly, how I normally think of Git branches is different from their technical existence, and it differs from branch to branch. For example, in a typical repository I think of the 'main' branch as running all the way back to the creation of the repository, but other branches as only running back to where they split from 'main', despite this not being technically correct.

(Another sign of Git branches as being a bit socially constructed is how you can rename them (per the comments).)

PS: There are other VCSes where branches have a more durable existence in the VCS history. These VCSes are neither wrong nor right; my view is that they've taken a different view of both the UI and the mathematics of what 'branches' are in their mathematical version of version control.

GitBranchesSocialConstructs written at 21:54:44;

2024-01-04

'Unmaintained' (open source) code represents a huge amount of value

I recently read Aaron Ballman's Musings on the C charter (via). As part of musing on backward compatibility in new versions of the C standard, Ballman wrote:

[...] I would love to see this principle updated to set a time limit, along the lines of: existing code that has been maintained to not use features marked deprecated, obsolescent, or removed in the past ten years is important; unmaintained code and existing implementations are not. If you cannot update your code to stop relying on deprecated functionality, your code is not actually economically important — people spend time and money maintaining things that are economically important.

To put it one way, I disagree strongly with the view that 'unmaintained' code is not valuable or important. In the open source world, I routinely use a significant number of programs and a large amount of code that no longer sees meaningful changes and development. This code may be maintained in the sense that there is someone who will fix security issues and important bugs, and maybe make a few changes here and there, but it is not 'maintained' in the sense that I think Ballman means, where it undergoes enough development that changing away from newly deprecated functionality (in C or any other language) would be lost in the noise.

(This code has 'value' in the sense that there's a community of people who are (still) using the software, often happily and by choice. Often these are relatively small communities, although not always. If there's no community still using the code, then it's mostly unimportant.)

Some of this open source code is genuinely more or less finished; its authors don't have any particular features they want to add to it. Other open source code has fallen out of favour, with no one left who is interested in active development to move it forward, but potentially with plenty of people who derive value from its current working state. You can probably name projects for each camp.

(An extremely relevant example of the second case is X. People are quite aggressively not doing any further development of X, and they're happy to tell you about it. At the same time, a great deal of things still run on X.)

Making a future version of C (as an example) unable to build those code bases is effectively branching the language, in much the same way (although probably to a lesser extent) as the Python 2 versus Python 3 split. If C compilers fully support the old version of C, everything is probably reasonably fine; even though the language has branched, old projects can continue to use the old language forever. If C compilers start deciding that they want to drop the old version of C because it's been a while, we are not so fine.

PS: This has also come up in the case of Go. Old Go code itself still compiles and works fine, but the Go build environment has changed significantly enough that old code only builds through what is basically a hack in the main Go toolchain. Based on personal experience, I can tell you that there are a number of Go programs out there in the world that have not had even the minimal update to build using Go modules, but which are likely still used.

UnmaintainedCodeHugeValue written at 23:33:19;

