Wandering Thoughts

2019-06-15

Some notes on Intel's CPUID and how to get it for your CPUs

In things like Intel's MDS security advisory, Intel likes to identify CPU families with what they call a 'CPUID', which is a hex number. For example, the CPUID of the Sandy Bridge Xeon E5 'Server Embedded' product family is listed by Intel as 206D7, the CPUID of the Westmere Xeon E7 family is 206F2, and the CPUID of the Ivy Bridge Xeon E7 v2 family is 306E7. Given that one of these families has a microcode update theoretically available, one of them is supposed to get it sometime, and one will not get a microcode update, it has become very useful to be able to find out the CPUID of your Intel processors (especially given Intel's confusing Xeon names).

On x86 CPUs, this information comes from the CPU via the CPUID instruction, which provides all sorts of information (including the brand name of the processor itself, which the processor directly provides in ASCII). Specifically, it is the 'processor version information' that you get from using CPUID to query the Processor Info and Feature Bits. Many things will tell you this information, for example Linux's /proc/cpuinfo and lscpu, but they decode it into the CPU family, model, and stepping (using a complicated algorithm that is covered in the Wikipedia entry on CPUID). Intel's 'CPUID' is that raw value directly in hex, and I don't know if you can reliably reverse a given family/model/stepping triplet back into the definite CPUID (I haven't tried to do it).
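
To make the relationship concrete, here is a sketch in Go of the decoding algorithm (the one documented in Intel's manuals and the Wikipedia entry on CPUID); reversing the triplet means undoing this bit-packing:

package main

import "fmt"

// decode turns the raw 'processor version information' (EAX from
// CPUID leaf 1) into the displayed family, model, and stepping.
func decode(eax uint32) (family, model, stepping uint32) {
        stepping = eax & 0xf
        model = (eax >> 4) & 0xf
        family = (eax >> 8) & 0xf
        if family == 0xf {
                family += (eax >> 20) & 0xff // add in the extended family
        }
        if family == 0x6 || family == 0xf {
                model += ((eax >> 16) & 0xf) << 4 // merge in the extended model
        }
        return
}

func main() {
        // 0x906EA is the CPUID of an i7-8700K (as we'll see later).
        f, m, s := decode(0x906ea)
        fmt.Printf("family %d model %d stepping %d\n", f, m, s)
}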

(Intel's MDS PDF also lists a two-hex-digit 'Platform ID'. I don't know where this comes from or how you find out what yours is. I thought I found some hints, but they don't appear to give the right answer on my test machine.)

There are a variety of ways to get the Intel CPUID in raw hex. The most brute force method and perhaps the simplest is to write a program that uses the CPUID instruction to get this. Keen people can use C with inline assembly, but I used Go with a third party package for this that I found through the obvious godoc.org search:

package main

import (
        "fmt"

        "sigs.k8s.io/node-feature-discovery/pkg/cpuid"
)

func main() {
        // Leaf 0x01 of CPUID is the Processor Info and Feature Bits;
        // the raw processor version information comes back in EAX.
        r := cpuid.Cpuid(0x01, 0x00)
        fmt.Printf("cpuid: %x\n", r.EAX)
}

This has the great benefit of Go for busy sysadmins; it compiles to a static binary that will run on any machine regardless of what packages you have installed, and you can pretty much cross-compile it for other Unixes if you need to (at least 64-bit x86 Unixes; people with 32-bit x86 Unixes are out of luck here without some code changes, but this package may help).

(Intel also has a CPUID package for Go, but it wants to decode this information instead of just giving it to you literally so you can print the hex that Intel uses in its documentation. I wish Intel's left hand would talk to its right hand here.)

On Linux machines, you may have the cpuid program available as a package, and I believe it's also in FreeBSD ports in the sysutils section (and FreeBSD has another 'cpuid' program that I know nothing about). Cpuid normally decodes this information, as everything does, but you can get it to dump the raw information and then read out the one field of one line you care about, which is the 'eax' field in the line that starts with '0x00000001':

; cpuid -1 -r
CPU:
   0x00000000 0x00: eax=0x00000016 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000906ea ebx=0x04100800 ecx=0x7ffafbff edx=0xbfebfbff
[...]

(This is my home machine, and the eax of 0x000906ea matches the CPUID of 906EA that Intel's MDS PDF says that an i7-8700K should have.)
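
If all you want is that one hex value, you can extract it from the raw output with a little awk; this sketch assumes the output format shown above:

; cpuid -1 -r | awk '$1 == "0x00000001" { print substr($3, 5) }'
0x000906ea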

Perhaps you see why I think a Go program is simpler and easier.

IntelCPUIDNotes written at 23:27:09

2019-06-09

Go recognizes and specially compiles some but not all infinite loops

A while back, for my own reasons, I wrote an 'eatcpu' program to simply eat some number of CPUs worth of CPU time (by default, all of the CPUs on the machine). I wrote it in Go, because I wanted a straightforward program and life is too short to deal with threading in C. The code that uses CPU is simply:

func spinner() {
        var i int
        for {
                i++
        }
}
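
(For context, a minimal version of the whole program might look like the following sketch. This is not the actual eatcpu code, which lets you say how many CPUs to use, but it has the same shape.)

package main

import "runtime"

func spinner() {
        var i int
        for {
                i++
        }
}

func main() {
        // By default, soak the whole machine: one spinner
        // goroutine per CPU.
        for n := runtime.NumCPU(); n > 0; n-- {
                go spinner()
        }
        select {} // block forever while the spinners run
}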

At the time I described this as a simple integer CPU soaker, since the code endlessly increments an int. Recently, to make my life more convenient, I decided to put it up on Github, and as part of that I decided that I wanted to know what it was actually doing; specifically, I wanted to know whether it really ran entirely in CPU registers or whether Go was loading and storing from memory all of the time. I did this in the straightforward way of running 'go tool compile -S' (after some research) and then reading the assembly. It took me some time to understand what I was reading and believe it, because here is the entire assembly that spinner() compiles down to:

0x0000 00000 (eatcpu.go:27)     JMP     0
0x0000 eb fe

(The second line is the actual bytes of object code.)

Go 1.12.5 had recognized that I had an infinite loop with no outside effects and had compiled it down to nothing more than that. Instead of endless integer addition, I had an endless JMP, which was probably using almost none of the CPU's circuitry (certainly it doesn't need to use the integer ALU).

The Go compiler is clever enough to recognize that a variation of this is still an infinite loop:

func spinner2() int {
        var i int
        for {
                i++
        }
        return i
}

This too compiles down to 'JMP 0', since it can never exit the for loop to return anything.

However, the Go compiler does not recognize impossible situations as being infinite loops. For example, we can write the following:

func spinner3() uint {
        var i uint
        for i >= 0 {
                i++
        }
        return i
}

Since i is an unsigned integer, the for condition is always true and the loop will never exit. However, Go 1.12.5 compiles it to actual arithmetic and looping code, instead of just a 'JMP 0'. The core of the assembly code is:

0x0000  XORL    AX, AX
0x0002  JMP     7
0x0004  INCQ    AX
0x0007  TESTQ   AX, AX
0x000a  JCC     4
0x000c  MOVQ    AX, "".~r0+8(SP)
0x0011  RET

(The odd structure is because of how plain for loops are compiled. The exit check is relocated to the bottom of the loop, and then on initial loop entry, at 0x0002, we skip over the loop body to start by evaluating the exit check.)

If I'm understanding the likely generated x86 assembly correctly, this will trivially never exit; TESTQ likely compiles to some version of TEST, which unconditionally clears CF (the carry flag), and JCC jumps if the carry flag is clear.

(The Go assembler's JCC is apparently x86's JAE, per here, and per this x86 JUMP quick reference, JAE jumps if CF is clear. Since I had to find all of that and follow things through, I'm writing it down.)

On the whole, I think both situations are reasonable. Compiling infinite for loops to straight JMPs is perfectly reasonable, since they do get used in real Go code, and so is eliminating operations that have no side effects; put them together and spinner() turns into 'JMP 0'. On the other hand, the unsigned int comparison in spinner3() should never happen in real, non-buggy code, so it's probably fine for the optimizer to not recognize that it's always true and thus that this creates an infinite loop with no outside effects.

(There is little point to spending effort on optimizing buggy code.)

PS: I don't know if there's already a Go code checker that looks for unsigned-related errors like the comparison in spinner3(), but if there isn't there is probably room for one.

GoInfiniteLoopOptimization written at 21:56:40

2019-06-05

Go channels work best for unidirectional communication, not things with replies

Once, several years ago, I wrote some Go code that needed to manipulate a shared data structure. At this time I had written and read less Go code than I have now, and so I started out by trying to use channels and goroutines for this. There would be one goroutine that directly manipulated the data structure; everyone else would ask it to do things over channels. Very rapidly this failed and I wound up using mutexes.

(The pattern I tried is what I have since seen called a monitor goroutine (via).)

Since then, I have come to feel that this is one regrettable weakness of Go channels. However nice, useful, and convenient they are for certain sorts of communication patterns, Go channels do not give you very good ways of implementing an 'RPC' communication pattern, where you make a request of another goroutine and expect to get an answer back, since there is no direct way to reply to a channel message. In order to be able to reply to the sender, your monitor goroutine must receive a unique reply channel as part of the incoming request, and from there things can start getting much more complicated and tangled (with various interesting failure modes if anyone ever makes a programming mistake; for example, you really want to insist that all reply channels are buffered).
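
To make the bureaucracy concrete, here is a minimal sketch of the pattern (the names are hypothetical, not from my actual code):

type getReq struct {
        key   string
        reply chan string // should be buffered by the sender
}

// The monitor goroutine owns the map; everyone else asks it
// to do things over the reqs channel.
func monitor(reqs chan getReq, data map[string]string) {
        for r := range reqs {
                r.reply <- data[r.key]
        }
}

// Every caller must create and send its own reply channel.
func get(reqs chan getReq, key string) string {
        r := getReq{key: key, reply: make(chan string, 1)}
        reqs <- r
        return <-r.reply
}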

My current view is that Go channels work best for unidirectional communication, where either you don't need an answer to the message you've sent or it doesn't matter which goroutine in particular receives and processes the 'reply' (really the next step), so you can use a single shared channel that everyone pulls messages from. Implementing some sort of bidirectional communication between specific goroutines with channels is generally going to be painful and require a bunch of bureaucracy that will complicate your code (unless all of the goroutines are long-lived and have communication patterns that can be set up once and then left alone). This makes the "monitor goroutine" pattern a bad idea simply for code clarity reasons, never mind anything else like performance or memory churn.

(This is especially the case if you have a bunch of different requests to send to the one goroutine, each of which can get a different reply, because then you need a bunch of different channel types unless you're going to smash everything together in various less and less type-safe ways. The more methods you would implement on your shared data structure, the more painful doing everything through a monitor goroutine will be.)

I'm not sure there's anything that Go could do to change this, and it's not clear to me that Go should. Go is generally fairly honest about the costs of operations, and using channels for synchronization is more expensive than a mutex and probably always will be. If you have a case where a mutex is good enough, and a shared data structure is a great case, you really should stick with simple and clearly correct code; that it performs well is a bonus. Channels aren't the answer to everything and shouldn't try to be.
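
For contrast, here is a sketch of the mutex version of the same hypothetical map lookup (using the sync package); it is short and clearly correct:

type store struct {
        mu   sync.Mutex
        data map[string]string
}

func (s *store) get(key string) string {
        s.mu.Lock()
        defer s.mu.Unlock()
        return s.data[key]
}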

(Years ago I wrote Goroutines versus other concurrency handling options in Go about much the same issues, but my thinking about what goroutines were good and bad at was much less developed then.)

(This entry was sparked by reading Golang: Concurrency: Monitors and Mutexes, A (light) Survey, because it made me start thinking about why the "monitor goroutine" pattern is such an awkward one in Go.)

GoChannelsAndReplies written at 01:02:59

2019-05-23

On programming languages belonging (or not) to their community

On Twitter, I said some things as follow-on notes to how Go is Google's language, not ours:

For example, C and especially C++ are community languages and this shows in what gets added to their standards. You can find a lot of people who think that C++'s growth is not a good thing. Go is unlikely to grow that way partly because the Go core team is willing to do nothing.

There are language communities and standards committees that are willing to do nothing, but generally there are lots of voices for doing something and so something gets done. Once you're a community thing, saying 'no, we're not doing anything' is very hard.

C++ is my canonical example of a language that definitely belongs to its community, for better or worse. There are several major implementations, two of which are themselves genuine community projects, and the direction of C++ is set by an open standards committee with a relatively distributed membership. C++ is also a famously complicated language which has had major changes over the years that have apparently significantly affected how you should write C++ code today, and I've read any number of gripes about the directions in which it's moving and whether or not future changes are good ideas (including Bjarne Stroustrup himself, per Remember the Vasa! [PDF] (via)).

Some of this is due to C++'s specific situation, but beyond that I think that communities have structural influences that make it hard for them to avoid heading in this direction. There are always loud people who have ideas and in general it is hard for a community to consistently say 'no, we're not doing anything', especially in a fuzzy situation; for example, often there is social pressure to give sufficient justification for your refusal to take action. In one way, the big advantage that languages with some form of control by a benevolent dictatorship have is that there are people who can say 'no' over and over again and get away with it.

(It helps a great deal if these people have good taste and say no to the right things, of course. But saying 'no' to everything can work too, if the language is in the right shape to start with. You may miss out on useful advances, but at least you won't ruin the good things that the language already has.)

The two obvious problems with a benevolent dictatorship are the other sides of this. A benevolent dictatorship can ram through a change even when it's a bad idea, and they can stop changes that are good ideas. The more that a language genuinely belongs to its community, the more the language will reflect what the community wants and needs (or thinks it does), rather than what some people think the community should have (for its own good). Just ask people about Python 2 and Python 3 for an illustration of that.

(Even when a forced change is theoretically a good thing, it has social and pragmatic effects on the community of people using the language that can be overlooked or waved away by the enthusiasts at the center of their language. You can get rousing debates over whether the pain is 'worth' the end result, but it is undeniable that pain was inflicted.)

LanguagesAndCommunityOwnership written at 00:49:25

2019-05-22

Go is Google's language, not ours

Over on Twitter, I saw the following question (via):

There is lot of conversation around generics in #go, can't we have something like OpenGo, where community can implement generics , rather that waiting for official #go generics to happen ? Something like OpenJDK

There are many answers for why this won't happen, but one that does not usually get said out loud is that Go is Google's language, not the community's.

Yes, there's a community that contributes things to Go, some of them important and valued things; you only have to look at the diversity of people in CONTRIBUTORS or see the variety of people appearing in the commits. But Google is the gatekeeper for these community contributions; it alone decides what is and isn't accepted into Go. To the extent that there even is a community process for deciding what is accepted, there is an 800-pound gorilla in the room. Nothing is going to go into Go that Google objects to, and if Google decides that something needs to be in Go, it will happen.

(The most clear and obvious illustration of this is what happened with Go modules, where one member of Google's Go core team discarded the entire system the outside Go community had been working on in favour of a relatively radically different model. See eg for one version of this history.)

Or in short, Go has community contributions but it is not a community project. It is Google's project. This is an unarguable thing, whether you consider it to be good or bad, and it has effects that we need to accept. For example, if you want some significant thing to be accepted into Go, working to build consensus in the community is far less important than persuading the Go core team.

(As a corollary, sinking a lot of time and effort into a community effort that doesn't have enthusiastic buy-in from the Go core team is probably a waste of time; at the most, your work might help the Go core team understand the issues better. Again, see Go modules for this in action.)

In general, it's extremely clear that the community's voice doesn't matter very much for Go's development, and those of us working with Go outside Google's walls just have to live with that. If we're very lucky, our priorities match up with Google's; if we're reasonably lucky, the Go core team and Google will decide that they care enough about our priorities to work on them. The good news is that Google and the Go core team do care (so far) about Go being a success in the outside world, not just inside Google, so they're willing to work on pain points.

(On the good and bad scale, there is a common feeling that Go has done well by having a small core team with good taste and a consistent vision for the language, a team that is not swayed by outside voices and is slow moving and biased to not making changes.)

PS: I like Go and have for a fair while now, and I'm basically okay with how the language has been evolving and how the Go core team has managed it. I certainly think it's a good idea to take things like generics slowly. But at the same time, how things developed around Go modules has left a bad taste in my mouth and I now can't imagine becoming a Go contributor myself, even for small trivial changes (to put it one way, I have no interest in knowing that I'm always going to be a second class citizen). I'll file bug reports, but that's it. The whole situation leaves me with ambivalent feelings, so I usually ignore it completely.

(And claims by the Go team that they really care about the community and want them to be involved now sound laughable. I'm sure they care, but only up to a certain point. I think that the Go core team should be bluntly honest about the situation, rather than pretend and implicitly lead people on.)

Sidebar: Google and the Go core team

You could ask if Go is Google's language or the Go core team's language, since Go's direction is set and controlled by that small core team. However, at the moment I believe that most or all of the active Go core team is employed by Google, making the distinction impossible to determine in practice (at least from outside Google). In practice we'll only get a chance to find out who Go really belongs to if Go core team members start leaving Google and try to remain active in determining Go's direction. If that works, especially if the majority of them no longer work for Google, then Go probably is their language, not Google's, in the same way that Python has always been Guido van Rossum's language regardless of who he worked for at the time.

On a practical level, it's undeniable that at the moment Google provides much of the infrastructure and resources to support Go, such as golang.org, and as a result owns the domain names and so on. Google also holds the trademarks on 'Go' as a programming language, per their trademarks list.

GoIsGooglesLanguage written at 01:24:28

2019-05-16

Go has no type for types in the language

Over on r/golang, someone asked What's the point of limiting .(type) evaluations to type switches:

I know the Go feature set is very well thought out and I'm sure there's a good reason for it, but I'm curious why I can't do fmt.Println(x.(type))

(As pointed out in the replies, you can get the same information with fmt's %T formatting verb.)

Although there are a number of things going on here, one of them is that Go has opted not to make types visible as explicit entities in the language. In a language like Python, the type of things is explicitly exposed as a fully visible part of the language, through operations like type(). In Go, there's no such direct way of exposing the type of something and you can only compare it against other types through mechanisms like type switches.

(Python goes further and makes types pretty much as first class entities as functions are.)
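
To illustrate, the closest Go comes to exposing a type is letting you print its name or compare against specific, named types (a small sketch):

package main

import "fmt"

func describe(x interface{}) {
        // %T prints the dynamic type of x, but only as a string.
        fmt.Printf("%T\n", x)

        // A type switch can compare against known types, but the
        // type itself never appears as a value we could assign
        // to a variable.
        switch x.(type) {
        case int:
                fmt.Println("an int")
        default:
                fmt.Println("something else")
        }
}

func main() {
        describe(42)
        describe("hello")
}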

Part of what this means is that in Go, you cannot write an expression like 'x := y.(type)' not just because the language syntax forbids it, but because there is no standard type that the variable x can be. If you wanted to allow this, you would have to create a new Go type and define what its behavior was.

Go does make type information accessible, but only through the reflect standard library package. There are two things about this. First, reflect isn't part of the language itself, so another implementation of Go would be theoretically free to leave it out (although a certain amount of code would have problems); TinyGo doesn't quite go that far, although it has a minimal version. Second, it's relatively clear that what you get from reflect and manipulate through it is not literally the type information from the language; instead, it is a reflect-created thing that may or may not actively reflect the underlying reality. The only nominal exceptions are reflect.SliceHeader and reflect.StringHeader, and reflect explicitly says that you can't use these safely.

(My personal guess is that part of why reflect provides them is so that if the string or slice header ever changes, code can break visibly because the reflect structure fields it's trying to use aren't there any more. This wouldn't happen if Go forced everyone to define their own private versions of these runtime structures; instead you'd just get silently corrupted results. And people do do things with these headers today.)

In general, not explicitly requiring Go the language to have types as explicit, visible entities preserves some implementation freedom. However, I believe that Go still does require that an implementation keep around more type information than you might expect; for example, I don't believe it would be proper in general for a Go implementation to reduce everything from distinct named types to structural types after compiling the program. This is because if you have the following code, I believe the type assertion is required to fail:

type fred int
var a fred

[...]
b := interface{}(a)
i, ok := b.(int) // ok must be false; b's dynamic type is fred, not int

(You can construct a more elaborate example with a type switch with both fred and int as options.)
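
That more elaborate version might look like this sketch:

func which(b interface{}) string {
        switch b.(type) {
        case fred:
                return "a fred"
        case int:
                return "an int"
        }
        return "something else"
}

Given the declarations above, which(interface{}(a)) must return "a fred", never "an int".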

Structurally, fred and int are the same thing, but Go explicitly distinguishes between named types even if they are structurally identical. As seen here, this distinction can be recovered dynamically, which implies that it must be accessible to the runtime; the runtime needs to have some way to distinguish between a fred and an int that have been turned into interfaces.

(You can get similar situations if the two differently named types have different method sets and you are converting between different interfaces; here, one type might be convertible but the other not.)

PS: One reason a Go implementation might be interested in some degree of type erasure is to reduce the amount of additional type information that has to be carried around with the compiled binary. If the actual code never needs the information, why have it present at all? But clearly this would require whole-program analysis so you can tell whether or not these things are needed, and you'd probably have to explicitly not support things like fmt's %T verb.

GoNoTypeForTypes written at 01:10:19

2019-05-03

In Go, unsafe type conversions are still garbage collection safe

I was recently reading to slice or not to slice (via), where one of the example ways of going from a slice to an array is the non-copying brute force approach with unsafe. To quote the first part of the example code:

bufarrayptr := (*[32]byte)(unsafe.Pointer(&buf[0])) // *[32]byte (same memory region)

(Here buf is a slice, and we'll assume that we've already verified that it has a len() of at least 32.)
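
As a self-contained sketch of the conversion (my illustration, not the article's code):

package main

import (
        "fmt"
        "unsafe"
)

func main() {
        buf := make([]byte, 64)
        buf[0] = 42

        // Reinterpret the start of buf's backing array as a
        // *[32]byte, without copying anything.
        bufarrayptr := (*[32]byte)(unsafe.Pointer(&buf[0]))
        fmt.Println(bufarrayptr[0]) // prints 42
}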

One of the things you might wonder about here is whether this is safe from garbage collection. After all, we're eventually discarding buf, the original and true reference to the slice and its backing array. Will the Go garbage collector someday free the memory involved even though we still have a reference to it in the form of bufarrayptr? On the one hand, we do have a reference to the backing array; on the other hand, we created the reference through the use of unsafe and thus went behind Go's back to do it.

(This would be an analog of the C mistake of retaining a pointer to something that you've free()'d.)

Conveniently, the answer is that this unsafe type conversion is still safe from being garbage collected. As I discussed in Exploring how and why interior pointers keep entire objects alive, the Go garbage collector is blind to the type of pointers; it intrinsically knows what a particular block of memory is, and any pointer to it of any type is as good as any other pointer. It does not have to be the original type, and in fact it can generally be a completely incorrect type. As far as garbage collection goes, I suspect that you can get away with a pointer of a type that is larger than what you're actually pointing to.

(The Go language specification does not say that this has to work, although it does say a bit about unsafe. The unsafe package itself implicitly says that reinterpreting to a too-large type is invalid usage.)

There is an important consequence of this type blindness in the garbage collector if you are doing type conversions, and that is what is and isn't a pointer is set permanently from the original type. When the Go runtime allocates memory initially, it records the 'shape' of that memory, including what portions of it are pointers. Regardless of how you reinterpret the memory, that original shape sticks to it, and that original shape is what the garbage collector uses when determining what pointers to follow to find more used memory and what bytes to not look at further because they're not pointers. If you put non-pointer data in what the garbage collector thinks is a pointer, it will probably panic your program. If you put pointer data in what the garbage collector thinks is not a pointer, the garbage collector may decide that some memory is unused and free it even though you think you have a pointer to it; when you use that pointer later, you will be sad.

(In general, reinterpreting non-pointer memory as a pointer is not necessarily safe. The Go runtime does some things when you tell it you're modifying a pointer, and it makes no guarantees that those things will be safe if the actual memory did not start out its life as a pointer.)

PS: I think that this garbage collection safe behavior of unsafe.Pointer is implicitly guaranteed by the first unsafe.Pointer usage pattern. This isn't part of the language specification itself but it is part of the current unsafe package specification, so it's pretty close. As a practical matter, I think that the Go authors see this sort of usage as valid and thus are likely to support it for as long as possible.

GoUnsafeTypeConvGCSafety written at 22:17:16

2019-04-22

Go 2 Generics: The usefulness of requiring minimal contracts

I was recently reading Ole Bulbuk's Why Go Contracts Are A Bad Idea In The Light Of A Changing Go Community (via). One of the things that it suggests is that people will write generics contracts in a very brute force fashion by copying and pasting as much of their existing function bodies into the contract's body as possible (the article's author provides some examples of how people might do this). As much as the idea makes me cringe, I have to admit that I can see how and why it might happen; as Ole Bulbuk notes, it's the easiest way for a pragmatic programmer to work. However, I believe that it's possible to avoid this, and to do so in a way that is beneficial to Go and Go programmers in general. To do so, we will need both a carrot and a stick.

The carrot is a program similar to gofmt which rewrites contracts into the accepted canonical minimal form; possibly it should even be part of what 'gofmt -s' does in a Go 2 with generics. Since contracts are so flexible and thus so variable, I feel that rewriting them into a canonical form is generally useful for much the same reasons that gofmt is useful. You don't have to use the canonical form of a contract, but contracts in canonical form will likely be easier to read (if only because everyone will be familiar with it) and easier to compare with each other. Such rewriting is a bit more extreme than gofmt does, since we are going from syntax to semantics and then back to a canonical syntax for the semantics, but I believe it's likely to be possible.
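
To make this concrete, here is a hypothetical sketch in the first draft proposal's syntax, showing before and after versions of the same contract. The only restriction that the brute force version actually puts on T is that it must have a String() string method, so a canonicalizer could rewrite it into the minimal form:

// Brute force: statements pasted in from a function body.
contract stringify(v T) {
        var ret []string
        ret = append(ret, v.String())
}

// Canonical minimal form: only the statement that actually
// restricts T survives.
contract stringify(v T) {
        var s string = v.String()
}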

(I think it would be a significant danger sign for contracts if this is not possible or if the community strongly disagrees about what the canonical form for a particular type restriction should be. If we cannot write and accept a gofmt for contracts, something is wrong.)

The stick is that Go 2 should make it a compile time error to include statements in a contract that are not syntactically necessary and that do not add any additional restriction to what types the contract will accept. If you throw in restrict-nothing statements that are copied from a function body and insist that they stay, your contract does not compile. If you want your contract to compile, you run the contract minimizer program and it fixes the problem for you by taking them out. I feel that this is in the same spirit as requiring all imports to be used (and then providing goimports). In general, future people, including your future self, should not have to wonder if some statement in a contract was intended to create some type restriction but accidentally didn't, and you didn't notice because your current implementation of the generic code didn't actually require it. Things in contracts should either be meaningful or not present at all.

To be clear here, this is not the same as a contract element that is not used in the current implementation. Those always should be legal, because you always should be able to write a contract that is more strict and more limited than you actually need today. Such a more restrictive contract is like a limited Go interface; it preserves your flexibility to change things later. This is purely about an element of the contract that does not add some extra constraint on the types that the contract accepts.

(You can pretty much always relax the restrictions of an existing contract without breaking API compatibility, because the new looser version will still accept all of the types it used to. Tightening the restrictions is not necessarily API compatible, because the new, more restricted contract may not accept some existing types that people are currently using it with.)

PS: I believe that there should be a gofmt for contracts even if their eventual form is less clever than the first draft proposal, unless the eventual form of contracts is so restricted that there is already only one way to express any particular type restriction.

Go2RequireMinimalContracts written at 22:13:08

2019-04-10

A Git tool that I'd like and how I probably use Git differently from most people

For a long time now, I've wished for what has generally seemed to me like a fairly straightforward and obvious Git tool. What I want is a convenient way to page through all of the different versions of a file over time, going 'forward' and 'backward' through them. Basically this would be the whole file version of 'git log -p FILE', although it couldn't have the same interface.

(I know that the history may not be linear. There are various ways to cope with this, depending on how sophisticated an interface you're presenting.)

When I first started wanting this, it felt so obvious that I couldn't believe it didn't already exist. Going through past versions of a file was something that I wanted to do all the time when I was digging through repositories, and I didn't get why no one else had created this. Now, though, I think that my unusual desire for this is one of the signs that I use Git repositories differently from most people, because I'm coming at them as a sysadmin instead of as a developer. Or, to put it another way, I'm reading code as an outsider instead of an insider.

When you're an insider to code, when you work on the code in the repository you're reading, you have enough context to readily understand diffs and so 'git log -p' and similar diff-based formats (such as 'git show' of a commit) are perfectly good for letting you understand what the code did in the past. But I almost never have that familiarity with a Git repo I'm investigating. I barely know the current version of the file, the one I can read in full in the repo; I completely lack the contextual knowledge to mentally apply a diff and read out the previous behavior of the code. To understand the previous behavior of the code, I need to read the full previous code. So I wind up wanting a convenient way to get that previous version of a file and to easily navigate through versions.
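
The raw material for such a tool is available from plain Git, which is part of why it seems so obvious to me. A sketch of doing it by hand (with a hypothetical file and revision):

; git log --format=%h -- thefile.go
; git show 1f3a5b7:thefile.go

(git show wants the file's path as seen from the top of the repository, and stepping through revisions this way is exactly the inconvenient navigation that I'd like a tool to do for me.)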

(There are a surprising number of circumstances where understanding something about the current version of a piece of code requires me to look at what it used to do.)

I rather suspect that most people using Git are developers instead of people spelunking the depths of unfamiliar codebases. Developers likely don't have much use for viewing full versions of a file over time (or at least it's not a common need), so it's probably not surprising that there doesn't seem to be a tool for this (or at least not an easily found one).

(Github has something that comes close to this, with the 'view blame prior to this change' feature in its blame view of a particular file. But this is not quite the same thing, although it is handy for my sorts of investigations.)

GitViewFileOverTimeWish written at 01:27:06

2019-04-08

An example of a situation where Go interfaces can't substitute for generics

I recently read Why Go Contracts Are A Bad Idea In The Light Of A Changing Go Community (via). I have some views on this, but today I want to divert from them to touch on one thing I saw in the article (and that I believe I've seen elsewhere).

In the article, the author cites the stringer contract example from the draft proposal:

func Stringify(type T stringer)(s []T) (ret []string) {
  for _, v := range s {
    ret = append(ret, v.String())
  }
  return ret
}

contract stringer(x T) {
  var s string = x.String()
}

The author then says:

All that contracts are good for is ensuring properties of types. In this particular case it could (and should) be done simpler with the Stringer interface.

There are two ways to read this (and I don't know which one the author intended, so I am using their article as a springboard). The first way is that the contract is a roundabout way of saying that the type T must satisfy the Stringer interface, and we should be able to express this type restriction directly. I don't entirely argue with this, but I also don't think Go has any particularly compact and clear way of doing this now. Perhaps there should be a special syntax for it in a world with generics, although that depends on how many contracts will be basically specifying required method functions versus other requirements on types (such as comparability or addibility).

The other way of reading this is to say that our Stringify() example as a whole should be rewritten to use interfaces and not generics. Unfortunately this isn't possible; you can't write a function that behaves the same way using interfaces. This is because a non-generic function using interfaces must have the type signature:

func Stringify(s []Stringer) (ret []string)

The problem with this type signature is the famous and long standing Go issue that you cannot convert a slice of some type to a slice of some interface, even if the type satisfies the interface. The power of the generic version of Stringify is that it can work on your existing slice of elements of some type; you do not have to manually create a new slice of those elements turned into interface values.

The larger problem is that creating an interface value for every existing value is not free (even beyond the cost of a new array to hold them all). At a minimum it churns memory and makes extra work for the garbage collector. If you're starting with concrete values that are not pointers, you'll hit other efficiency issues as well when your Stringify calls the String() receiver method on your type.

The attraction of generics in this situation is not merely being able to implement generic algorithms in a way that is free of the sort of flailing around with interfaces that we see in the current sort package. It is also that the actual implementation should be efficient, ideally as efficient as a version written for the specific type you're using. By their nature, interfaces cannot deliver this level of efficiency; they always involve an extra layer of interface values and indirection.

(Even if you don't care about the efficiency, the need to transform your slice of T elements to a slice of Stringer interface values requires you to write some boring and thus annoying code.)
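
That boring code is roughly the following sketch, with a hypothetical MyType that has a String() method:

// Build a new slice of interface values by hand; each append
// boxes a MyType value into a Stringer interface value.
func toStringers(things []MyType) []Stringer {
        ret := make([]Stringer, 0, len(things))
        for _, t := range things {
                ret = append(ret, t)
        }
        return ret
}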

GoInterfacesVsGenerics written at 21:36:27

2019-03-08

Exploring how and why interior pointers in Go keep entire objects alive

Recently I was reading Things that surprised me in go (via), where one of the surprising things is that an interior pointer to a struct's field keeps the entire struct from being garbage collected. Here is a cut down and slightly changed version of the example from the article:

func getInteriorPointer() *int {
    type bigStruct struct {
        smallThing   int
        someBigThing [aLot]int
    }
    b := bigStruct{smallThing: 3}
    return &b.smallThing
}

After this returns, the entire bigStruct will remain in memory, not just the int of smallThing. As the author notes, in a hypothetical perfect GC this wouldn't happen; only the int would remain.

The direct reason that Go works this way is that it puts all data for a compound object like a struct or an array into one block of memory. This makes for efficient compound objects (both in terms of memory and in terms of access), at the cost of keeping the entire object alive as a unit. There are a number of versions of this, not just with structs; for example, a slice of a string will keep the entire string alive, not just the substring you've sliced out. The same is true of slices of arrays (such as arrays of bytes), even if you've explicitly limited the capacity of the slice so it can't be quietly grown using the original backing array.
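
For example, here is a sketch of a function that quietly pins a large buffer; the sixteen byte slice it returns keeps buf's entire backing array alive:

func header(buf []byte) []byte {
        // The three-index slice expression caps the capacity at 16,
        // but the whole backing array of buf stays alive anyway.
        return buf[:16:16]
}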

But let's go deeper here. Let's suppose we want a perfect GC that still preserves this property of compound objects. Unfortunately there are at least two significant complications, one in determining what memory is still in use and one in actually freeing the memory.

Today, with indivisible compound objects, Go can keep a simple map from memory addresses to the block of memory that they keep alive and the garbage collector doesn't have to care about the type of a pointer, just its address, because the address alone is what determines how much memory is kept alive. In a world where compound objects are divisible, we must somehow arrange for &b.smallThing and &b to be treated differently by the garbage collector, either by giving the garbage collector access to type information or by having them point to different addresses.

(When planning out a scheme for this, remember that structs can be nested inside other structs. Go makes such struct embedding idiomatic, and it's natural to put your embedded struct at the start of your new struct; in fact I think it's the accepted style for this.)

The simplest way to deal with freeing only part of the memory of an object is for Go to have copying garbage collection. In such a hypothetical situation, you would just copy the int instead of the whole struct and everything would work out nicely. However Go does not have a copying GC today, so at a minimum freeing only part of a compound object leaves you with increased fragmentation of free memory; what was one big block of memory for the compound object gets broken up into fragments of free space with bits of still allocated space in various spots.

Unfortunately this partial freeing doesn't work in practice in Go's current memory allocation system in many cases. To simplify, Go splits up allocations into 'size classes' and then performs block allocation with a given size class (if you want to read more, start here). This is highly efficient and has various other advantages, but it means that you simply can't turn an allocation from one size class into free memory for another size class. Effectively you free either all of it or none of it.

You could change Go's memory allocation system to make this possible, but it's very likely that such a change would have performance impacts (and possibly memory usage impacts). With Go's current non-copying GC, you might well find that it often simply wasn't worth it to free up part of an object, either in memory savings or in CPU usage. Certainly you would have more complex memory allocation and garbage collection, and those are already complicated enough.

(With a copying GC, your new copied object can be allocated in the right size class, since the GC definitely knows how big it is now.)

GoInteriorPointerGC written at 22:33:13

