Wandering Thoughts

2017-08-23

An unexpected risk of using Go is it ending support for older OS versions

A few years ago I wrote The question of language longevity for new languages, where I used a concern about Go's likely longevity as the starting point to talk about this issue in general. The time since then has firmed up Go's position in general and I still quite like working in it, but recently a surprising and unexpected issue has cropped up here that is giving me some cause for thought. Namely, the degree to which Go will or won't continue to support older versions of OSes.

(By 'Go' here I mean the main Go implementation. Alternate implementations such as gccgo have their own, different set of supported OS versions and environments.)

Like most compilers, Go has a set of minimum version requirements for different OSes. It's actually fairly hard to find out what all of these are; the requirements for the major binary release platforms can be found here, but requirements for other platforms may only show up in, for example, the Go 1.8 release notes. Probably unsurprisingly, Go moves these minimum requirements forward every so often, usually by dropping support for OS versions that aren't officially supported any more. A couple of topical examples, from the draft Go 1.9 release notes, are that Go 1.9 will require OpenBSD 6.0 and will be the last Go release that supports FreeBSD versions before 10.3 (theoretically Go 1.8 supports versions as far back as FreeBSD 8).

(The best platform requirements documentation appears to be in the Go wiki, here and here, which was a bit hard to find.)

I'm sure that for many people, Go's deprecation of older and now unsupported OS versions is not a problem because they only ever have to deal with machines running OS versions that are still supported, even for OSes (such as OpenBSD) that have relatively short support periods. Perhaps unfortunately, I don't operate in such an environment; not for OpenBSD, and not for other OSes either. The reality is that around here there are any number of systems that don't change much (if at all) and just quietly drift out of support for one reason or another, systems that I want or may want to use Go programs on. This makes the degree to which Go will continue to support old systems somewhat of a concern for me.

On the other hand, you can certainly argue that this concern is overblown. Building Go from source and keeping multiple versions around is easy enough, and old binaries of my programs built with old Go versions are going to keep working on these old, unchanging systems. The real problems would come in if I wanted to do ongoing cross-platform development of some program and have it use features that are only in recent versions of Go or the standard library. Life gets somewhat more exciting if I use third party packages, because those packages (or the current versions of them) may depend on modern standard library things even if my own code doesn't.

(If my program doesn't use the very latest shiny things from the standard library, I can build it with the latest Go on Linux but an older Go on OpenBSD or FreeBSD or whatever.)
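As a concrete sketch of what keeping multiple Go versions around looks like for me (the paths here are made up, and building from source needs an existing Go toolchain for GOROOT_BOOTSTRAP):

git clone https://go.googlesource.com/go $HOME/go-1.8
cd $HOME/go-1.8
git checkout release-branch.go1.8
cd src && ./make.bash

# later, build a program with that specific toolchain:
PATH=$HOME/go-1.8/bin:$PATH go build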

GoOSVersionsRisks written at 00:56:31

2017-08-13

Sorting out slice mutability in Go

I've recently been writing some Go code that heavily uses and mutates slices, mostly through append(); often the slices were passed as arguments to functions that then returned what were theoretically different, mutated slices. This left me uncertain about whether both I and my code were doing the right thing, or whether I was creating hidden bear traps for myself.

So let's start out with some completely artificial example code:

func test1(h string, hl []string) []string {
  return append(hl, h)
}

// Add h at position p, moving the current
// element there to the end.
func test2(h string, p int, hl []string) []string {
  t := append(hl, hl[p])
  t[p] = h
  return t
}

Suppose you call both of these functions with a []string slice that you want to use later in its current state, i.e. you want it to not be changed by either call. Will this be true for either or both functions?

The answer turns out to be no. Both functions can mutate the original slice in visible ways. Yes, even test1() can have this effect, and here's a demonstration:

func main() {
  t := []string{"a", "b", "c",}
  t2 := test1("d", t)
  t3 := test1("fred", t2)
  _ = test1("barney", t2)
  fmt.Printf("t3[4] is %s\n", t3[4])
}

Although you would expect that t3[4] is "fred", because that's what we appended to t2 to create t3, it is actually "barney" (on the Go Playground, at least, and also on my 64-bit x86 Linux machine).

In Go, slices are a data structure built on top of arrays, as covered in Go Slice: usage and internals. When you generate a slice from an explicit array, the slice is backed by and uses that array. When you work purely with slices (including using append()), as we are here, the resulting slices are backed by anonymous arrays; these anonymous arrays are where the actual data involved in your slice is stored. These anonymous arrays may be shared between slices, and when you copy a slice (for example by calling a function that takes a slice as its argument), you do not create a new copy of the anonymous array that backs it.

Slices have a current length and a maximum capacity that they can grow to. If you call append() on a slice with no capacity left (where len(slice) is cap(slice)), append() has to create a new backing array for the slice and copy all the current elements over into it. However, if you call append() on a slice that has remaining capacity, append() simply uses a bit of the remaining capacity in the underlying backing array; you get a new slice from append(), but it is still using the same backing array. If this backing array is also used by another slice that you care about, problems can ensue.
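To make the length and capacity rule concrete, here's a small example (not from the programs above, with the usual package main and fmt import assumed) that you can run to watch a backing array being shared and then abandoned:

func main() {
  s := make([]string, 3, 4)         // length 3, capacity 4
  copy(s, []string{"a", "b", "c"})

  t := append(s, "d")               // fits in s's spare capacity
  t[0] = "X"
  fmt.Println(cap(s), cap(t), s[0]) // "4 4 X": s and t share a backing array

  u := append(t, "e")               // t is full, so append allocates a new array
  u[0] = "Y"
  fmt.Println(t[0])                 // still "X": t is unaffected by changes to u
}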

With test2(), we have a relatively straightforward and obvious case. If append() doesn't have to create a new backing array, we'll mutate the existing one by changing the string at position p. Writing to an existing element of a slice is a clear warning sign here, and it's not too hard to look out for this in your code (and in functions that you call, such as sort.Strings).

With test1() things are more subtle. What is going on here is that when append() doesn't have to increase a slice's capacity, it ends up writing the new element to the original backing array. Our program arranges for t2's anonymous backing array to have spare capacity, so both the second and the third calls to test1() wind up writing to <anonymous-array>[4] and "fred" turns into "barney". This is alarming (at least to me), because I normally think of pure append() calls as being safe; this demonstrates that they are not.

To guard against this, you must always force the creation of a new backing array. The straightforward way to do this is:

func test1(h string, hl []string) []string {
  t := append([]string{}, hl...)
  return append(t, h)
}

(You can reduce this to a one-liner if you want to.)

A version that might be slightly more efficient would explicitly make() a new slice with an extra element's worth of capacity, then copy() the old slice to it, then finally append() or add the new value.
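That version might look like this (a sketch; how much extra capacity you reserve is up to you):

func test1(h string, hl []string) []string {
  // Allocate a fresh backing array with room for exactly one more element.
  t := make([]string, len(hl), len(hl)+1)
  copy(t, hl)
  return append(t, h)
}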

(Whether this version is actually more efficient depends on whether you're going to use plain append() to add even more elements to the new slice later on.)

All of this is a little bit counter-intuitive to me. I could work out what was going on and why it has to be this way, but until I started to really think about it, I thought that test1() was safe. And it sometimes is, which makes things tricky; if t2 had no extra capacity, t3 would have allocated a new backing array and everything would have been fine. When slices backed by anonymous arrays have extra capacity is an implementation detail and depends on both the exact numbers involved and the exact path of slice growth.

(The test1() case is also tricky because the mutation is not visible in the original t2 slice. In test2(), at least the original is clearly altered. In a test1() case, the two append()s to the slice might be quite separated in the code, and the damage is only visible if and when you look at the first new slice.)

PS: This implies that calling append() on the same slice in two different goroutines creates a potential data race, at least if you ever read the newly appended element.

GoSliceMutability written at 01:31:31

2017-07-07

Programming Bourne shell scripts is tricky, with dim corners

There have been a bunch of good comments on my entry about my views on Shellcheck, so I want to say just a bit too much to fit in a comment of my own. I'll mostly be talking about this script, where I thought the unquoted '$1' was harmless:

#!/bin/sh
echo Error: $1

Leah Neukirchen immediately pointed out something I had completely forgotten, which is that unquoted Bourne shell variable expansion also does globbing. The following two lines give you the same output:

echo *
G="*"; echo $G

This is surprising (to me), but after looking at things I'll admit that it's also useful. It gives you a straightforward way to dynamically construct glob patterns in your shell script and then expand them, without having to resort to hacks that may result in too much expansion or interpretation of special characters.
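For example, here's a sketch of the sort of thing I mean, with a made-up directory and prefix; the unquoted $pat is exactly where the glob gets expanded:

dir=/var/log
prefix=maillog
pat="$dir/$prefix.*"
for f in $pat; do
    echo "found: $f"
done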

Then Vidar, the author of Shellcheck, left a comment with an interesting PS suggesting some things that my unquoted use of $1 was leaving me open to:

./testit '-e \x1b[33mTest' # Ascii input results in binary output (when sh is bash)
./testit '-n foo'          # -n disappears

This is a nice illustration of how tricky shell programming can be, because these problems probably don't happen, but I can't say that they definitely can't happen in any Unix environment (and maybe no one can). As bugs, both of these rely on the shell splitting $1 into multiple arguments to the actual command and then echo interpreting the first word (now split into a separate argument) as a -n or -e option, changing its behavior. However, I deliberately wrote testit's use of echo so that this shouldn't happen, as $1 is only used after a non-option argument (the Error: portion).

With almost all commands in a traditional Unix, the first regular argument turns off all further option processing; everything after it will be considered an argument, no matter if it could be a valid option. Using an explicit '--' separator is only necessary if you want your first regular argument to be something that would otherwise be interpreted as an option. However, at least some modern commands on some Unixes have started accepting options anywhere on the command line, not just up to the first regular argument. If echo behaves this way, Vidar's examples do malfunction, with the -n and -e seen as actual options by echo. Having echo behave this way in your shell is probably not POSIX compatible, but am I totally sure that no Unix will ever do this? Of course not; Unixes have done some crazy shell-related things before.

Finally, Aristotle Pagaltzis mentioned, about his reflexive quoting of Bourne shell variables when he uses them:

I’m just too aware that uninvited control and meta characters happen and that word splitting is very complex semantically. [...]

This is very true, as I hope this entry helps illustrate. But for me at least there are three situations in my shell scripts. If I'm processing unrestricted input in a non-friendly environment, yes, absolutely, I had better put all variable usage in quotes for safety, because sooner or later something is going to go wrong. Generally I do and if I haven't, I'd actually like something to tell me about it (and Shellcheck would be producing a useful message here for such scripts).

(At the same time, truly safe defensive programming in the Bourne shell is surprisingly hard. Whitespace and glob characters are the easy case; newlines often cause much more heartburn, partly because of how other commands may react to them.)

If I'm writing a script for a friendly environment (for example, I'm the only person who'll probably run it) and it doesn't have to handle arbitrary input, well, I'm lazy. If the only proper way to run my script is with well-formed arguments that don't have whitespace in them, the only question is how the script is going to fail; is it going to give an explicit complaint, or is it just going to produce weird messages or errors? For instance, perhaps the only proper arguments to a script are the names of filesystems or login names, neither of which have whitespace or funny characters in them in our environment.

Finally, sometimes the code in my semi-casual script is running in a context where I know for sure that something doesn't have whitespace or other problem characters. The usual way for this to happen is for the value to come from a source that cannot (in our environment) contain such values. For a hypothetical example, consider shell code like this:

login=$(awk -F: '$3 == NNN {print $1}' /etc/passwd | sed 1q)
....
echo whatever $login whatever

This is never going to have problematic characters in $login (for a suitable value of 'never', since in theory our /etc/passwd could be terribly corrupted or there could be a RAM glitch, and yes, if I was going to (say) rm files as root based on this, $login would be quoted just in case).

This last issue points out one of the hard challenges of a Bourne shell linter that wants to only complain about probable or possible errors. To do a good job, you want to recognize as many of these 'cannot be an error' situations as possible, and that requires some fairly sophisticated understanding not just of shell scripting but of what output other commands can produce and how data flows through the script.

By the way, Shellcheck impressed me by doing some of this sort of analysis. For example, it doesn't complain about the following script:

#!/bin/sh
ADIR=/some/directory/path
#ADIR="$1"
if [ ! -d $ADIR ]; then
   echo does not exist: $ADIR
fi

(If you uncomment the line that sets ADIR from $1, Shellcheck does report problems.)

BourneShellTrickyAndDim written at 00:00:22

2017-07-06

My current views on Shellcheck

As part of the reaction to my last entry, I've been asked how I feel about shellcheck (also). As it happens, I have some opinions here.

Shellcheck is what I'd classify as a linter. I've dealt with a lot of linters and I've wound up with a general approach to looking at them, which I summarize as: a linter is interesting to the extent that it gives me useful new information. To do this, a linter needs to do two things: it must find some genuine issues that matter (or at least that I care about), and it can't complain too much about stuff I don't care about (or where it's wrong). In other words, a useful linter must have some signal (the more the better) and not too much noise.

What is signal is somewhat in the eye of the beholder. It is not enough for a linter to be accurate; it has to be interesting as well. For an example of this, I will turn to Go. The Go community has a whole bunch of high-signal low-noise linters, starting with 'go vet' itself. If gosimple or unparam report something, it is real and you should probably care about it; if you fix something reported by either tool, a project maintainer would probably take your patch. Then there is errcheck. Errcheck is accurate, but it turns out that silently ignoring errors is extremely common in production Go code and probably very few maintainers would accept patches to litter their code with various ways to make errcheck shut up. In practice errcheck is too persnickety and so I never bother running it against my Go code, unlike other Go linters.
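As a concrete illustration of the kind of change errcheck pushes you towards (this fragment is made up, not from any real project), the usual way to quiet it is to explicitly discard the error with the blank identifier:

f, err := os.Create("/tmp/example")
if err != nil {
  return err
}
defer f.Close() // errcheck flags this ignored error from Close
// versus the explicit, noisier version that satisfies it:
// defer func() { _ = f.Close() }()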

(The Go world has settled on a style of linters where there are a bunch, each of which is narrowly focused on a single area. This has spawned the obvious meta-linter project.)

Unfortunately, when I gave Shellcheck a spin it fell much more towards the errcheck end of the linter spectrum than the go vet end. Here is an example that shows one issue I have with it:

; cat testit
#!/bin/sh
echo Error: $1
; shellcheck testit
[...]
echo Error: $1
            ^-- SC2086: Double quote to prevent globbing and word splitting.

This is technically correct yet almost useless in practice. It's not an actual problem; I can't think of a situation where the lack of double quotes will cause an issue here, given that we are just running echo (and it has an initial argument). Perhaps it would be good style to write '"$1"' instead, but then it might be good style to litter your Go code with markers that you're explicitly ignoring errors (and thus make errcheck shut up). Neither are going to happen in this reality.

Linting shell scripts is a hard problem and I applaud Shellcheck for trying. If I was writing very high assurance shell scripts that absolutely had to work, even in a hostile environment, I might use Shellcheck and make sure all my scripts passed all its checks. But for finding real issues in my casual-usage shell scripts, it has too much noise and too little signal in its current state.

(Also, according to a test with the version available online through shellcheck.net, it doesn't currently spot my variable reuse bug.)

Another issue or hazard with shellcheck is illustrated by the following warning it spat out when I tried it on a random administration script we have:

In errmail line 22:
  trap "rm -rf $TMPDIR; exit" 0 1 2 13 15
       ^-- SC2064: Use single quotes, otherwise this expands now rather than when signalled.

Actually, having this expand immediately is deliberate. But shellcheck can't tell that because it requires contextual knowledge, and shellcheck has decided to tilt strongly towards having opinions on your style instead of just reporting on things that are almost certainly errors or problems. A linter having style opinions is a completely fair thing and can be valuable, but it basically always raises the linter's noise level (and only really works if you agree with the linter's opinions).

(This also shows the limitations of Shellcheck's abilities, because $TMPDIR never changes in the rest of the script so the actual behavior of the trap would be the same either way. But Shellcheck either can't see that or doesn't care, so it emits this warning for an issue that doesn't actually exist in the real script.)
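To spell out the difference Shellcheck is worried about, here's a sketch of the two forms (the TMPDIR value is made up):

TMPDIR=/tmp/errmail.$$

# Double quotes: $TMPDIR expands right now, so the trap removes this
# particular directory even if TMPDIR is reassigned later.
trap "rm -rf $TMPDIR; exit" 0 1 2 13 15

# Single quotes: $TMPDIR expands when the trap fires, removing whatever
# TMPDIR holds at that moment.
#trap 'rm -rf $TMPDIR; exit' 0 1 2 13 15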

PS: To be absolutely clear, I don't think this makes Shellcheck a bad idea or a bad implementation. It's just not a tool that I'm likely to find very useful in practice, and thus not one I'm likely to use (despite initial hopes for it, because of course I'd love a good way to find lurking errors in my Bourne shell scripts).

Sidebar: The signal to noise ratio in practice

Some people will say that a linter that produces some signal makes it worth wading through whatever noise in order to get said signal. This might be correct in theory, for extremely high-assurance programming, but it is wrong in practice. Besides other issues, it's generally very hard for humans to find that small amount of signal in a lot of noise.

For less critical programming, it is unquestionably a bad ratio of work to reward. There are almost always going to be more productive things you can do with X amount of time than to pick through lots of linter noise in search of some signal that may or may not be there for your current code.

ShellcheckNoiseVsSignal written at 02:04:04

2017-07-05

How I shot my foot because the Bourne shell is different

I have an invariable, reflexive habit in the Bourne shell, which is that I call my for loop variables $i. This reflex is so ingrained that if I try to fight it, I can wind up writing loops that look like this:

for avar in ....; do
    somecommand $i
done

I may have carefully written the for loop using a sensible, non-$i variable name, but then when I was writing the body of the loop I forgot and reflexively used $i. This is always a fun, forehead-slapping issue to debug (even if it often doesn't take long, especially in tiny for loops).

(Much of this isn't unique to the Bourne shell; like any number of people, I normally use i as my default loop variable name no matter what language I'm working in.)

Recently I wrote a script where the top level looked something like this:

... various functions ...
reporton() { ... }

for i in $@; do
    case "$i" in
       magic) reporton "magic $i" $(magic-thing $i) ;;
       ...) ... ;;
       /cs/htuser/*) reporton "$i" $(webdir-fs $i) ;;
       ...) ... ;;
       *) # assume username
          reporton "<$i>" $(user-fs $i)
          reporton "<$i> other" $(webdir-fs /cs/htuser/$i) ;;
    esac
 done

Most of the various arguments I could give the script worked fine. In the username case, the first reporton worked properly but then the second one failed mysteriously, printing out weird messages. To make it more puzzling, the same reporton worked fine when run independently (in the /cs/htuser/* case).

It took a bit of time before the penny dropped. Several of my shell functions had their own for loops, and of course I had reflexively written them using $i as the loop variable. Since I was writing this in more or less pure Bourne shell, I wasn't using 'local i' in any of these functions and so everyone's for loops were changing the same global $i variable.

For most of the functions, this worked out; they didn't call other $i-changing functions inside their own for loops, so the value of $i was stable in their for loop bodies. But at the top level this wasn't the case; I was obviously calling the whole stack of functions and was having $i's value changed out from underneath me. Most of the time this didn't matter because I only used $i once (before its value got changed by the functions I called). The 'assume username' case was the exception; as you can see, I ran two reporton's in succession and used $i with each. When I got to the second reporton, $i's value was completely mangled and things blew up.

Most languages that I deal with are lexically scoped languages, where reusing the same name for variables in different scopes just works (in that each version of the variable name is completely independent). Lexical scoping is so pervasive in my programming languages that I think of it as the normal, default case. The Bourne shell is one of the few exceptions; it's dynamically scoped, and so all of the $i's are the same variable and my various usages of $i were stepping on each other. Since it's the rare exception and I don't do complicated Bourne shell programming very often, I completely forgot this difference when I wrote my script. Hopefully I'll now remember it for the next time I write something in the Bourne shell that's sufficiently complicated that it uses functions and multiple for loops.
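Here's a stripped-down illustration of the problem (in shells that support it, adding 'local i' to the function makes this behave the way lexical scoping intuition expects):

inner() {
    for i in x y z; do
        : # imagine real work with $i here
    done
}

for i in 1 2 3; do
    inner
    echo "outer i is now: $i"   # prints 'z' every time, not 1, 2, 3
done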

(Awk is another language that I deal with that normally has dynamic scope, but I've only ever written a few pieces of awk that were complicated enough to use functions (such as this one).)

BourneShellGlobalVariableOops written at 01:25:37

2017-06-16

Go interfaces and automatically generated functions

I recently read Golang Internals Part 1: Autogenerated functions (and how to get rid of them) (via, and also), which recounts how Minio noticed an autogenerated function in their stack traces that was making an on-stack copy of a structure before calling a method, and worked out how to eliminate this function. Unfortunately, Minio's diagnosis of why this autogenerated function exists at all is not correct (although their solution is the right one). This matters partly because the reason why this autogenerated function exists exposes a real issue you may want to think about in your Go API design.

Let's start at the basics, which in this case is Go methods. Methods have a receiver, and this receiver can either be a value or a pointer. Your choice here of whether your methods have value receivers or pointer receivers matters for your API; see, for example, this article (via). Types also have a method set, which is simply all of the methods that they have. However, there is a special rule for the method sets of pointer types:

The method set of the corresponding pointer type *T is the set of all methods declared with receiver *T or T (that is, it also contains the method set of T).

(The corollary of this is that every regular type T implicitly creates a pointer type *T with all of T's methods, even if you never mention *T in your code or explicitly define any methods for it.)

It's easy to see how this works. Given a *T, you can always call a method on T by simply dereferencing your *T to get a T value, which means that it's trivial to write out a bunch of *T methods that just call the corresponding T methods:

func (p *T) Something(...) (...) {
  v := *p
  return v.Something(...)
}

Rather than require you to go through the effort of hand-writing all of these methods for all of your *T types, Go auto-generates them for you as necessary. This is exactly the autogenerated function that Minio saw in their stack traces; the underlying real method was cmd.retryStorage.ListDir() (which has a value receiver) and the autogenerated function was cmd.(*retryStorage).ListDir() (which has a pointer receiver, and which did the same dereferencing as our Something example).

But, you might ask, where does the *retryStorage pointer come from? The answer is that it comes from using interface types and values instead of concrete types and values. Here are the relevant bits of the cleanupDir() function that was one step up Minio's stack trace:

func cleanupDir(storage StorageAPI, volume, dirPath string) error {
  [...]
     entries, err := storage.ListDir(volume, entryPath)
  [...]
}

We're making a ListDir() method call on storage, which is of type StorageAPI. This is an interface type, and therefore storage is an interface value. As Russ Cox has covered in his famous article Go Data Structures: Interfaces, interface values are effectively two-pointer structures:

Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data.

When we create a StorageAPI interface value from an underlying retryStorage object, the interface value contains a pointer to the object, not the object itself. When we call a function that takes such an interface value as one of its arguments, we wind up passing it a *retryStorage pointer (among other things). As a result, when we call cleanupDir(), we're effectively creating a situation in the code like this:

type magicI struct {
  tab *_typeDef
  ptr *retryStorage
}

func cleanupDir(storage magicI, ...) error {
  [...]
    // we're trying to call (*retryStorage).ListDir()
    // since what we have is a pointer, not a value.
    entries, err := storage.ptr.ListDir(...)
  [...]
}

Since there is no explicit pointer receiver method (*retryStorage).ListDir() but there is a value receiver method retryStorage.ListDir(), Go calls the autogenerated (*retryStorage).ListDir() method for us (well, for Minio).

This points out an important general rule: calling value receiver methods through interfaces always creates extra copies of your values. Interface values are fundamentally pointers, while your value receiver methods require values; ergo every call requires Go to create a new copy of the value, call your method with it, and then throw the value away. There is no way to avoid this as long as you use value receiver methods and call them through interface values; it's a fundamental requirement of Go.

The conclusion for API design is clear but not necessarily elegant. If your type's methods will always or almost always be called through interface values, you might want to consider using pointer receiver methods instead of value receiver methods even if it's a bit unnatural. Using pointer receiver methods avoids both making a new copy of the value and doing an additional call through the autogenerated conversion method; you go straight to your actual method with no overhead. For obvious reasons, the larger your values are (in terms of the storage they require), the more this matters, because Go has to copy more and more bytes around to create that throwaway value for the method call.

(Of course if you have large types you probably don't want value receiver methods in the first place, regardless of whether or not they wind up being called through interface values. Value receiver methods are best for values that only take up modest amounts of storage, or at least that can be copied around that way.)
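To make this concrete, here's a made-up example (nothing to do with Minio's code): with the value receiver below, every s.Len() call through the interface copies the whole 4 KB Blob and goes through the autogenerated (*Blob).Len(); switching to the pointer receiver version avoids both the copy and the wrapper.

type Blob struct {
  data [4096]byte // deliberately large
  used int
}

// Value receiver: called through an interface, this copies the whole Blob.
func (b Blob) Len() int { return b.used }

// Pointer receiver alternative: no copy, no autogenerated wrapper.
// func (b *Blob) Len() int { return b.used }

type Sizer interface {
  Len() int
}

func main() {
  var s Sizer = &Blob{used: 3} // the interface value holds a *Blob
  fmt.Println(s.Len())
}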

Sidebar: How Go passes arguments to functions at the assembly level

In some languages and runtime environments, if you call a function that takes a sufficiently large value as an argument (for example, a large structure), the argument is secretly passed by providing the called function with a pointer to data stored elsewhere instead of writing however many bytes into the stack. Large return values may similarly be returned indirectly (often into a caller-prepared area). At least today, Go is not such a language. All arguments are passed completely on the stack, even if they are large.

This means that Go must always dereference *T pointers into on-stack copies of the value in order to call value receiver T methods. Those T methods fundamentally require their arguments to be on the stack, nowhere else, and this includes the receiver itself (which is passed as a more or less hidden first argument, and things get complicated here).

GoInterfacesAutogenFuncs written at 23:48:43

2017-06-06

A humbling experience of misreading some simple (Go) code

Every so often, I get to have a humbling experience, sometimes in public and sometimes in private. Recently I was reading Go Range Loop Internals (via) and hit its link to this Damian Gryski (@dgryski) tweet:

Today's #golang gotcha: the two-value range over an array does a copy. Avoid by ranging over the pointer instead.

play.golang.org/p/4b181zkB1O

I ran the code on the playground, followed it along, and hit a 'what?' moment where I felt I had a mystery where I didn't understand why Go was doing something. Here is the code:

func IndexValueArrayPtr() {
  a := [...]int{1, 2, 3, 4, 5, 6, 7, 8}

  for i, v := range &a {
    a[3] = 100
    if i == 3 {
      fmt.Println("IndexValueArrayPtr", i, v)
    }
  }
}

Usefully, I have notes about my confusion, and I will put them here verbatim:

why is the IndexValueArrayPtr result '3 100'? v should be copied before a[3] is modified, and v is type 'int', not a pointer.

This is a case of me reading the code that I thought was there instead of the code that was actually present, because I thought the code was there to make a somewhat different point. What I had overlooked in IndexValueArrayPtr (and in fact in all three functions) is that a[3] is set on every pass through the loop, not just when i == 3.

Misreading the code this way makes no difference to the other two examples (you can see this yourself with this variant), but it's crucial to how IndexValueArrayPtr behaves. If the a[3] assignment was inside the if, my notes would be completely true; v would have copied the old value of a[3] before the assignment and this would print '3 4'. But since the assignment happens on every pass of the loop, a[3] has already been assigned to be 100 by the time the loop gets to the fourth element and makes v a copy of it.
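For reference, here's the variant I thought I was reading, with the assignment moved inside the if; this one prints 'IndexValueArrayPtr 3 4', because v is copied from a[3] before the assignment happens:

func IndexValueArrayPtrVariant() {
  a := [...]int{1, 2, 3, 4, 5, 6, 7, 8}

  for i, v := range &a {
    if i == 3 {
      a[3] = 100
      fmt.Println("IndexValueArrayPtr", i, v)
    }
  }
}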

(I think I misread the code this way partly because setting a[3] only once is more efficient and minimal, and as noted the other two functions still illustrate their particular issues when you do it that way.)

Reading an imaginary, 'idealized' version of the code instead of the real one is not a new thing and it's not unique to me, of course. When you do it on real code in a situation where you're trying to find a bug, it can lead to a completely frustrating time where you literally can't see what's in front of your eyes and then when you can you wonder how you could possibly have missed it for so long.

(I suspect that this is a situation where rubber duck debugging helps. When you have to actually say things out loud, you hopefully get another chance to have your brain notice that what you want to say doesn't actually correspond to reality.)

PS: The reason I have notes on my confusion is that I was planning to turn explaining this 'mystery' into a blog entry. Then, well, I worked out the mystery, so now I've gotten to write a somewhat different sort of blog entry on it.

GoMisreadingSomeCode written at 23:59:29

2017-05-31

Why one git fetch default configuration bit is probably okay

I've recently been reading the git fetch manpage reasonably carefully as part of trying to understand what I'm doing with limited fetches. If you do this, you'll run across an interesting piece of information about the <refspec> argument, including in its form as the fetch = setting for remotes. The basic syntax is '<src>:<dst>', and the standard version that is created by any git clone gives you:

fetch = +refs/heads/*:refs/remotes/origin/*

You might wonder about that + at the start, and I certainly did. Well, it's special magic. To quote the documentation:

The remote ref that matches <src> is fetched, and if <dst> is not empty string, the local ref that matches it is fast-forwarded using <src>. If the optional plus + is used, the local ref is updated even if it does not result in a fast-forward update.

(Emphasis mine.)

When I read this my eyebrows went up, because it sounded dangerous. There are certainly lots of complicated processes around 'git pull' if it detects that it can't fast-forward what it's just fetched, so allowing non-fast-forward fetches (and by default, at that) certainly sounded like something I might want to turn off. So I tried to think carefully about what's going on here, and as a result I now believe that this configuration is mostly harmless and probably what you want.

The big thing is that this is not about what happens with your local branch, eg master or rel-1.8. This is about your repo's copy of the remote branch, for example origin/master or origin/rel-1.8. And it is not even about the branch, because branches are really 'refs', symbolic references to specific commits. git fetch maintains refs (here under refs/remotes/origin) for every branch that you're copying from the remote, and one of the things that it does when you fetch updates is update these refs. This lets the rest of Git use them and do things like merge or fast-forward remote updates into your local remote-tracking branch.

So git fetch's documentation is talking about what it does to these remote-branch refs if the branch on the remote has been rebased or rewound so that it is no longer a more recent version of what you have from your last update of the remote. With the + included in the <refspec>, git fetch always updates your repo's ref for the remote branch to match whatever the remote has; basically it overwrites whatever ref you used to have with the new ref from the remote. After a fetch, your origin/master or origin/rel-1.8 will always be the same as the remote's, even if the remote rebased, rewound, or did other weird things. You can then go on to fix up your local branch in a variety of ways.

(To be technical your origin/master will be the same as origin's master, but you get the idea here.)
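If you want to look at these remote-branch refs yourself, they're ordinary refs that plain Git commands will show you; for example:

# list your repo's copies of origin's branches
git for-each-ref refs/remotes/origin

# show which commit your copy of a remote branch points at
git rev-parse origin/master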

This makes the + a reasonable default, because it means that 'git fetch' will reliably mirror even a remote that is rebasing and otherwise having its history rewritten and its branches changed around. Without the +, 'git fetch' might transfer the new and revised commits and trees from your remote but it wouldn't give you any convenient reference for them for you to look at them, integrate them, or just reset your local remote-tracking branch to their new state.

(Without the '+', 'git fetch' won't update your repo's remote-branch refs. I don't know if it writes the new ref information anywhere, perhaps to .git/FETCH_HEAD, or if it just throws it away, possibly after printing out commit hashes.)

Sidebar: When I can imagine not using a '+'

The one thing that using a '+' does is that it sort of allows a remote to effectively delete past history out of your local repo, something that's not normally possible in a DVCS and potentially not desirable. It doesn't do this directly, but it starts an indirect process of it and it certainly makes the old history somewhat annoying to get at.

Git doesn't let a remote directly delete commits, trees, and objects. But unreferenced items in your repo are slowly garbage-collected after a while and when you update your remote-branch refs after a non-ff fetch, the old commits that the pre-fetch refs pointed to start becoming more and more unreachable. I believe they live on in the reflog for a while, but you have to know that they're missing and to look.

If you want to be absolutely sure that you notice any funny business going on in an upstream remote that is not supposed to modify its public history this way, not using '+' will probably help. I'm not sure if it's the easiest way to do this, though, because I don't know what 'git fetch' does when it detects a non-ff fetch like this.

(Hopefully git fetch complains loudly instead of failing silently.)

GitFetchMagicPlus written at 00:44:13

2017-05-29

Configuring Git worktrees to limit what's fetched on pulls

Yesterday I wrote about my practical problem with git worktrees, which is to limit what is fetched from the remote when I do 'git pull' in one (as opposed to the main repo). I also included a sidebar with a theory on how to do this with some Git configuration madness. In a spirit of crazed experimentation I've now put this theory into practice and it appears to actually work. Unfortunately the way I know how to do this requires some hand editing of your .git/config, rather than using commands like 'git remote' to do this for you. However, I don't fully understand what I'm doing here (and that's one reason I'm putting in lots of notes to myself).

Here's my process:

  1. Create a new worktree as normal, based from the origin branch you want:

    git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8
    

    Because we used -b, this will also create a local remote-tracking branch, release-branch.go1.8, that tracks origin's release-branch.go1.8 branch.

    If you already have a release-branch.go1.8 branch (perhaps you've checked it out in your main repo at some point or previously created a worktree for it), this is just:

    git worktree add ../v1.8 release-branch.go1.8
    

  2. Create a new remote for your upstream repo to fetch just this upstream branch:

    git remote add -t release-branch.go1.8 origin-v1.8 https://go.googlesource.com/go
    

    Because we set it up to track only a specific remote branch, 'git fetch' for this remote will only fetch updates for the remote's release-branch.go1.8 branch, even though it has the same URL as our regular origin remote (which will normally fetch all branches).

  3. Edit .git/config to change the fetch = line for origin-v1.8 to fetch the branch into refs/remotes/origin/release-branch.go1.8, which is the fetch destination for your origin remote. That is:

    fetch = +refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8
    

    By fetching into refs/remotes/origin like this, my understanding is that we avoid doing duplicate fetches. Whether we do 'git fetch' in our worktree or in the master repo, we'll be updating the same remote branch reference and so we'll only fetch updates for this (remote) branch once. I believe that if you don't do this, 'git pull' or 'git fetch' in the worktree will always report the new updates; you'll never 'lose' an update for the branch by doing a 'git pull' in the master. However, I think you may wind up doing extra transfers.

    (This can be done with git config but I'd rather edit .git/config by hand.)

  4. Edit .git/config again to change the 'remote =' line for your release-branch.go1.8 branch to be origin-v1.8 instead of origin.

    By forcing the remote for the branch, we activate git fetch's restriction on what remote branches will be fetched when we do a 'git pull' or 'git fetch' in a tree with that branch checked out (here, our worktree, but it could be the master repo).

    If you prefer, you can set this with 'git config' instead of by hand editing:

    git config branch.release-branch.go1.8.remote origin-v1.8
    

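Putting steps 2 through 4 together, the relevant bits of .git/config wind up looking something like this (a sketch using the names from above; your existing origin remote stays untouched):

[remote "origin-v1.8"]
    url = https://go.googlesource.com/go
    fetch = +refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8

[branch "release-branch.go1.8"]
    remote = origin-v1.8
    merge = refs/heads/release-branch.go1.8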
We can see that this works by comparing 'git fetch -v --dry-run' in the worktree and in the master repo. In the worktree, it will report just an attempt to update origin/release-branch.go1.8. In the master repo, it will (normally) report an attempt to update everything.

Because everything is attached to our branch configuration for the (local) release-branch.go1.8 branch, not to the worktree, this will survive removing and then recreating the worktree. This may be a feature, or it may be a drawback, since it means that if you delete the worktree and check out release-branch.go1.8 in the master repo, 'git pull' will only update it (and not update master and other branches as well). We can change back to the normal state of things by updating the remote for the branch back to the normal origin remote:

git config branch.release-branch.go1.8.remote origin

(In general you can flip the state of the branch back and forth as you want. I don't think Git gets confused, although you may.)

GitWorktreeLimitedPulling written at 22:42:43

2017-05-28

My thoughts on git worktrees for me (and some notes on things I tried)

I recently discovered git worktrees and did some experimentation with using them for stuff that I do. The short summary of my experience so far is that while I can see the appeal for certain sorts of usage cases, I don't think git worktrees are a good fit for my situation, and I'm probably going to use completely independent repositories in the future.

My usage case was building my own copies of multiple versions of some project, starting with Go. Especially in the case of a language compiler and its standard library, it's reasonably useful to have the latest development version plus a stable version or two; for example, it gives me an easy way to test if something I'm working on will build on older released versions or if I've let a dependency on some recent bit of the standard library creep in. The initial process of creating a worktree for, say, Go 1.8 is reasonably straightforward:

cd /some/where/go
git worktree add -b release-branch.go1.8 ../v1.8 origin/release-branch.go1.8

What proved tricky for me is updating this v1.8 tree when the Go people update Go 1.8, as they do periodically. My normal way of staying up to date on what changes are happening in the main line of Go is to do 'git pull' in my master repo directory, note the lines that get printed out about fetched updates, eg:

remote: Finding sources: 100% (64/64)
remote: Total 64 (delta 23), reused 64 (delta 23)
Unpacking objects: 100% (64/64), done.
From https://go.googlesource.com/go
   ffab6ab877..d64c49098c  master     -> origin/master

And then I use 'git log ffab6ab877..d64c49098c' to see what changed. The problem with worktrees is that this information is printed by 'git fetch', and normally 'git fetch' updates all branches, both the mainline and, say, a release branch you're following. So I actively don't want to run 'git pull' or 'git fetch' in the worktree directory, because otherwise I will have to remember to stop and look at the mainline updates it's just fetched and reported to me.

What I wound up doing was running 'git pull' in my main go tree and if there was an update to origin/release-branch.go1.8 reported, I'd go to my 'v1.8' directory and do 'git merge --ff-only'. This mostly worked (it blew up on me once for reasons I don't understand), but it means that dealing with a worktree is different than dealing with a normal Git repo directory (including an independently cloned repo). Since 'git pull' and other Git commands work 'normally' in a worktree, I have to explicitly remember that I created something as a worktree (or check to see if .git is a directory to know, since 'git status' doesn't helpfully tell you one way or the other).
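In concrete terms, my update dance looked roughly like this (a sketch; the commit range is just the example output from above):

cd /some/where/go                # the main repo, on master
git pull                         # note any reported origin/release-branch.go1.8 update
git log ffab6ab877..d64c49098c   # read what changed on master

cd ../v1.8                       # the Go 1.8 worktree
git merge --ff-only origin/release-branch.go1.8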

(In my current moderate level of Git knowledge and experience, I'm going to avoid writing about the good usage cases I think I see for worktrees. Anyway, one of them is documented in the git-worktree manpage; I note that their scenario uses a worktree for a one-shot branch that's never updated from upstream.)

As mentioned, if I want to see if a particular Git repo is a worktree or not I need to do 'ls -ld .git'. If it's a file, I have a worktree. If I have a directory, with how I currently use Git, it's a full repo. 'git worktree list' will list the main repo and worktrees, but it doesn't annotate things with a 'you are here' marker. Obviously if I used worktrees enough I could write a status command to tell me, but then if I was doing that I could probably write a bunch of commands to do what I want in general.

Sidebar: Excessively clever Git configuration hacking (maybe)

Bearing in mind that I may not understand Git as well as I think I do: as far as I can see, which branches 'git fetch' fetches is determined by the configuration of the remote associated with a branch, not by the branch's own configuration. There appear to be two options for fiddling with things here.

The 'obvious' option is to create a second remote (call it, say, 'v1.8-origin') with the same url as origin but a fetch setting that only fetches the particular branch:

fetch = refs/heads/release-branch.go1.8:refs/remotes/origin/release-branch.go1.8

Then I'd switch the remote for the release-branch.go1.8 branch to this new remote.

Git-fetch also has a feature where you can have a per-branch configuration in $GIT_DIR/branches/<branch>; this can be used to name the upstream 'head' (branch) that will be fetched into the local branch. It appears that creating such a file should do the trick, but I can't find people writing about this on the Internet (just many copies of the git-fetch manpage), so I'm wary of assuming that I understand what's going to happen here. Plus, it's apparently a deprecated legacy approach.

(If I understand all of this correctly, either approach would preserve 'git pull' in the main repo (which is on the master branch) always fetching all branches from upstream.)

GitWorktreeThoughts written at 23:08:19
