2015-07-29
My workflow for testing Github pull requests
Every so often a Github-based project I'm following has a pending pull
request that might solve a bug or otherwise deal with something I care
about, and it needs some testing by people like me. The simple case is
when I am not carrying any local changes; it is adequately covered by
part of Github's "Checking out pull requests locally" documentation
(skip to the bit where they talk about 'git fetch'). A more elaborate
version is:
  git fetch origin pull/<ID>/head:origin/pr/<ID>
  git checkout pr/<ID>
That creates a proper remote branch and then a local branch that tracks it, so I can add any local changes to the PR that I turn out to need and then keep track of them relative to the upstream pull request. If the upstream PR is rebased, well, I assume I get to delete my remote branch and then re-fetch it and probably do other magic. I'll cross that bridge when I reach it.
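(A minimal sketch of that bridge-crossing, assuming the PR branch was force-pushed and that I have no local commits on pr/<ID> worth keeping; any that I do have would need to be cherry-picked back on afterwards:)

  git checkout master          # get off the pr/<ID> branch first
  git branch -D pr/<ID>        # discard my stale local branch
  # force-update the remote branch to the rewritten PR head
  git fetch --force origin pull/<ID>/head:origin/pr/<ID>
  git checkout pr/<ID>         # recreate the tracking branch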
The not so simple case is when I am carrying local changes on top of the upstream master. In the fully elaborate case I actually have two repos, the first being a pure upstream tracker and the second being a 'build' repo that pulls from the first repo and carries my local changes. I need to apply some of my local changes on top of the pull request while skipping others (in this case, because some of them are workarounds for the problem the pull request is supposed to solve), and I want to do all of this work on a branch so that I can cleanly revert back to 'all of my changes on top of the real upstream master'.
The workflow I've cobbled together for this is:
- Add the Github master repo if I haven't already done so:

    git remote add github https://github.com/zfsonlinux/zfs.git

- Edit .git/config to add a new 'fetch =' line so that we can also
  fetch pull requests from the github remote, where they will get
  mapped to the remote branches github/pr/NNN. This will look like:

    [remote "github"]
        fetch = +refs/pull/*/head:refs/remotes/github/pr/*
        [...]

  (This comes from here.)

- Pull down all of the pull requests with 'git fetch github'.

  I think an alternative to configuring and fetching all pull
  requests is the limited version I did in the simple case (changing
  origin to github in both occurrences), but I haven't tested this;
  see the sketch after this list. At the point that I have to do this
  complicated dance I'm in a 'swatting things with a hammer' mode, so
  pulling down all PRs seems perfectly fine. I may regret this later.

- Create a branch from master that will be where I build and test the
  pull request (plus my local changes):

    git checkout -b pr-NNN

  It's vitally important that this branch start from master and thus
  already contain my local changes.

- Do an interactive rebase relative to the upstream pull request:

    git rebase -i github/pr/NNN

  This incorporates the pull request's changes 'below' my local
  changes to master, and with -i I can drop conflicting or unneeded
  local changes. Effectively it is much like what happens when you do
  a regular 'git pull --rebase' on master; the changes in
  github/pr/NNN are being treated as upstream changes and we're
  rebasing my local changes on top of them.

- Set the upstream of the pr-NNN branch to the actual Github pull
  request branch:

    git branch -u github/pr/NNN

  This makes 'git status' report things like 'Your branch is ahead of
  ... by X commits', where X is the number of local commits I've
  added.
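The untested limited version of the fetch mentioned in the list above would presumably look like this, with both occurrences of 'origin' from the simple case changed to 'github' (the later steps only need the github/pr/NNN remote branch to exist, so there is no checkout here):

  git fetch github pull/NNN/head:github/pr/NNN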
If the pull request is refreshed, my current guess is that I will
have to fully discard my local pr-NNN branch and restart from
fetching the new PR and branching off master. I'll undoubtedly
find out at some point.
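(My current guess at that restart, relying on the '+' in the fetch refspec to force-update the rewritten github/pr/NNN:)

  git checkout master
  git branch -D pr-NNN           # throw away the old combined branch
  git fetch github               # picks up the refreshed PR
  git checkout -b pr-NNN         # re-branch from master...
  git rebase -i github/pr/NNN    # ...and redo the rebase
  git branch -u github/pr/NNN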
Initially I thought I should be able to use a sufficiently clever
invocation of 'git rebase' to copy some of my local commits from
master on to a new branch that was based on the Github pull
request. With work I could get the rebasing to work right; however,
it always wound up with me on (and changing) the master branch,
which is not what I wanted. Based on this very helpful page on
what 'git rebase' is really doing, what
I want is apparently impossible without explicitly making a new
branch first (and that new branch must already include my local
changes so they're what gets rebased, which is why we have to branch
from master).
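(For the record, the closest 'git rebase --onto' invocation I know of still needs the branch made explicitly first, which ends up being equivalent to my workflow above; here origin/master is the real upstream:)

  git branch pr-NNN master       # the new branch has to exist first
  # replay origin/master..pr-NNN (ie, my local commits) on top of
  # the pull request, moving pr-NNN and leaving master alone
  git rebase --onto github/pr/NNN origin/master pr-NNN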
This is probably not the optimal way to do this, but having hacked my way through today's git adventure game I'm going to stop now. Feel free to tell me how to improve this in comments.
(This is the kind of thing I write down partly to understand it and partly because I would hate to have to derive it again, and I'm sure I'll need it in the future.)
Sidebar: Why I use two repos in the elaborate case
In the complex case I want to both monitor changes in the Github
master repo and have strong control over what I incorporate into
my builds. My approach is to routinely do 'git pull' in the pure
tracking repo and read 'git log' for new changes. When it's time
to actually build, I 'git pull' (with rebasing) from the tracking repo into the build
repo and then proceed. Since I'm pulling from the tracking repo,
not the upstream, I know exactly what changes I'm going to get in
my build repo and I'll never be surprised by a just-added upstream
change.
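(In concrete terms the routine is something like this; the paths are hypothetical stand-ins for my real ones:)

  cd ~/src/zfs-tracking
  git pull                   # update the pure tracking repo
  git log ORIG_HEAD..        # read what upstream just changed
  cd ~/src/zfs-build
  git pull --rebase          # origin here is the tracking repo, not Github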
In theory I'm sure I could do this in a single repo with various tricks, but doing it in two repos is much easier for me to keep straight and reliable.
2015-07-07
The Git 'commit local changes and rebase' experience is a winning one
I mentioned recently that I'd been persuaded to change my ways from leaving local changes uncommitted in my working repos to committing them and rebasing on pulls. When I started this, I didn't expect it to be any real change from the experience of pulling with uncommitted changes and maybe stashing them every so often and so on; I'd just be doing things the proper and 'right' Git way (as everyone told me) instead of the sloppy way.
I was wrong. Oh, certainly the usual experience is the same; I do a
'git pull', I get my normal pull messages and stats output, and
Git adds a couple of lines at the end about automatically rebasing
things. But with local commits and rebasing, dealing with conflicts
after a pull is much better. This isn't because I have fewer or
simpler changes to merge, it's simply because the actual user interface
and process is significantly nicer. There's very little fuss and muss;
I fire up my editor on a file or two, I look for the '<<<<' markers, I
sort things out, I can get relatively readable diffs, and then I can
move on smoothly.
(And the messages from git during rebasing are actually quite helpful.)
Re-applying git stashes that had conflicts with the newly pulled
code was not as easy or as smooth, at least for the cases that I
dealt with. My memory is that it was harder to see my changes and
harder to integrate them, and also sometimes I had to un-add things
from the index that git stash had apparently automatically added
for me. I felt far less in control of the whole process than I do
now with rebasing.
(And with rebasing, the git reflog means that if I need to I can revert my repo to the pre-pull state and see exactly how things were organized in the old code and what the code did with my changes integrated. Sometimes this is vital if there's been a significant restructuring of upstream code. In the past with git stash, I've been lucky because I had an intact pre-pull copy of the repo (with my changes) on a second machine.)
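(A minimal sketch of that revert, assuming nothing has moved ORIG_HEAD since the pull; git rebase saves the pre-rebase HEAD there:)

  git reflog                 # find the pre-pull state if ORIG_HEAD is stale
  git checkout ORIG_HEAD     # inspect the old state on a detached HEAD
  git checkout -             # come back to the rebased branch
  # or, to fully revert the repo to its pre-pull state:
  git reset --hard ORIG_HEAD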
I went into this expecting to be neutral on the change to 'commit and rebase on pulls'. I've now wound up quite positive on it; I actively like and prefer to be fixing up a rebase to fixing up a git stash. Rebasing really is better, even if I just have a single small and isolated change.
(And thank you to the people who patiently pushed me towards this.)
2015-07-03
Some notes on my 'commit local changes and rebase' Git workflow
A month or so ago I wrote about how I don't commit changes in my working repos and in reaction to it several people argued that I ought to change my ways. Well, never let it be said that I can't eventually be persuaded to change my ways, so since then I've been cautiously moving to committing my changes and rebasing on pulls in a couple of Git repos. I think I like it, so I'm probably going to make it my standard way of working with Git in the future.
The Git configuration settings I'm using are:
  git config pull.rebase true
  git config rebase.stat true
The first just makes 'git pull' be 'git pull --rebase'. If I
wind up working with multiple branches in repos, I may need to set
this on a per-branch basis or something; so far I just track
origin/master so it works for me. The second preserves the normal
'git pull' behavior of showing a summary of updates, which I find
useful for keeping an eye on things.
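(The per-branch version would be something like this, for a hypothetical 'work' branch:)

  git config branch.work.rebase true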
One drawback of doing things this way is that 'git pull' will now
abort if there are also uncommitted changes in the repo, such as I
might have for a very temporary hack or test. I need to remember
to either commit such changes or do 'git stash' before I pull.
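(The stash version is the obvious dance:)

  git stash          # park the uncommitted hack
  git pull           # fetch and rebase my committed changes as usual
  git stash pop      # put the hack back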
(The other lesson here is that I need to learn how to manipulate rebase commits so I can alter, amend, or drop some of them.)
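(The starting point is clearly 'git rebase -i'; a sketch, with made-up commit hashes and messages:)

  git rebase -i origin/master
  # this opens a todo list of my local commits, eg:
  #   pick 1a2b3c4 workaround for the frobnitz bug
  #   pick 5d6e7f8 temporary debugging hack
  # changing 'pick' to 'edit' stops to let me amend a commit, 'reword'
  # changes its message, 'squash' folds it into the previous commit,
  # and deleting a line drops that commit entirely.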
Since I've already done this once: if I have committed changes in
a repo without this set, and use 'git pull' instead of 'git pull
--rebase', one way to abort the resulting unwanted merge is 'git
reset --hard HEAD'. Some sources suggest 'git reset --merge' or
'git merge --abort' instead. But really I should turn pull rebasing
on the moment I commit my own changes to a repo.
(There are a few repos around here that now need this change.)
I haven't had to do a bisection on a
commit-and-rebase repo yet, but I suspect that bisection won't go
well if I actually need my changes in all versions of the repo that
I build and test. If I wind up in this situation I will probably
temporarily switch to uncommitted changes and use of 'git stash',
probably in a scratch clone of the upstream master repo.
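(If I do wind up there, each bisection step would go something like this sketch; the paths and the saved diff of my local changes are hypothetical:)

  git clone ~/src/zfs-tracking /tmp/zfs-bisect
  cd /tmp/zfs-bisect
  git bisect start <bad-commit> <good-commit>
  # then, for each version that bisect checks out:
  git apply /tmp/my-changes.diff    # put my changes back, uncommitted
  make && <run the test>
  git checkout -- .                 # discard the applied changes again
  git bisect good                   # or 'git bisect bad', and repeat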
(In general I like cloning repos to keep various bits of fiddling around in them completely separate. Sure, I probably could mix various activities in one repo without having things get messed up, but a different directory hierarchy that I delete afterwards is the ultimate isolation and it's generally cheap.)
2015-07-02
Some thoughts on Go compiler directives being in source comments
Recently, I've been reading some commotion about how Go compiler directives being in source code comments is, well, not the 'elegant design' that Go's creators may feel it is. As it happens I have some sympathies for Go here, so let's talk about what I see as the issues involved.
First, let's differentiate between what I'll arbitrarily call 'broad'
and 'narrow' compiler directives. In a nutshell, what I'm calling
a broad compiler directive is something that changes the meaning
of the source code such that every compiler implementation must
handle it. In C, #include and #define are broad directives.
Broad directives are effectively part of the language and as such
I feel that they deserve first class support as an explicit element
in language syntax.
(Broad directives don't have to use a new language syntax element.
Python's 'from __future__ import ...' is such a broad directive,
but it uses a standard language element.)
By contrast, narrow directives only apply to a specific compiler or tool. Since they're only for a specific program they should be namespaced, ie you need some way of saying 'this uninterpreted blob of text is only for <X>' so that other compilers can ignore it. This requires either a specific element of language syntax to say 'this following text is only for <X>' or hijacking a portion of some existing syntax where you can add arbitrary namespaced text. The easiest existing syntax to hijack is comments.
Since narrow directives do not change the language itself (at least in theory), it seems at least a bit odd to give them an explicit syntax element. In effect you're creating another escape hatch for language-meaningless text that sits alongside comments; one is sort of for people (although it may be interpreted by tools, for example for documentation) and one is a slightly structured one for tools.
(If a narrow directive changes the semantics of the code being compiled, it's actually changing the language the compiler is dealing with from 'language <X>' to 'something similar to <X> but not quite it'. Problems often ensue here in the long run.)
As far as I know, all of the existing Go compiler directives are
narrow directives. They're either used by specific non-compiler
tools or they're internal directives for one specific Go compiler
(admittedly the main 'go' compiler). As far as I'm concerned this
makes them pretty much fair game to be implemented without a specific
element of language syntax. Other people may disagree and feel that
even narrow directives should have some sort of specific language
syntax support.
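(As a concrete illustration, here are two narrow directives in a toy package; //go:generate is read only by the 'go generate' tool and //go:noinline only by the main gc compiler, and the 'go:' prefix namespaces both so that everything else can skip them as ordinary comments:)

  package pill

  // Only the 'go generate' tool acts on this line; compilers treat
  // it as an ordinary comment.
  //go:generate stringer -type=Pill

  type Pill int

  const (
      Placebo Pill = iota
      Aspirin
  )

  // Only the main gc compiler honors this directive; other Go
  // implementations simply ignore the comment.
  //go:noinline
  func Double(p Pill) Pill { return p * 2 }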
PS: There may well be standard terminology in the programming language community for what I'm calling broad versus narrow directives here.
(This elaborates on some tweets I made, because Twitter forces condensed and sometimes opaque writing.)
Sidebar: The problem with non-namespaced narrow directives
If you don't namespace your narrow directives you wind up with the
C #pragma problem, which is 'what do you do when you encounter a
#pragma that you don't recognize?'. If you do error out, you cause
problems for people who are using you to compile source code with
#pragmas for some other compiler. If you don't error out, you
cause problems for people who've accidentally misspelled one of
your #pragmas and are now having it be more or less silently
ignored.
(You can try to know about the #pragmas of all other compilers,
but in practice you're never going to know absolutely all of them.)