Wandering Thoughts archives

2015-07-29

My workflow for testing Github pull requests

Every so often a Github-based project I'm following has a pending pull request that might solve a bug or otherwise deal with something I care about, and it needs some testing by people like me. The simple case is when I am not carrying any local changes; it is adequately covered by part of Github's Checking out pull requests locally (skip to the bit where they talk about 'git fetch'). A more elaborate version is:

git fetch origin pull/<ID>/head:refs/remotes/origin/pr/<ID>
git checkout pr/<ID>

That creates a proper remote-tracking branch and then a local branch that tracks it, so I can add any local changes to the PR that I turn out to need and then keep track of them relative to the upstream pull request. If the upstream PR is rebased, well, I assume I get to delete my remote-tracking branch and then re-fetch it and probably do other magic. I'll cross that bridge when I reach it.
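To make the mechanics concrete, here is the same fetch run against a throwaway local repository standing in for GitHub; the PR number (7) and all paths are made up for illustration. GitHub publishes each pull request as a refs/pull/<ID>/head ref, which is what the source side of the refspec names:

```shell
# Simulate GitHub's refs/pull/<ID>/head refs with a throwaway bare repo.
# The PR number (7) and all paths here are hypothetical.
set -e
tmp=$(mktemp -d)

# A stand-in "GitHub" repository whose PR #7 ref points at a commit.
git init -q --bare "$tmp/upstream.git"
git -C "$tmp/upstream.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" -c user.name=t -c user.email=t@t \
    commit -q --allow-empty -m 'base'
git -C "$tmp/seed" push -q "$tmp/upstream.git" \
    master refs/heads/master:refs/pull/7/head

# The workflow itself: fetch the PR into a remote-tracking branch, then
# let 'git checkout' create a local pr/7 branch that tracks it.
git clone -q "$tmp/upstream.git" "$tmp/work"
cd "$tmp/work"
git fetch -q origin pull/7/head:refs/remotes/origin/pr/7
git checkout -q pr/7
```

(The fully qualified refs/remotes destination is what makes the fetched ref a remote-tracking branch that 'git checkout' will then base a local branch on.)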

The not so simple case is when I am carrying local changes on top of the upstream master. In the fully elaborate case I actually have two repos, the first being a pure upstream tracker and the second being a 'build' repo that pulls from the first repo and carries my local changes. I need to apply some of my local changes on top of the pull request while skipping others (in this case, because some of them are workarounds for the problem the pull request is supposed to solve), and I want to do all of this work on a branch so that I can cleanly revert back to 'all of my changes on top of the real upstream master'.

The workflow I've cobbled together for this is:

  • Add the Github master repo if I haven't already done so:
    git remote add github https://github.com/zfsonlinux/zfs.git

  • Edit .git/config to add a new 'fetch =' line so that we can also fetch pull requests from the github remote, where they will get mapped to the remote branches github/pr/NNN. This will look like:
    [remote "github"]
       fetch = +refs/pull/*/head:refs/remotes/github/pr/*
       [...]

    (This comes from here.)

  • Pull down all of the pull requests with 'git fetch github'.

    I think an alternative to configuring and fetching all pull requests is the limited version I did in the simple case (changing origin to github in both occurrences), but I haven't tested this. At the point where I have to do this complicated dance I'm in a 'swatting things with a hammer' mode, so pulling down all PRs seems perfectly fine. I may regret this later.

  • Create a branch from master that will be where I build and test the pull request (plus my local changes):
    git checkout -b pr-NNN

    It's vitally important that this branch start from master and thus already contain my local changes.

  • Do an interactive rebase relative to the upstream pull request:
    git rebase -i github/pr/NNN

    This incorporates the pull request's changes 'below' my local changes to master, and with -i I can drop conflicting or unneeded local changes. Effectively it is much like what happens when you do a regular 'git pull --rebase' on master; the changes in github/pr/NNN are being treated as upstream changes and we're rebasing my local changes on top of them.

  • Set the upstream of the pr-NNN branch to the actual Github pull request branch:
    git branch -u github/pr/NNN

    This makes 'git status' report things like 'Your branch is ahead of ... by X commits', where X is the number of local commits I've added.
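Put together, the steps above chain like this. This is a self-contained sketch using a throwaway local repo in place of the real GitHub remote; the PR number (999) and file names are hypothetical, and the rebase is run non-interactively here only so the sketch stands alone, where in real use I run it with -i and edit the todo list:

```shell
# The steps above, end to end, against a throwaway local stand-in for
# the real GitHub repo; PR number 999 and all file names are hypothetical.
set -e
tmp=$(mktemp -d)

# Stand-in upstream: a base commit on master, plus a PR ref with a fix.
git init -q --bare "$tmp/zfs.git"
git -C "$tmp/zfs.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" config user.name t
git -C "$tmp/seed" config user.email t@t
echo base > "$tmp/seed/README"
git -C "$tmp/seed" add README
git -C "$tmp/seed" commit -q -m 'base'
git -C "$tmp/seed" push -q "$tmp/zfs.git" master
echo fix > "$tmp/seed/fix.c"
git -C "$tmp/seed" add fix.c
git -C "$tmp/seed" commit -q -m 'pr: the fix'
git -C "$tmp/seed" push -q "$tmp/zfs.git" HEAD:refs/pull/999/head

# My build repo: upstream master plus one local change of my own.
git clone -q "$tmp/zfs.git" "$tmp/build"
cd "$tmp/build"
git config user.name t
git config user.email t@t
echo hack > local-workaround.txt
git add local-workaround.txt
git commit -q -m 'local change'

git remote add github "$tmp/zfs.git"   # in real life, the github.com URL
# same effect as adding the 'fetch =' line to .git/config by hand
git config --add remote.github.fetch '+refs/pull/*/head:refs/remotes/github/pr/*'
git fetch -q github                    # pull down the PRs as github/pr/NNN
git checkout -q -b pr-999              # branch from master, with my changes
# replay my local changes on top of the PR; in a real '-i' session I can
# drop local commits that the PR makes unnecessary
GIT_SEQUENCE_EDITOR=true git rebase -i github/pr/999
git branch -q -u github/pr/999         # so 'git status' compares to the PR
```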

If the pull request is refreshed, my current guess is that I will have to fully discard my local pr-NNN branch and restart from fetching the new PR and branching off master. I'll undoubtedly find out at some point.

Initially I thought I should be able to use a sufficiently clever invocation of 'git rebase' to copy some of my local commits from master on to a new branch that was based on the Github pull request. With work I could get the rebasing to work right; however, it always wound up with me on (and changing) the master branch, which is not what I wanted. Based on this very helpful page on what 'git rebase' is really doing, what I want is apparently impossible without explicitly making a new branch first (and that new branch must already include my local changes so they're what gets rebased, which is why we have to branch from master).

This is probably not the optimal way to do this, but having hacked my way through today's git adventure game I'm going to stop now. Feel free to tell me how to improve this in comments.

(This is the kind of thing I write down partly to understand it and partly because I would hate to have to derive it again, and I'm sure I'll need it in the future.)

Sidebar: Why I use two repos in the elaborate case

In the complex case I want to both monitor changes in the Github master repo and have strong control over what I incorporate into my builds. My approach is to routinely do 'git pull' in the pure tracking repo and read 'git log' for new changes. When it's time to actually build, I 'git pull' (with rebasing) from the tracking repo into the build repo and then proceed. Since I'm pulling from the tracking repo, not the upstream, I know exactly what changes I'm going to get in my build repo and I'll never be surprised by a just-added upstream change.

In theory I'm sure I could do this in a single repo with various tricks, but doing it in two repos is much easier for me to keep straight and reliable.
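A minimal sketch of the two-repo arrangement, with throwaway local paths standing in for the real upstream; the point it demonstrates is that the build repo only ever receives changes that the tracking repo has already pulled:

```shell
# The two-repo arrangement: 'tracking' purely mirrors upstream, 'build'
# pulls only from 'tracking'. All paths are throwaway stand-ins.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/upstream.git"
git -C "$tmp/upstream.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" config user.name t
git -C "$tmp/seed" config user.email t@t
git -C "$tmp/seed" commit -q --allow-empty -m 'base'
git -C "$tmp/seed" push -q "$tmp/upstream.git" master

git clone -q "$tmp/upstream.git" "$tmp/tracking"
git clone -q "$tmp/tracking" "$tmp/build"
git -C "$tmp/build" config pull.rebase true

# Upstream gains a change; it only reaches 'build' once 'tracking' has
# pulled it and I have had a chance to read 'git log' there.
git -C "$tmp/seed" commit -q --allow-empty -m 'upstream change'
git -C "$tmp/seed" push -q "$tmp/upstream.git" master
git -C "$tmp/build" pull -q       # no-op: tracking has not pulled yet
git -C "$tmp/tracking" pull -q    # review point: 'git log' in tracking
git -C "$tmp/build" pull -q       # now the reviewed change arrives
```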

GithubPRTestingWorkflow written at 23:08:58; Add Comment

2015-07-07

The Git 'commit local changes and rebase' experience is a winning one

I mentioned recently that I'd been persuaded to change my ways from leaving local changes uncommitted in my working repos to committing them and rebasing on pulls. When I started this, I didn't expect it to be any real change from the experience of pulling with uncommitted changes and maybe stashing them every so often and so on; I'd just be doing things the proper and 'right' Git way (as everyone told me) instead of the sloppy way.

I was wrong. Oh, certainly the usual experience is the same; I do a 'git pull', I get my normal pull messages and stats output, and Git adds a couple of lines at the end about automatically rebasing things. But with local commits and rebasing, dealing with conflicts after a pull is much better. This isn't because I have fewer or simpler changes to merge, it's simply because the actual user interface and process is significantly nicer. There's very little fuss and muss; I fire up my editor on a file or two, I look for the '<<<<' markers, I sort things out, I can get relatively readable diffs, and then I can move on smoothly.

(And the messages from git during rebasing are actually quite helpful.)

Re-applying git stashes that had conflicts with the newly pulled code was not as easy or as smooth, at least for the cases that I dealt with. My memory is that it was harder to see my changes and harder to integrate them, and also sometimes I had to un-add things from the index that git stash had apparently automatically added for me. I felt far less in control of the whole process than I do now with rebasing.

(And with rebasing, the git reflog means that if I need to I can revert my repo to the pre-pull state and see exactly how things were organized in the old code and what the code did with my changes integrated. Sometimes this is vital if there's been a significant restructuring of upstream code. In the past with git stash, I've been lucky because I had an intact pre-pull copy of the repo (with my changes) on a second machine.)

I went into this expecting to be neutral on the change to 'commit and rebase on pulls'. I've now wound up quite positive on it; I actively prefer fixing up a rebase to fixing up a git stash. Rebasing really is better, even if I just have a single small and isolated change.

(And thank you to the people who patiently pushed me towards this.)

GitCommitAndRebaseBetter written at 02:07:44

2015-07-03

Some notes on my 'commit local changes and rebase' Git workflow

A month or so ago I wrote about how I don't commit changes in my working repos and in reaction to it several people argued that I ought to change my ways. Well, never let it be said that I can't eventually be persuaded to change my ways, so since then I've been cautiously moving to committing my changes and rebasing on pulls in a couple of Git repos. I think I like it, so I'm probably going to make it my standard way of working with Git in the future.

The Git configuration settings I'm using are:

git config pull.rebase true
git config rebase.stat true

The first just makes 'git pull' be 'git pull --rebase'. If I wind up working with multiple branches in repos, I may need to set this on a per-branch basis or something; so far I just track origin/master so it works for me. The second preserves the normal 'git pull' behavior of showing a summary of updates, which I find useful for keeping an eye on things.
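As a quick self-contained illustration of what these settings buy, here is the configuration exercised in a pair of throwaway local repos standing in for a real upstream: with pull.rebase set, a pull that brings in new upstream commits replays my local commit on top of them, so history stays linear with no merge commit.

```shell
# pull.rebase in action: the local commit is replayed on top of the new
# upstream commit, leaving linear history. All paths are stand-ins.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/up.git"
git -C "$tmp/up.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" config user.name t
git -C "$tmp/seed" config user.email t@t
echo base > "$tmp/seed/README"
git -C "$tmp/seed" add README
git -C "$tmp/seed" commit -q -m 'base'
git -C "$tmp/seed" push -q "$tmp/up.git" master

git clone -q "$tmp/up.git" "$tmp/work"
cd "$tmp/work"
git config user.name t
git config user.email t@t
git config pull.rebase true
git config rebase.stat true

echo hack > local.txt               # one committed local change...
git add local.txt
git commit -q -m 'local change'
echo more > "$tmp/seed/up.txt"      # ...while upstream moves ahead
git -C "$tmp/seed" add up.txt
git -C "$tmp/seed" commit -q -m 'upstream change'
git -C "$tmp/seed" push -q "$tmp/up.git" master

git pull -q   # rebases 'local change' on top of 'upstream change'
```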

One drawback of doing things this way is that 'git pull' will now abort if there are also uncommitted changes in the repo, such as I might have for a very temporary hack or test. I need to remember to either commit such changes or do 'git stash' before I pull.

(The other lesson here is that I need to learn how to manipulate rebase commits so I can alter, amend, or drop some of them.)

Since I've already done this once: if I have committed changes in a repo without this set and use 'git pull' instead of 'git pull --rebase', one way to abort the resulting unwanted merge is 'git reset --hard HEAD'. Some sources suggest 'git reset --merge' or 'git merge --abort' instead. But really I should turn pull rebasing on the moment I commit my own changes to a repo.
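For illustration, here is that recovery in a throwaway pair of repos (one note: current Git versions refuse a plain 'git pull' on divergent histories unless something like pull.rebase is configured, so the merge behavior is forced explicitly here):

```shell
# Start an unwanted merge via 'git pull' and back out of it with
# 'git reset --hard HEAD'. All repos and contents are stand-ins.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/up.git"
git -C "$tmp/up.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" config user.name t
git -C "$tmp/seed" config user.email t@t
echo base > "$tmp/seed/file"
git -C "$tmp/seed" add file
git -C "$tmp/seed" commit -q -m 'base'
git -C "$tmp/seed" push -q "$tmp/up.git" master

git clone -q "$tmp/up.git" "$tmp/work"
cd "$tmp/work"
git config user.name t
git config user.email t@t
echo local > file
git commit -q -am 'local change'

echo upstream > "$tmp/seed/file"
git -C "$tmp/seed" commit -q -am 'upstream change'
git -C "$tmp/seed" push -q "$tmp/up.git" master

# Without pull.rebase set, this pull starts a merge (which conflicts).
git -c pull.rebase=false pull -q || true
git reset -q --hard HEAD    # abort the half-done merge entirely
```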

(There are a few repos around here that now need this change.)

I haven't had to do a bisection on a commit-and-rebase repo yet, but I suspect that bisection won't go well if I actually need my changes in all versions of the repo that I build and test. If I wind up in this situation I will probably temporarily switch to uncommitted changes and use of 'git stash', probably in a scratch clone of the upstream master repo.

(In general I like cloning repos to keep various bits of fiddling around in them completely separate. Sure, I probably could mix various activities in one repo without having things get messed up, but a different directory hierarchy that I delete afterwards is the ultimate isolation and it's generally cheap.)

GitCommitAndRebaseNotes written at 01:21:50

2015-07-02

Some thoughts on Go compiler directives being in source comments

Recently, I've been reading some commotion about how Go compiler directives being in source code comments is, well, not the 'elegant design' that Go's creators may feel it is. As it happens I have some sympathies for Go here, so let's talk about what I see as the issues involved.

First, let's differentiate between what I'll arbitrarily call 'broad' and 'narrow' compiler directives. In a nutshell, what I'm calling a broad compiler directive is something that changes the meaning of the source code such that every compiler implementation must handle it. In C, #include and #define are broad directives. Broad directives are effectively part of the language and as such I feel that they deserve first class support as an explicit element in language syntax.

(Broad directives don't have to use a new language syntax element. Python's 'from __future__ import ...' is such a broad directive, but it uses a standard language element.)

By contrast, narrow directives only apply to a specific compiler or tool. Since they're only for a specific program they should be namespaced, ie you need some way of saying 'this uninterpreted blob of text is only for <X>' so that other compilers can ignore it. This requires either a specific element of language syntax to say 'this following text is only for <X>' or hijacking a portion of some existing syntax where you can add arbitrary namespaced text. The easiest existing syntax to hijack is comments.

Since narrow directives do not change the language itself (at least in theory), it seems at least a bit odd to give them an explicit syntax element. In effect you're creating another escape hatch for language-meaningless text that sits alongside comments; one is sort of for people (although it may be interpreted by tools, for example for documentation) and one is a slightly structured one for tools.

(If a narrow directive changes the semantics of the code being compiled, it's actually changing the language the compiler is dealing with from 'language <X>' to 'something similar to <X> but not quite it'. Problems often ensue here in the long run.)

As far as I know, all of the existing Go compiler directives are narrow directives. They're either used by specific non-compiler tools or they're internal directives for one specific Go compiler (admittedly the main 'go' compiler). As far as I'm concerned this makes them pretty much fair game to be implemented without a specific element of language syntax. Other people may disagree and feel that even narrow directives should have some sort of specific language syntax support.

PS: There may well be standard terminology in the programming language community for what I'm calling broad versus narrow directives here.

(This elaborates on some tweets I made, because Twitter forces condensed and sometimes opaque writing.)

Sidebar: The problem with non-namespaced narrow directives

If you don't namespace your narrow directives you wind up with the C #pragma problem, which is 'what do you do when you encounter a #pragma that you don't recognize?'. If you do error out, you cause problems for people who are using you to compile source code with #pragmas for some other compiler. If you don't error out, you cause problems for people who've accidentally misspelled one of your #pragmas and are now having it be more or less silently ignored.

(You can try to know about the #pragmas of all other compilers, but in practice you're never going to know absolutely all of them.)

GoDirectivesThoughts written at 01:01:47
