With git, it's useful to pick the right approach to your problem

August 31, 2017

One of the things about Git is that once you go past basic committing, it generally has any number of ways for you to do what you want done. As with programming languages, part of getting better at Git is learning to pick the right idiom to attack your problem with. I can't claim that I'm good at this, but I am getting more experience, and recently I had an interesting experience here.

I use Byron Rakitzis' version of rc as my shell, which these days can be found on Github. Well, as has come up before, I don't actually use (just) that official version. I have my own change (to add a built-in read command) and because it was there, I also use some completion improvements, which come from Bert Münnich's collection of interesting rc modifications.

(I originally tried out Bert Münnich's version to get an important improvement before it became an official change. This change also shows the awesome power of raising an issue even if you expect that it's hopeless, as well as exploring Github forks of a project you're interested in.)

Recently some improvements have landed in the main repo that Bert Münnich has not yet rebased his modifications on top of. The other day I decided that I should update my own version to pick up these changes, because it turned out that I wanted to rebuild it on Fedora 26 anyway (that's its own story). The obvious way to do this was a straight rebase on top of the main repo, so that's what I did first.

The end result worked, but getting there took a bunch of effort. Bert Münnich's modifications include things like changing the build system from GNU Autoconf to a simple Makefile based one, so there were a bunch of clashes and situations that I had to resolve by hand (and I wasn't entirely confident of my own changes, since I was modifying Münnich's modifications without fully understanding either them or the upstream changes they clashed with). It felt like I was both working too hard and creating a fragile edifice for myself, so at the end I took a step back and asked what I really wanted and if there was a simpler, better way to get it.

When I did this I realized that what I really wanted was the upstream with my addition plus only a few of Bert Münnich's modifications (I've become addicted to command completion). While I could have created this with more rebasing, there was a much simpler approach (partly enabled by a better understanding of Git remotes and so on):

  1. Create a new clone of the main repo.
  2. Add Münnich's repo and my previous everything-together local repo as additional remotes.
  3. git fetch from both new remotes to make all of their commits available locally.
  4. git cherry-pick my addition by its commit hash.
  5. git cherry-pick the modifications I wanted from Münnich's repo, which only amounted to a few of them. Again I did this commit by commit using the commit hash, rather than trying to do anything more sophisticated. One or two cherry-picks required minor adjustment; since I'd already had to deal with them during the rebase, it was easy to fix things up.
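
As a rough sketch, the five steps above look something like the following. This is not the actual set of commands that was run; it is a self-contained demo that builds throwaway repos to stand in for the main repo, the fork, and the old combined local repo. Every repo name, file name, and commit message here (upstream, fork, old, new, read.txt and so on) is made up for illustration.

```shell
#!/bin/sh
# Demo of "fresh clone + extra remotes + cherry-pick" on throwaway repos.
# All repo names, files, and commit messages are hypothetical stand-ins.
set -eu

work=$(mktemp -d)
cd "$work"

# Fixed identity so commits work even with no global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {  # usage: commit <repo> <file> <message>
    printf '%s\n' "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# "upstream" stands in for the main repo.
git init -q upstream
commit upstream base.txt "initial"

# "fork" stands in for the other repo: one change we want, one we don't.
git clone -q upstream fork
commit fork completion.txt "add completion"
wanted=$(git -C fork rev-parse HEAD)
commit fork build.txt "replace build system"

# "old" stands in for the previous everything-together local repo.
git clone -q fork old
commit old read.txt "add read builtin"
mine=$(git -C old rev-parse HEAD)

# Meanwhile the main repo gains an improvement the fork doesn't have.
commit upstream new.txt "upstream improvement"

# Step 1: a new clone of the main repo.
git clone -q upstream new
cd new

# Step 2: add the other two repos as additional remotes.
git remote add fork "$work/fork"
git remote add old "$work/old"

# Step 3: fetch so that all of their commits are available locally.
git fetch -q fork
git fetch -q old

# Step 4: cherry-pick my own addition by commit hash.
git cherry-pick "$mine" >/dev/null

# Step 5: cherry-pick only the wanted modification, again by hash.
git cherry-pick "$wanted" >/dev/null

# Show the resulting history, newest first.
git log --format=%s
```

In this toy setup the cherry-picks apply cleanly because each commit touches its own file. With real changes a cherry-pick can stop on a conflict, in which case you fix the files, git add them, and run git cherry-pick --continue.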

While having done the rebase helped me deal with the conflicts during cherry-picking, the cherry-picking still felt much easier. I could have arrived at the same place with an interactive rebase (which would have let me drop modifications I'd decided I either didn't want or didn't care about), but I think it would have felt more indirect and chancy. Cherry-picking more directly expressed my intentions; I wanted my change and then this, this, and this from another tree. Done.
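
For comparison, here is a minimal sketch of the interactive rebase alternative, where an unwanted commit is dropped from the todo list. It uses a single throwaway repo with made-up file names and commit messages. It also assumes one trick to keep the demo non-interactive: GIT_SEQUENCE_EDITOR is set to a sed command that edits the todo list, where normally you would change "pick" to "drop" by hand in your editor.

```shell
#!/bin/sh
# Demo of dropping one commit with "git rebase -i" on a throwaway repo.
# File names and commit messages are hypothetical stand-ins.
set -eu

work=$(mktemp -d)

# Fixed identity so commits work even with no global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"

commit() {  # usage: commit <file> <message>
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit base.txt "initial"
base=$(git rev-parse HEAD)
commit completion.txt "add completion"
commit build.txt "replace build system"
commit read.txt "add read builtin"

# Rebase everything after $base. The sequence editor rewrites the todo
# list so the "replace build system" line says "drop" instead of "pick".
GIT_SEQUENCE_EDITOR="sed -i.bak '/replace build system/s/^pick/drop/'" \
    git rebase -q -i "$base"

# Show the resulting history, newest first.
git log --format=%s
```

The end state is the same kind of history as with cherry-picking, but you get there by saying what to remove rather than what to take.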

(In both cases, the git repo I wind up with probably can't be used for further rebases against Münnich's repo, just for rebases against the main repo.)

Stepping back, thinking about what I wanted, and then finding the right mechanism to express this in Git worked out very well. When I switched from rebasing to cherry-picking, I went from feeling that I was fighting git to get what I wanted to feeling that I was directly and naturally expressing myself and git was doing just what I wanted. Of course the real trick here is having the Git knowledge and experience to realize what the good way is; had I not had some experience, I might not have been familiar enough with cherry-picking to reach for it here. And there are undoubtedly Git manipulations that I don't even know exist, so I'll never pick them as the right option.

(As a side note, this isn't really the copy versus move situation that I ran into before. Instead it's much more that I'm gluing together a completely new branch that happens to be made with bits and pieces from some other branches (and after I'm done the other branches aren't of direct interest to me).)


Comments on this page:

By Miksa at 2017-09-01 04:46:40:

Have you made your modifications available on Github?

This is a situation where git just doesn't work all that well, and a traditional patch-based model is more appropriate. Git's concept of commit identity is too strong in this case. Suppose you have two local changes, A and B, to some upstream codebase. Git cares exactly which upstream commit those changes apply to, and in what order they're applied, regardless of any actual dependency between those changes.

In the traditional model, A and B are patch files. There's probably a large range of upstream commits to which they cleanly apply. If they're independent, then it also doesn't matter what order they're in. So rather than track your local modifications as commits that have to be merged or rebased every time the upstream changes, you should just keep them as patch files that sit loosely on top of the latest upstream commit. Every time you upgrade, you grab the latest upstream snapshot, apply your changes, and build. If the patch doesn't work anymore, you update the patch (which you can track in git as a patch repository).

This is exactly what tools like Quilt and StGit are all about. This is also the reason Debian still today uses patch files to track their distribution changes to upstream codebases.

To me this doesn’t feel like it’s about picking the right Git idiom so much as picking the right SCM operation/primitive. And to my mind, this is implicitly the point of skeeto’s comment too. Note too how your own insights are phrased at a higher level of abstraction than Git idioms as well. E.g.:

I realized that what I really wanted was the upstream with my addition plus only a few of Bert Münnich's modifications

Written on 31 August 2017.

Last modified: Thu Aug 31 22:25:52 2017