2023-03-27
Moving from 'master' to 'main' in Git with local changes
One of the things that various open source Git repositories are doing
is changing their main branch from being called 'master' to being
called 'main'. As a consumer of their repository, this is generally an
easy switch for me to deal with; some day, I will do a 'git pull',
get a report that there's a new 'main' branch but there's no upstream
'master', and then I'll do 'git checkout main' and I'm all good.
However, with some repositories I have my own local changes, which
I handle through Git rebasing. Recently
I had to go through a 'master' to 'main' switch on such a repository,
so I'm writing down what I did for later use.
The short version is:
    git checkout main
    git cherry-pick origin/master..master
(This is similar to something I did before with Darktable.)
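To make the short version concrete, here is a minimal sketch that replays the whole situation in a scratch directory. The repository names ('upstream', 'local'), the file name and the commit messages are all made up for illustration; only the last few commands are the ones from this entry.

```shell
set -eu
scratch=$(mktemp -d)
cd "$scratch"

# A stand-in upstream whose main branch is still called 'master'.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=Upstream -c user.email=upstream@example.invalid \
    commit -q --allow-empty -m "upstream: initial"

# My clone, with one local change on top of master.
git clone -q upstream local
cd local
git config user.name Me
git config user.email me@example.invalid
echo tweak > local.txt
git add local.txt
git commit -q -m "local: my change"

# Upstream renames master to main.
git -C ../upstream branch -m master main

# The switch itself. A plain fetch does not prune, so the stale
# origin/master remote-tracking branch is still there to name the range.
git fetch -q origin
git checkout -q main
git cherry-pick origin/master..master

# My local commits now sit on top of the new upstream main.
git log --oneline origin/main..main
```

The old local 'master' is untouched by this, so it can be deleted later once 'main' looks right.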
In general I could have done this with either 'git rebase' or
'git cherry-pick', and in theory according to my old take on
rebasing versus cherry-picking the 'proper'
answer might have been a rebase, since I was moving my local commits
onto the new 'main' branch. However it was clear to me that I would
probably have wanted to use the full three-argument form of 'git
rebase', which is at least somewhat tricky
to understand and to be sure I was doing right. Cherry-picking was
much simpler; I could easily reason about what it was doing, and
it left my old 'master' state alone in case something went wrong.
(Switching from rebasing to cherry-picking is an experience I've had before.)
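For comparison, this is roughly what I think the three-argument rebase would have looked like. It is a sketch that I did not use on the real repository; the scratch repositories and commit messages are invented, and the rename and upstream-setting steps after the rebase are my assumption about how to finish the job. Unlike the cherry-pick, it moves 'master' itself.

```shell
set -eu
scratch=$(mktemp -d)
cd "$scratch"

# Stand-in upstream on 'master', and a clone with one local commit.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=Upstream -c user.email=upstream@example.invalid \
    commit -q --allow-empty -m "upstream: initial"
git clone -q upstream local
cd local
git config user.name Me
git config user.email me@example.invalid
echo tweak > local.txt
git add local.txt
git commit -q -m "local: my change"

# Upstream renames master to main and then moves ahead by one commit.
git -C ../upstream branch -m master main
git -C ../upstream -c user.name=Upstream -c user.email=upstream@example.invalid \
    commit -q --allow-empty -m "upstream: newer work"
git fetch -q origin

# Three-argument form: take the commits in origin/master..master
# and replay them onto origin/main. This rewrites 'master' in place.
git rebase -q --onto origin/main origin/master master

# Assumption: finish by renaming the local branch and pointing it at
# the new upstream branch.
git branch -m master main
git branch -q --set-upstream-to=origin/main main

git log --oneline origin/main..main
```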
Now that I've written this I've realized that there was probably a
third way, because at a mechanical level branches in git don't
entirely exist. The upstream 'master' and 'main' branches cover the
same commits (up until possibly the 'main' branch adds some on top).
The only thing that says my local changes are on 'master' instead
of 'main' is a branch head. In theory, what I could have done was
just relabel my current state as being on 'main' instead of
'master', and then possibly do a 'git pull' to be current with the
new 'main'.
(In the case of this particular repository, it was only a renaming of the main branch; upstream, both the old 'master' and the new 'main' are on the same commit.)
Since I just tried it on a copy of my local repository in question, the commands to do this are:
    git branch --force main master
    git checkout main
    # get up to date:
    git pull
I believe that you only need the pull if the upstream main is ahead of the old upstream master.
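Here is a sketch of the relabeling in a scratch repository, checking that the commit hash really is unchanged. The repository names and commit messages are invented. The 'git branch --set-upstream-to' step is my addition: as far as I know, a branch created from another local branch gets no upstream by default, so a later 'git pull' on it would have nothing to pull from. If a local 'main' that tracks 'origin/main' already exists, '--force' just moves it and that step shouldn't be needed.

```shell
set -eu
scratch=$(mktemp -d)
cd "$scratch"

# Stand-in upstream on 'master', and a clone with one local commit.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=Upstream -c user.email=upstream@example.invalid \
    commit -q --allow-empty -m "upstream: initial"
git clone -q upstream local
cd local
git config user.name Me
git config user.email me@example.invalid
echo tweak > local.txt
git add local.txt
git commit -q -m "local: my change"

# Upstream only renames the branch; there is no new development.
git -C ../upstream branch -m master main
git fetch -q origin

before=$(git rev-parse master)

# Put the 'main' label on my current state and switch to it.
git branch --force main master
git branch -q --set-upstream-to=origin/main main   # assumption, see above
git checkout -q main

after=$(git rev-parse HEAD)
[ "$before" = "$after" ] && echo "same commit, new branch name"
```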
This feels more magical than the rebase or cherry-pick version, so I'm not likely to use it in the future unless there's some oddity about the situation. One potential reason would be if I've published my repository, I don't expect upstream development (just the main branch being renamed), and other people might have changes on top of my changes. At that point, a cherry-pick (or a rebase) would change the commit hashes of my changes, while simply sticking the 'main' branch label onto them doesn't, so people who have changes on top of my changes might have an easier time.
2023-03-05
An unexciting idea: Code changes have context
I recently read Mark Dominus's "I wish people would stop insisting that Git branches are nothing but refs". One of my thoughts afterward is that this feels like an instance of a broader thing, which is that (code) changes have context; here, one part of that context is where they happen (ie, what branch they happen on). Of course we already know that in a sense, because Git (and pretty much every other version control system) considers it important to record both who made the change and when it was made.
In a way, it is turtles all of the way down. It's not too wrong to say that in Git, the core objects are trees. Changes (commits) are a record of the relationship between trees; they give you the context of moving from one tree to another (often partially literally in the form of the commit message and anything it points you to). Our desire for this context is one reason people emphasize that you should write good commit messages. In a sense, diffs themselves are an expression of that context, since they are literally what changed between the two (or more) trees involved (although diffs by themselves aren't necessarily enough context).
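To see the tree and context split directly, here is a small sketch in a throwaway repository (the repository name, file name and messages are made up). The commit object names a tree and a parent and carries the who, when and why. The diff isn't stored in the commit; Git works it out by comparing the two trees.

```shell
set -eu
scratch=$(mktemp -d)
cd "$scratch"
git init -q demo
cd demo
git config user.name Me
git config user.email me@example.invalid

echo one > file.txt
git add file.txt
git commit -q -m "first state"
echo two > file.txt
git commit -q -am "second state: say why file.txt changed"

# The commit object: a tree, a parent, author and committer
# (with times), and the message.
git cat-file -p HEAD

# The change itself, computed from the parent's tree and this tree.
git diff "HEAD~1^{tree}" "HEAD^{tree}"
```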
When we move one more level up, branches are one expression of the context of changes (commits) themselves. Branches generally have some sort of meaning, and they also represent (are) separate sequences of changes; that separation adds context to the changes themselves, although for more context you need to know what the branches are. Of course branches aren't the only way of adding context to changes (there are many ways of putting it into commit messages). Nor are they the only context to changes we care about, since sometimes we care if particular changes are in a release or in a version that someone is running.
(The question of 'has this change been merged into the main branch' is an interesting edge case. Here, we do care about the state of a change in the context of a branch, but it's not the branch the change was initially created in. Knowing that the whole branch was merged into the main branch would only be helpful if you knew that the branch didn't continue on beyond that merge.)
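One way to ask the question about a specific change rather than a whole branch is 'git merge-base --is-ancestor', which exits successfully if the first commit is reachable from the second. The sketch below uses an invented 'feature' branch that is merged into 'main' and then keeps going, which is exactly the case where knowing that the branch was merged doesn't tell you enough.

```shell
set -eu
scratch=$(mktemp -d)
cd "$scratch"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name Me
git config user.email me@example.invalid
git commit -q --allow-empty -m "main: initial"

# A change on 'feature' that gets merged into main.
git checkout -q -b feature
git commit -q --allow-empty -m "feature: merged change"
merged=$(git rev-parse HEAD)
git checkout -q main
git merge -q --no-ff -m "merge feature" feature

# 'feature' continues past the merge.
git checkout -q feature
git commit -q --allow-empty -m "feature: later change"
later=$(git rev-parse HEAD)

# Ask about each change individually.
for c in "$merged" "$later"; do
    subject=$(git log -1 --format=%s "$c")
    if git merge-base --is-ancestor "$c" main; then
        echo "$subject: in main"
    else
        echo "$subject: not in main"
    fi
done
```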
A corollary to this is that you'll forget this context over time. This makes me feel that it's worth putting as much of it as possible in a durable and accessible form, which probably means the commit message (since that's often the most accessible place). Code comments can help, but they're only attached to the new state so it may take some contortions to discuss the change. I've sometimes engaged in this when I think it's important enough (or where I may not think to find and look back at a commit message), but putting dates and discussions of how the old state used to be in comments feels somewhat wrong.
(I suspect that all of this is obvious, but Mark Dominus's article crystalized this in my mind so I feel like writing it down.)