Wandering Thoughts archives

2019-02-18

When cloning git repos, things go faster if you start from a good base

Normally I get a copy of any git repo that I want by directly cloning from its upstream, whatever that is. Generally this goes fast enough and it guarantees that I have an exact copy of the repo, one that's not contaminated by anything else that I might have been doing to any other copy of the repo that I might already have. But generally I'm doing it on a 1G network link. Recently I needed a copy of Guenter Roeck's tree of hwmon updates for the Linux kernel on my home machine, where I had not previously cloned it. My first attempt was a direct clone from kernel.org, and let me tell you, it didn't go all that fast over my DSL link.

Like probably everyone else with a Linux kernel tree, Guenter Roeck's tree is ultimately a descendant of the regular official Linux kernel tree. I already keep a clone of the regular Linux kernel tree at home (and at work), because I wind up referring to it often enough. So, I wondered, what if I started by making a copy of my kernel tree, then added Guenter Roeck's as an additional upstream and fetched it?

Well:

remote: Counting objects: 2269, done.
remote: Compressing objects: 100% (567/567), done.
remote: Total 2269 (delta 1804), reused 2061 (delta 1702)
Receiving objects: 100% (2269/2269), 1.30 MiB | 268.00 KiB/s, done.
Resolving deltas: 100% (1804/1804), done.

Let me assure you that cloning a full Linux kernel tree involves a lot more than 2269 objects and a lot more than 1.3 MiB of data.
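The sequence of steps behind that output can be sketched roughly as follows. This is an illustration under loud assumptions, not a record of the exact commands: the directory names ('base', 'upstream', 'work') and the remote name 'other' are made up, and tiny throwaway repos stand in for the real kernel trees so that nothing has to be downloaded.

```shell
#!/bin/sh
# Sketch only: throwaway local repos stand in for the real trees.
# 'base', 'upstream', 'work' and the remote name 'other' are made-up names.
set -e
tmp=$(mktemp -d)      # left in place afterwards; remove it by hand when done
cd "$tmp"

# Helper: make an empty commit with a fixed identity, so no git config is needed.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

# "base" plays the role of the tree that is already cloned locally.
git init -q base
( cd base && git symbolic-ref HEAD refs/heads/main && commit "shared history" )

# "upstream" descends from base and adds work on its own branch.
git clone -q base upstream
( cd upstream && git checkout -q -b hwmon-next && commit "hwmon update" )

# The trick itself: copy what is already local, then fetch only the difference.
git clone -q base work
cd work
git remote add other "$tmp/upstream"
git fetch -q other

# Both sets of remote-tracking branches are now present.
git branch -r
```

With the real trees, the first clone would be of the local kernel tree and the added remote would be the real upstream URL; only the objects that the local copy lacks have to cross the slow link.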

Because of git's fundamental nature as a content-addressable data store, in theory this trick works on anything with significant object overlap, not just things that ultimately descend from the same source. In practice this is generally unimportant; almost everything you're going to want to pull this trick on has a common ancestor.

(The possible exception is if two separate groups are maintaining git repos that are converted from something else, such as a Mercurial repo. At least the objects should be identical between the two repos, and if you're lucky maybe the commits as well, despite these repos not having a common git ancestor.)

This feels like an obvious trick now that I've done it once, so I'm probably going to try to do it more. There are some variations of the trick one can probably perform, such as actively changing the 'origin' upstream over to the upstream you really want to be based on and pulling from. My one question about that would be how one cleans up branches (and perhaps tags) that are only found in the repo you started out by cloning from.
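One possible approach to that variation, offered as a sketch and not as something I have run against the kernel trees: 'git remote set-url' repoints 'origin', and 'git fetch --prune' drops remote-tracking branches that no longer exist on the new upstream. The repo names and the 'base-only' branch below are invented for the demonstration. As far as I know, a plain '--prune' leaves already-fetched tags and any local branches alone, so those would still need separate handling.

```shell
#!/bin/sh
# Sketch only: 'base', 'upstream', 'work' and 'base-only' are made-up names.
set -e
tmp=$(mktemp -d)      # left in place afterwards; remove it by hand when done
cd "$tmp"

commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

# "base" has a branch that the upstream we really want does not have.
git init -q base
( cd base && git symbolic-ref HEAD refs/heads/main && commit "shared history" \
    && git branch base-only )

# "upstream" shares main with base and has its own hwmon-next branch.
git clone -q base upstream
( cd upstream && git checkout -q -b hwmon-next && commit "hwmon update" )

# Start from the base, then switch 'origin' over to the real upstream.
git clone -q base work
cd work
git remote set-url origin "$tmp/upstream"

# --prune removes origin/* remote-tracking branches that the new origin lacks.
git fetch -q --prune origin
git branch -r
```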

GitCloningBaseBenefit written at 21:34:09

2019-02-15

Accumulating a separated list in the Bourne shell

One of the things that comes up over and over again when formatting output is that you want to output a list of things with some separator between them, but you don't want this separator to appear at the start or the end, or at all if there is only one item in the list. For instance, suppose that you are formatting URL parameters in a tiny little shell script and you may have one or more parameters. If you have more than one parameter, you need to separate them with '&'; if you have only one parameter, the web server may well be unhappy if you stick an '&' before or after it.

(Or not. Web servers are often very accepting of crazy things in URLs and URL parameters, but one shouldn't count on it. And it just looks irritating.)

The very brute force approach to this general problem in Bourne shells goes like this:

tot=""
for i in "$@"; do
  ....
  v="var-thing=$i"
  if [ -z "$tot" ]; then
    tot="$v"
  else
    tot="$tot&$v"
  fi
done

But this is five or six lines and involves some amount of repetition. It would be nice to do better, so when I had to deal with this recently I looked into the Dash manpage to see if it's possible to do better with shell substitutions or something else clever. With shell substitutions we can condense this a lot, but we can't get rid of all of the repetition:

tot="${tot:+$tot&}var-thing=$i"

It annoys me that tot is repeated in this. However, this is probably the best all-around option in normal Bourne shell.
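To see the substitution in context, here is a minimal sketch that wraps it in a function. The name 'join_params' is hypothetical, picked only for this illustration; 'var-thing' is the placeholder parameter name from the loop above.

```shell
#!/bin/sh
# join_params is a made-up helper name for illustration.
# POSIX sh has no local variables, so 'tot' and 'i' are globals here.
join_params() {
  tot=""
  for i in "$@"; do
    # ${tot:+$tot&} expands to "$tot&" only when tot is set and non-empty,
    # so the first item gets no leading '&'.
    tot="${tot:+$tot&}var-thing=$i"
  done
  printf '%s\n' "$tot"
}

join_params a b c    # prints: var-thing=a&var-thing=b&var-thing=c
join_params x        # prints: var-thing=x
```

With no arguments at all the loop never runs and the result is an empty string, so the zero-item case needs no special handling either.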

Bash has arrays, but the manpage's documentation of them makes my head hurt, and using them results in Bash-specific scripts (or at least scripts specific to shells that support arrays). I'm also not sure if there's any simple way of doing a 'join' operation that puts the array elements together with a separator between them, which is the whole point of the exercise.

(But now I've read various web pages on Bash arrays so I feel like I know a little bit more about them. Also, on joining, see this Stackoverflow Q&A; it looks like there's no built-in support for it.)
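For what it's worth, plain POSIX sh does have one join-like mechanism that works without arrays: inside double quotes, "$*" expands the positional parameters into a single word, separated by the first character of IFS. A minimal sketch, assuming a single-character separator; 'join_with' is a hypothetical helper name, not something from a standard library:

```shell
#!/bin/sh
# join_with SEP ITEM... : join the items using the one-character separator SEP.
# join_with is a made-up name for illustration.
join_with() {
  sep=$1
  shift
  # Run in a subshell so the IFS change does not leak into the caller.
  # "$*" joins the positional parameters with the first character of IFS.
  ( IFS=$sep; printf '%s\n' "$*" )
}

join_with '&' var-thing=a var-thing=b var-thing=c
# prints: var-thing=a&var-thing=b&var-thing=c
```

The catch is that the items have to be the positional parameters, so in a script that still needs its original "$@" this may be more awkward than accumulating into a variable.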

In the process of writing this entry, I realized that there is another option: build '$tot' with the separator always present, then use POSIX prefix or suffix pattern removal ('${var#pattern}' or '${var%pattern}') afterward to strip the unwanted leading or trailing separator. Let me show you what I mean:

tot=""
for i in "$@"; do
  ...
  tot="$tot&var-thing=$i"
done
# remove leading '&':
tot="${tot#&}"

This feels a little bit unclean, since we're adding on a separator that we don't want and then removing it later. Among other things, that seems like it could invite accidents where at some point we forget to remove that leading separator. As a result, I think that the version using '${var:+word}' substitution is the best option, and it's what I'm going to stick with.
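For comparison, here is the strip-afterward variant as a small runnable sketch. As before, 'join_strip' is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# join_strip is a made-up helper name for illustration.
# POSIX sh has no local variables, so 'tot' and 'i' are globals here.
join_strip() {
  tot=""
  for i in "$@"; do
    # Always prepend the separator, even for the first item.
    tot="$tot&var-thing=$i"
  done
  # ${tot#&} removes a single leading '&' if there is one;
  # an empty $tot passes through unchanged.
  printf '%s\n' "${tot#&}"
}

join_strip a b    # prints: var-thing=a&var-thing=b
```

Doing the removal in the same place that produces the result at least keeps the "add it, then strip it" pair together, which reduces the chance of forgetting the second half.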

BourneSeparatedList written at 23:12:33

2019-02-06

Using a single git repo to compare things between two upstreams

The other day I wrote about hand-building an updated upstream kernel module. One of the things that I wanted to do there was to compare the code of the nct6775 module I wanted to build between the 4.20.x branch in the stable tree and the hwmon-next branch in Guenter Roeck's tree. In my entry, I did this by cloning each Git repo separately and then running diff by hand, but this is a little awkward and I said that there was probably a way to do this in a single Git repo. Today I have worked out how to do that, and so I'm going to write it down.

To do this we need a single Git repo with both trees present in it, which means that both upstream repos need to be remotes. We can set up one as a remote simply by cloning from it:

git clone [...]/groeck/linux-staging.git

(I've chosen to start with the repo I'm theoretically going to be building from, instead of the repo I'm only using to diff against.)

Then we need to add the second repo as a remote, and fetch it:

cd linux-staging
git remote add stable [...]/stable/linux.git
git fetch stable

At this point 'git branch -r' will show us that we have all of the branches from both sides. With the data from both upstreams in our local repo and a full set of branches, we can do the full form of the diff:

git diff stable/linux-4.20.y..origin/hwmon-next drivers/hwmon/nct6775.c
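The whole sequence can be tried out end to end without touching kernel.org. The sketch below is an assumption-heavy stand-in: two tiny local repos (directories named 'stable' and 'staging', both invented here) play the roles of the two upstreams, and the file contents are dummy text. Only the branch names, the file path and the shape of the commands mirror the real thing.

```shell
#!/bin/sh
# Sketch only: toy repos stand in for the stable and hwmon trees.
set -e
tmp=$(mktemp -d)      # left in place afterwards; remove it by hand when done
cd "$tmp"

ci() {
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

# Stand-in for the stable tree, with one version of the file.
git init -q stable
(
  cd stable
  git symbolic-ref HEAD refs/heads/linux-4.20.y
  mkdir -p drivers/hwmon
  echo 'old line' > drivers/hwmon/nct6775.c
  git add drivers/hwmon/nct6775.c
  ci 'stable version'
)

# Stand-in for the hwmon tree: same history plus a change on hwmon-next.
git clone -q stable staging
(
  cd staging
  git checkout -q -b hwmon-next
  echo 'new line' >> drivers/hwmon/nct6775.c
  git add drivers/hwmon/nct6775.c
  ci 'hwmon update'
)

# Clone one upstream, add the other as a second remote, and fetch it.
git clone -q staging work
cd work
git remote add stable "$tmp/stable"
git fetch -q stable

# Diff one file between branches that come from two different remotes.
# The '--' separates revisions from paths and avoids any ambiguity.
diff_out=$(git diff stable/linux-4.20.y..origin/hwmon-next -- drivers/hwmon/nct6775.c)
printf '%s\n' "$diff_out"
```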

We can make this more convenient by shortening one or both names, like so:

git checkout linux-4.20.y
git checkout hwmon-next

git diff linux-4.20.y.. drivers/hwmon/nct6775.c

I'm using 'git checkout' here partly as a convenient way to run 'git branch' with the right magic set of options:

git branch --track linux-4.20.y stable/linux-4.20.y

Actually checking out hwmon-next means we don't have to name it explicitly.

We can also diff against tags from the stable repo, and we get to do it without needing to say which upstream the tags are from:

git diff v4.20.6.. drivers/hwmon/nct6775.c
git diff v4.19.15.. drivers/hwmon/nct6775.c

The one drawback I know of to a multi-headed repo like this is that I'm not sure how you get rid of an upstream that you don't want any more. At one level you can just delete the remote, but that leaves various things cluttering up your repo, including both branches and tags. Presumably there is a way in Git to clean those up and then let Git's garbage collection eventually delete the actual Git objects involved and reclaim the storage.
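A partial answer can be sketched with toy repos. 'git remote remove' deletes the remote's configuration and its remote-tracking branches, but tags that were fetched along the way live in the shared tag namespace and stay behind, so they have to be deleted by name. Everything named below ('first', 'second', 'linux-demo.y', 'v-demo') is invented for the demonstration, and working out which tags came from which upstream in a real repo is left open.

```shell
#!/bin/sh
# Sketch only: all repo, branch and tag names here are made up.
set -e
tmp=$(mktemp -d)      # left in place afterwards; remove it by hand when done
cd "$tmp"

commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}

git init -q first
( cd first && git symbolic-ref HEAD refs/heads/main && commit "shared history" )

# Second upstream: one extra commit on its own branch, plus a tag on it.
git clone -q first second
( cd second && git checkout -q -b linux-demo.y && commit "stable fix" \
    && git tag v-demo )

git clone -q first work
cd work
git remote add stable "$tmp/second"
git fetch -q stable       # brings in stable/* branches and the v-demo tag

# Deleting the remote drops its config and its remote-tracking branches...
git remote remove stable
left_branches=$(git for-each-ref refs/remotes/stable)
# ...but the tag that came along with the fetch is still there.
left_tags=$(git tag -l)
printf 'branches left: [%s]  tags left: [%s]\n' "$left_branches" "$left_tags"

# Tags have to be deleted by name; here there is only the one.
git tag -d v-demo >/dev/null
```

Once no branch or tag refers to the second upstream's commits any more, Git's normal garbage collection is what would eventually reclaim the objects.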

(One can do more involved magic by not configuring the second repo as a remote and using 'git fetch' directly with its URL, but I'm not sure how to make the branch handling work nicely and so on. Setting it up as a full remote makes all of that work, although it also pulls in all tags unless you use '--no-tags' and understand what you're doing here, which I don't.)

Looking back, all of this is relatively basic and straightforward and I think I knew most of the individual bits and pieces involved. But I'm not yet familiar and experienced enough with git to confidently put them all together on the fly when my real focus is doing something else.

(Git is one of those things that I feel I should be more familiar with than I actually am, so every so often I take a run at learning how to do another thing in it.)

GitCompareAcrossUpstreams written at 23:10:50


