When cloning git repos, things go faster if you start from a good base
Normally I get a copy of any git repo that I want by cloning directly from its upstream, whatever that is. This is generally fast enough, and it guarantees that I have an exact copy of the repo, one that isn't contaminated by anything I might have been doing in some other copy of it that I already have. But usually I'm doing it over a gigabit network link. Recently I needed a copy of Guenter Roeck's tree of hwmon updates for the Linux kernel on my home machine, where I had not previously cloned it. My first attempt was a direct clone from kernel.org, and let me tell you, it didn't go all that fast over my DSL link.
Like probably every other Linux kernel tree, Guenter Roeck's tree is ultimately descended from the regular official Linux kernel tree. I already keep a clone of the regular Linux kernel tree at home (and at work), because I wind up referring to it often enough. So, I wondered, what if I started by making a copy of my kernel tree, then added Guenter Roeck's tree as an additional upstream and fetched it? I tried it, and this is what the fetch printed:
    remote: Counting objects: 2269, done.
    remote: Compressing objects: 100% (567/567), done.
    remote: Total 2269 (delta 1804), reused 2061 (delta 1702)
    Receiving objects: 100% (2269/2269), 1.30 MiB | 268.00 KiB/s, done.
    Resolving deltas: 100% (1804/1804), done.
Let me assure you that cloning a full Linux kernel tree involves a lot more than 2269 objects and a lot more than 1.3 MiB of data.
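The copy-then-fetch steps described above can be sketched in Python as a thin wrapper around the git command line. This is a minimal sketch under some assumptions: the copy is made with a local `git clone` (copying the directory with `cp -a` should serve equally well), and the helper name `clone_via_local_base`, the paths, the remote name `hwmon`, and the `example.org` URL are all hypothetical placeholders.

```python
import os
import shlex
import subprocess


def clone_via_local_base(base_repo, new_repo, remote_name, remote_url, run=True):
    """Hypothetical helper: clone from an existing local repo, then add and
    fetch the upstream we actually want.

    Only objects the base repo doesn't already have should need to come over
    the network in the final fetch. Returns the git commands as argv lists;
    with run=False nothing is executed.
    """
    base_repo = os.fspath(base_repo)
    new_repo = os.fspath(new_repo)
    cmds = [
        # A local clone is cheap: git hardlinks or copies the object store.
        ["git", "clone", base_repo, new_repo],
        # Add the tree we really want as an additional remote...
        ["git", "-C", new_repo, "remote", "add", remote_name, remote_url],
        # ...and fetch it; objects already present locally needn't be resent.
        ["git", "-C", new_repo, "fetch", remote_name],
    ]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical paths and URL; only prints the commands, runs nothing.
    for cmd in clone_via_local_base("linux", "linux-hwmon", "hwmon",
                                    "https://example.org/hwmon.git", run=False):
        print(shlex.join(cmd))
```

Keeping `run=False` as an option makes it easy to check the plan before letting it touch the network.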
Because of git's fundamental nature as a content-addressable data store, in theory this trick works on anything with significant object overlap, not just things that ultimately descend from the same source. In practice this is generally unimportant; almost everything you're going to want to pull this trick on has a common ancestor.
(The possible exception is when two separate groups maintain git repos that were converted from something else, such as a Mercurial repo. At least the objects should be identical between the two repos, and if you're lucky maybe the commits as well, even though neither repo descends from the other in git terms.)
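To make the content-addressing point concrete, here is a small Python sketch of how git names a file's contents (a blob object) in the default SHA-1 object format: the ID is the SHA-1 of a short header, `blob <size>` followed by a NUL byte, plus the raw bytes. Because the ID depends only on the content, the same file gets the same ID in any repository, however it got there. The function name `git_blob_id` is hypothetical, and repos using the newer SHA-256 object format hash differently, which this sketch ignores.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID git would assign to a blob with this content.

    Assumes the default SHA-1 object format (not SHA-256 repos).
    """
    header = b"blob %d\0" % len(data)  # e.g. b"blob 12\x00" for 12 bytes
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Identical bytes give identical IDs, so the same file in two unrelated
    # repos ends up stored under the same name.
    print(git_blob_id(b"hello world\n"))
    # Should match: printf 'hello world\n' | git hash-object --stdin
```

One caveat, as far as I understand git's fetch protocol: client and server negotiate in terms of commits they have in common, so identical blobs alone don't necessarily shrink a fetch; shared commits, as in the lucky converted-repo case, are what really help.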
This feels like an obvious trick now that I've done it once, so I'm probably going to try to do it more. There are some variations of the trick one could probably perform, such as actually switching the 'origin' upstream over to the upstream you really want to be based on and pull from. My one question about that would be how one cleans up branches (and perhaps tags) that are only found in the repo you started out by cloning from.
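For the variation of switching 'origin' over, one possible sequence is sketched below, again as hypothetical Python around the git command line; it assumes git 2.17 or later for `--prune-tags`. `git remote set-url` repoints `origin`, and `git fetch --prune` then deletes remote-tracking branches under `refs/remotes/origin/` that the new upstream doesn't have. `--prune-tags` additionally deletes local tags the new upstream lacks, including any you created yourself, so it needs care. Local branches made from the old origin are left alone; `git branch -vv` marks those whose upstream has vanished as "gone", and they still have to be removed by hand with `git branch -D`.

```python
import os
import shlex
import subprocess


def switch_origin(repo, new_url, run=True):
    """Hypothetical helper: repoint 'origin' at new_url and prune refs that
    only the old origin had.

    Needs git 2.17+ for --prune-tags. Returns the commands as argv lists;
    with run=False nothing is executed.
    """
    repo = os.fspath(repo)
    cmds = [
        ["git", "-C", repo, "remote", "set-url", "origin", new_url],
        # --prune drops refs/remotes/origin/* branches the new origin lacks;
        # --prune-tags also deletes *local* tags the new origin lacks.
        ["git", "-C", repo, "fetch", "--prune", "--prune-tags", "origin"],
        # List local branches with their upstreams; ones whose upstream
        # disappeared show as "gone" and must be deleted manually.
        ["git", "-C", repo, "branch", "-vv"],
    ]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical path and URL; only prints the commands, runs nothing.
    for cmd in switch_origin("linux-hwmon", "https://example.org/hwmon.git",
                             run=False):
        print(shlex.join(cmd))
```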