Wandering Thoughts archives

2017-04-30

Some more feelings on nondeterministic garbage collection

A while back I wrote an entry about the problem with nondeterministic garbage collection, more or less as part of my views at the time on PyPy. In that entry I was fairly down on nondeterministic GC. I still feel more or less that way about PyPy's garbage collection. Yet at the same time I use and like Go (and I did back then), which very definitely has nondeterministic garbage collection, and I don't find it to be a problem or something that annoys me. When I was revisiting this recently, I found myself wondering what the difference is. Is it just that I like Go enough that I'm unconsciously forgiving it this?

I don't think it's that simple. Instead I think it comes down to what I could call the culture of the language but which is better described as 'how people write code in practice'. CPython has always had a deterministic garbage collector with prompt garbage collection, and as a result people have written plenty of code that assumes that behavior and will do various degrees of unfortunate things if it's run in an environment, like PyPy, that violates that assumption. In practice Python programmers have developed and routinely use plenty of idioms that more or less assume deterministic GC; this code may be 'incorrect' in some sense, but it's also common and normal.

(It is correct code for CPython in practice, in that it works and is efficient to write and so on.)

By contrast, Go had nondeterministic GC from the beginning and people have been coding with that in mind from the start. One partial consequence of this is that Go APIs are often carefully designed so that you can mostly avoid allocations if you want to go to the effort, with caller-supplied reusable buffers and so on. Writing such code is even pretty natural and obvious in Go, in a way that it isn't in Python. I'm pretty sure that Go's features, APIs, and coding style have all been shaped by it having nondeterministic GC, in ways that haven't happened for Python because CPython had deterministic GC.

I also suspect that nondeterministic GC simply works better in a language that's explicitly designed to create less memory and object churn. Go has any number of language and compiler features that are partly designed to reduce memory pressure, things like unboxed array members, unboxed variables in general, and escape analysis (to enable cheap stack allocation of values).

(Static typing helps here too, but that's something that has reasons well beyond reducing memory pressure.)

PS: I don't have any directly comparable programs, but in operation this Go program seems to have about the same memory usage as this Python program, based on RSS. They aren't seeing the same load and don't quite do the same thing, but they're as close as I can get unless I get very energetic and rewrite DWiki in Go.

NondeterministicGCII written at 23:09:44

2017-04-27

Understanding Git's model versus understanding its magic

In a comment on my entry on coming to a better understanding of what git rebase does, Ricky suggested I might find Understanding Git Conceptually to be of interest. This provides me with an opportunity to talk about what I think my problem with mastering Git is.

It's worth quoting Charles Duan here:

The conclusion I draw from this is that you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.

I actually feel that I have a relatively good grasp of the technical underpinnings of Git, what many people would call 'how Git works'. To wave my hands a bit, Git is a content addressable store that is used to create snapshots of trees, which are then threaded together in a sequence with commits, and so on and so forth. This lets me nod and go 'of course' about any number of apparently paradoxical things, such as git repositories with multiple initial commits. I don't particularly have this understanding because I worked for it; instead, I mostly have it because I happened to be standing around in the right place at the right time to see Git in its early days.
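As a minimal sketch of both points, assuming a scratch repository made with mktemp and a throwaway 'demo' identity (both are placeholders of mine, not anything from a real repository), this hashes the same content twice and then uses git checkout --orphan to give one repository a second initial commit:

```shell
set -e
# Throwaway identity so that 'git commit' works anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is 'master'

# Content addressing: the object ID is derived only from the content.
id1=$(printf 'hello\n' | git hash-object --stdin)
id2=$(printf 'hello\n' | git hash-object --stdin)

echo one > one.txt
git add one.txt
git commit -q -m "first initial commit"

# An orphan branch starts a fresh history; its first commit has no parent.
git checkout -q --orphan other
git commit -q -m "second initial commit"

# Count the commits with no parents that are reachable from any ref.
roots=$(git rev-list --max-parents=0 --all | wc -l | tr -d ' ')
echo "ids: $id1 $id2"
echo "root commits: $roots"
```

The two IDs should be identical, and the repository should report two root commits, which is unremarkable once you think of commits as just objects that may or may not point at parents.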

(There are bits of Git whose technicalities I understand less well, like the index. I have probably read a description of the guts of the index at least a few times, but I couldn't tell you off the top of my head how even a simple version of the index works at a mechanical level. It turns out to be covered in this StackOverflow answer; the short version is that the index is a composite of a directory file and a bunch of normal object blobs.)

But in practice Git layers a great deal of magic on top of this technical model of its inner workings. Branches are references to commits (ie, heads) and git advances the reference when you make commits under the right circumstances; simple. Except that some branches have 'upstreams' and are 'remote tracking branches' and so on. All of these pieces of magic are not intrinsic to the technical model (partly because the technical model is a strictly local one), but they are very important for working with Git in many real situations.
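The simple part of this can be seen directly. The following is only a sketch in a scratch repository (the mktemp directory, the 'demo' identity and the file name are all placeholders of mine); it records what refs/heads/master points to before and after a commit:

```shell
set -e
# Throwaway identity so that 'git commit' works anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo one > file.txt
git add file.txt
git commit -q -m "first"
before=$(git rev-parse refs/heads/master)

echo two > file.txt
git commit -q -a -m "second"
after=$(git rev-parse refs/heads/master)

# The new commit's parent should be the commit the branch used to name.
parent=$(git rev-parse refs/heads/master^)
echo "master moved from $before to $after"
```

Nothing here involves upstreams or remotes; the reference simply moves to the new commit because HEAD was attached to master when the commit was made.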

It is this magic that I haven't mastered and internalized. For example, I understand what 'git fetch' does to your repository, and I can see why you would want it to update certain branch references so you can find the newly imported commits. But I have to think about why 'git fetch' will update certain branches and not others, and I don't know off the top of my head the settings that control this or how you change them.
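(For reference, the main setting involved appears to be the remote's fetch refspec, stored in the repository configuration as remote.<name>.fetch. As a small sketch, with 'origin' and the example.com URL being placeholders that are never contacted, this shows the default refspec that 'git remote add' writes:)

```shell
set -e
cd "$(mktemp -d)"
git init -q .

# Placeholder remote; adding it only writes configuration, it does not connect.
git remote add origin https://example.com/some/repo.git

# The fetch refspec maps branches on the remote to remote-tracking branches here.
refspec=$(git config --get remote.origin.fetch)
echo "$refspec"
```

With this default, a fetch updates the refs/remotes/origin/* references and leaves local branches under refs/heads/ alone; the leading '+' allows updates that are not fast-forwards. Changing it is a matter of editing that configuration value, for instance with git config.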

It's possible that Git has general patterns in this sort of magic, the way it has general patterns at its technical level. If it does, I have not yet understood enough of the magic to have noticed the general patterns. My personal suspicion is that general patterns do not necessarily exist at this layer, because the commands and operations I think of as part of this layer are actually things that have accreted into Git over time and were written by different people.

(At one point Git had a split between 'porcelain' and 'plumbing', where porcelain was the convenient user interface and was at least partially developed by different people than the core 'plumbing'. And bits of porcelain were developed by different people who had their own mental models for how their particular operation should behave, with git rebase's lack of an option for the branch name of the result being an example.)

In a way my understanding of Git's internals has probably held me back with Git, because it's helped encourage me to have a lackadaisical attitude about learning Git in general. The result is that I make little surgical strikes on manpages and problems, and once I feel I've solved them well enough I go away again. In this I've been mirroring one of the two ways that I approach new programming languages. I've likely reached the point in Git where I should switch over to thoroughly slogging through some parts of it; one weakness that's become obvious in writing this entry is basically everything to do with remote repositories.

GitCoreVersusMagic written at 01:10:48

2017-04-26

Coming to a better understanding of what git rebase does

Although I've used it reasonably regularly, git rebase has so far been a little bit magical to me, as you may be able to tell from my extensive explanation to myself of using it to rebase changes on top of an upstream rebase. In my grand tradition, I'm going to write down what I hope is a better understanding of what it does and how its arguments interact with that.

What git rebase does is take a series of commits, replay them on top of some new commit, and then give the resulting top commit a name so that you can use it. When you use the three argument form with --onto, you are fully specifying all of these. Take this command:

git rebase --onto muennich/master old-muennich master

--onto names the new commit everything will be put onto (usually it's a branch, as it is here), the series of commits that will be replayed is old-muennich..master, and the new name is also master. You don't get a choice about the new name; git rebase always makes your new rebase into your branch, discarding the old value of the branch.

(As far as I can tell there's no technical reason why git rebase couldn't let you specify the branch name of the result; it's just not in the conceptual model the authors have of how it should work. If you need this, you need to manually create a new branch beforehand.)
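Here is a sketch of that 'create a new branch beforehand' workaround in a scratch repository. The branch names target and result, the 'demo' identity and the commit helper function are all hypothetical names of mine; the idea is to point a new branch at master and then rebase that branch instead, so master keeps its old value:

```shell
set -e
# Throwaway identity so that 'git commit' works anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: one commit that adds one file named after its argument.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit a
git branch target            # the commit (branch) we will rebase onto
commit b                     # an old base commit we want to leave behind
commit c                     # the one local commit we want to replay
git checkout -q target
commit t
git checkout -q master
master_before=$(git rev-parse master)

# Create the result branch first, then rebase it rather than master.
git branch result master
git rebase -q --onto target master~1 result

master_after=$(git rev-parse master)
result_parent=$(git rev-parse result^)
target_tip=$(git rev-parse target)
```

Afterward master still names the old commits, result sits directly on top of target with only the replayed commit c, and the rebase has left me on the result branch.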

The minimal version has no arguments:

git rebase

This only works on branches with an upstream. It replays your commits from the current branch on top of the current (ie new) upstream, and it determines the range of commits to rebase roughly by finding the closest common ancestor of your commits and the upstream:

A -> B -> C -> D               [origin/master]
      \-> local-1 -> local-2   [master]

In this bad plain text diagram, the upstream added C and D while you have local-1 and local-2. The common point is B, and so B..master describes the commits that will be put on top of origin/master and then your master branch will be switched to them (well, the new version of them).

A rebase is conceptually a push to cherry-pick's pull. In cherry picking, you start on the new clean branch and pull in changes from elsewhere. In rebasing, you start on your 'dirty' local branch and push its changes on top of some other (clean) branch. You then keep the name of your local branch but not its old origin point.

If you use the one or two argument form of git rebase, you're explicitly telling rebase what to consider the 'upstream' for both determining the common ancestor commit and for what to put your changes on top of. If I'm understanding this correctly, the following commands are both equivalent to a plain 'git rebase' on your master branch:

git rebase origin/master
git rebase origin/master master

Based on the diagrams in the git-rebase manpage, it looks like the one and two argument forms are most useful for cases where you have multiple local branches and want to shuffle around the relationship between them.
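The diagram from earlier and the one argument form can be tried out in a scratch repository. This is a sketch under some assumptions of mine: a local branch called upstream stands in for origin/master, the commit helper function and the 'demo' identity are hypothetical, and every commit just adds one file so that nothing conflicts:

```shell
set -e
# Throwaway identity so that 'git commit' works anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: one commit that adds one file named after its argument.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
commit B
git branch upstream          # stand-in for origin/master, currently at B
commit local-1
commit local-2
git checkout -q upstream
commit C
commit D
git checkout -q master

# The common point of the two branches, and the commits that will be replayed.
base=$(git merge-base upstream master)
base_subject=$(git log -1 --format=%s "$base")
range=$(git log --reverse --format=%s "$base"..master | paste -s -d ' ' -)

# One argument form: replay those commits on top of 'upstream'.
git rebase -q upstream
history=$(git log --reverse --format=%s master | paste -s -d ' ' -)

echo "common ancestor: $base_subject"
echo "replayed: $range"
echo "after rebase: $history"
```

The common ancestor comes out as B, the replayed range is local-1 and local-2, and after the rebase master's history runs A, B, C, D and then the two local commits.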

In general the git-rebase manpage has helpful examples combined with extensive ASCII diagrams. If I periodically read it carefully whenever I'm confused, it will probably all sink in eventually.

(Of course, the git manual page that I actually should read carefully several times until it all sinks in and sticks is the one on specifying revisions and ranges for Git. I sort of know what a number of the different forms mean, but in practice it's one part folklore to one part actual knowledge.)

GitRebaseUnderstanding written at 02:01:28

2017-04-23

How I rebased changes on top of other rebased changes in Git

A while ago I wrote an entry on some git repository changes that I didn't know how to do well. One of them was rebasing my own changes on top of a repository that itself had been rebased; in the comments, Aristotle Pagaltzis confirmed that his Stackoverflow answer about this was exactly what I wanted. Since I've now actually gone through this process for the first time, I want to write down the details for myself, with commentary to explain how and why everything works. Much of this commentary will seem obvious to people who use Git a lot, but it reflects some concerns and confusions that I had at the time.

First, the repositories involved. rc is the master upstream repository for Byron Rakitzis's Unix reimplementation of Tom Duff's rc shell. It is not rebased; infrequent changes flow forward as normal for a public Git repo. What I'm going to call muennich-rc is Bert Münnich's collection of interesting modifications on top of rc; it is periodically rebased, either in response to changes in rc or just as Bert Münnich does development on it. Finally I have my own repository with my own local changes on top of muennich-rc. When muennich-rc rebases, I want to rebase my own changes on top of that rebase.

I start in my own repository, before fetching anything from upstream:

  1. git branch old-me

    This creates a branch that captures the initial state of my tree. It's not used in the rebasing process; instead it's a safety measure so that I can reset back to it if necessary without having to consult something like the git reflog. Because I've run git branch without an additional argument, old-me is equivalent to master until I do something to change master.

  2. git branch old-muennich muennich/master

    muennich/master is the upstream for muennich-rc. Creating a branch captures the (old) top commit for muennich-rc that my changes are on top of.

    Because both old-me and old-muennich have been created as plain ordinary local branches, not remote-tracking branches like muennich/master, their position won't change regardless of fetching and other changes during the rebase. I'm really using them as bookmarks for specific commits instead of actual branches that I will add commits on top of.

    (I'm sure this is second nature to experienced Git people, but when I made old-muennich I had to pause and convince myself that what commit it referred to wasn't going to change later, the way that master changes when you do a 'git pull'. Yes, I know, 'git pull' does more than 'git fetch' does and the difference is important here.)

  3. git fetch

    This pulls in the upstream changes from muennich-rc and updates muennich/master so that it refers to the current top commit of muennich-rc. It's now possible to do things like 'git diff old-muennich muennich/master' to see any differences between the old muennich-rc and the newly updated version.

    (Because I did git fetch instead of git pull or anything else, only muennich/master changed. In particular, master has not changed and is still the same as old-me.)

  4. git rebase --onto muennich/master old-muennich master

    This does all the work (well, I had to resolve and merge some conflicts). What it means is 'take all of the commits that go from old-muennich to master and rebase them on top of muennich/master; afterward, set the end result to be master'.

    (If I omitted the old-muennich argument, I would be trying to rebase both my local changes and the old upstream changes from muennich-rc on top of the current muennich-rc. Depending on the exact changes involved in muennich-rc's rebasing, this could have various conflicts and bad effects (for instance, reintroducing changes that Bert Münnich had decided to discard). There is a common ancestor in the master rc repository, but there could be a lot of changes between there and here.)

    The local changes that I added to the old version of muennich-rc are exactly the commits from old-muennich to master (ie, they're what would be shown by 'git log old-muennich..master', per the git-rebase manpage), so I'm putting my local commits on top of muennich/master. Since the current muennich/master is the top of the just-fetched new version of muennich-rc, I'm putting my local commits on top of the latest upstream rebase. This is exactly what I want to do; I'm rebasing my commits on top of an upstream rebase.

  5. After the dust has settled, I can get rid of the two branches I was using as bookmarks:

    git branch -D old-me
    git branch -D old-muennich
    

    I have to use -D because as far as git is concerned these branches both have unmerged changes. They're unmerged because these branches have both been orphaned by the combination of the muennich-rc rebase and my rebase.
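The whole procedure can be rehearsed end to end in scratch repositories before trying it on something real. What follows is only a sketch: the upstream and mine directories, the file names and the 'demo' identity are placeholders of mine, and the upstream 'rebase' is simulated by throwing away its top commit and committing a different version of it.

```shell
set -e
# Throwaway identity so that 'git commit' works anywhere (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"

# Stand-in for muennich-rc: a base commit plus one modification on top.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo base > base.txt; git add base.txt; git commit -q -m "base"
echo v1 > feature.txt; git add feature.txt; git commit -q -m "feature v1"
cd ..

# My repository: a clone with the remote named 'muennich', plus a local change.
git clone -q -o muennich upstream mine
cd mine
echo local > local.txt; git add local.txt; git commit -q -m "my local change"
cd ..

# Upstream rewrites its history: drop 'feature v1' and redo it as 'feature v2'.
cd upstream
git reset -q --hard HEAD^
echo v2 > feature.txt; git add feature.txt; git commit -q -m "feature v2"
cd ..

cd mine
git branch old-me                          # step 1: safety bookmark
git branch old-muennich muennich/master    # step 2: bookmark the old upstream top
git fetch -q                               # step 3: only muennich/master moves
old_me=$(git rev-parse old-me)
master_after_fetch=$(git rev-parse master)

# Step 4: replay old-muennich..master on top of the new muennich/master.
git rebase -q --onto muennich/master old-muennich master
parent=$(git rev-parse master^)
upstream_tip=$(git rev-parse muennich/master)
feature=$(cat feature.txt)
subject=$(git log -1 --format=%s master)

# Step 5: drop the bookmarks; -D is needed because they look unmerged.
git branch -q -D old-me old-muennich
```

At the end, master is my one local commit sitting directly on the rewritten upstream, feature.txt has the new upstream contents, and master was untouched by the fetch itself.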

Because I don't care (much) about the old version of my changes that are on top of the old version of muennich-rc, doing a rebase instead of a cherry-pick is the correct option. Following my realization on cherry-picking versus rebasing, there are related scenarios where I might want to cherry-pick instead, for example if I wasn't certain that I liked some of the changes in the rebased muennich-rc and I might want to fall back to the old version. Of course in this situation I could get the same effect by keeping the two branches after the rebase instead of deleting them.

GitRebaseOnRebase written at 01:02:10



This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.