Wandering Thoughts archives

2017-05-31

How a lot of specifications are often read

In the minds of specification authors, I suspect that they have an 'ideal reader' of their specification. This ideal reader is a careful person; they read the specification all the way through, cross-referencing what they read with other sections and perhaps keeping notes. When there is ambiguity in one part, the ideal reader keeps it in mind as an unsettled issue and looks for things said in other parts that will resolve it, and when something of global importance is mentioned in one section, the reader remembers and applies it to the entire specification.

I'm sure that some specifications are read by some people in this way. If you're working on something of significant importance (especially commercial importance) and there's a core standard, probably you approach it with this degree of care and time, because there is a lot on the line. However, I don't think that this is common. In practice I believe that most people read most specifications rather differently; they read them as if they were references.

People very rarely read references front to back, much less taking notes and reconciling confusions. Instead, they perhaps skim your overview and then when they have a question they open up the reference (the specification), go to the specific section for their issue, and try to read as little as possible in order to get an answer. Perhaps they'll skim some amount of things around the section just in case. People do this for a straightforward reason; they don't want to spend the time to read the entire thing carefully, especially when they have a specific question.

(If it's not too long and is written decently well, people may read your entire specification once, casually and with some skimming, just to get a broad understanding of it. But they're unlikely to read it closely with lots of care, because that's too much work, and then when they wind up with further questions they're going to flip over to treating the specification as a reference and trying to read as little as possible.)

The corollary to this is that in a specification that you want to be implemented unambiguously, it's important that each part or section is either complete in itself or clearly incomplete in a way that actively forces people to go follow cross-references. If you write a section so that it looks complete but is actually modified in an important way by another section, you can probably expect a fair number of the specification's readers to not realize this; they will just assume that it's complete and then they won't remember, notice, or find your qualifications elsewhere.

(This includes sections that are quietly ambiguous as written but have that ambiguity resolved by another section. When this happens, readers are basically invited to assume that they know what you mean and to make up their own answers. This is a great way to wind up with implementations that don't do what you intended.)

tech/SpecsHowAreOftenRead written at 23:07:12; Add Comment

Why one git fetch default configuration bit is probably okay

I've recently been reading the git fetch manpage reasonably carefully as part of trying to understand what I'm doing with limited fetches. If you do this, you'll run across an interesting piece of information about the <refspec> argument, including in its form as the fetch = setting for remotes. The basic syntax is '<src>:<dst>', and the standard version that is created by any git clone gives you:

fetch = +refs/heads/*:refs/remotes/origin/*

You might wonder about that + at the start, and I certainly did. Well, it's special magic. To quote the documentation:

The remote ref that matches <src> is fetched, and if <dst> is not empty string, the local ref that matches it is fast-forwarded using <src>. If the optional plus + is used, the local ref is updated even if it does not result in a fast-forward update.

(Emphasis mine.)

When I read this my eyebrows went up, because it sounded dangerous. There's certainly lots of complicated processes around 'git pull' if it detects that it can't fast-forward what it's just fetched, so allowing non-fast-forward fetches (and by default) certainly sounded like maybe it was something I wanted to turn off. So I tried to think carefully about what's going on here, and as a result I now believe that this configuration is mostly harmless and probably what you want.

The big thing is that this is not about what happens with your local branch, eg master or rel-1.8. This is about your repo's copy of the remote branch, for example origin/master or origin/rel-1.8. And it is not even about the branch, because branches are really 'refs', symbolic references to specific commits. git fetch maintains refs (here under refs/remotes/origin) for every branch that you're copying from the remote, and one of the things that it does when you fetch updates is update these refs. This lets the rest of Git use them and do things like merge or fast-forward remote updates into your local remote-tracking branch.

So git fetch's documentation is talking about what it does to these remote-branch refs if the branch on the remote has been rebased or rewound so that it is no longer a more recent version of what you have from your last update of the remote. With the + included in the <refspec>, git fetch always updates your repo's ref for the remote branch to match whatever the remote has; basically it overwrites whatever ref you used to have with the new ref from the remote. After a fetch, your origin/master or origin/rel-1.8 will always be the same as the remote's, even if the remote rebased, rewound, or did other weird things. You can then go on to fix up your local branch in a variety of ways.

(To be technical your origin/master will be the same as origin's master, but you get the idea here.)

This makes the + a reasonable default, because it means that 'git fetch' will reliably mirror even a remote that is rebasing and otherwise having its history rewritten and its branches changed around. Without the +, 'git fetch' might transfer the new and revised commits and trees from your remote but it wouldn't give you any convenient reference for them for you to look at them, integrate them, or just reset your local remote-tracking branch to their new state.

(Without the '+', 'git fetch' won't update your repo's remote-branch refs. I don't know if it writes the new ref information anywhere, perhaps to .git/FETCH_HEAD, or if it just throws it away, possibly after printing out commit hashes.)

Sidebar: When I can imagine not using a '+'

The one thing that using a '+' does is that it sort of allows a remote to effectively delete past history out of your local repo, something that's not normally possible in a DVCS and potentially not desirable. It doesn't do this directly, but it starts an indirect process of it and it certainly makes the old history somewhat annoying to get at.

Git doesn't let a remote directly delete commits, trees, and objects. But unreferenced items in your repo are slowly garbage-collected after a while and when you update your remote-branch refs after a non-ff fetch, the old commits that the pre-fetch refs pointed to start becoming more and more unreachable. I believe they live on in the reflog for a while, but you have to know that they're missing and to look.

If you want to be absolutely sure that you notice any funny business going on in an upstream remote that is not supposed to modify its public history this way, not using '+' will probably help. I'm not sure if it's the easiest way to do this, though, because I don't know what 'git fetch' does when it detects a non-ff fetch like this.

(Hopefully git fetch complains loudly instead of failing silently.)

programming/GitFetchMagicPlus written at 00:44:13; Add Comment


Page tools: See As Normal.
Search:
Login: Password:
Atom Syndication: Recent Pages, Recent Comments.

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.