2017-05-31
How a lot of specifications are often read
In the minds of specification authors, I suspect that they have an 'ideal reader' of their specification. This ideal reader is a careful person; they read the specification all the way through, cross-referencing what they read with other sections and perhaps keeping notes. When there is ambiguity in one part, the ideal reader keeps it in mind as an unsettled issue and looks for things said in other parts that will resolve it, and when something of global importance is mentioned in one section, the reader remembers and applies it to the entire specification.
I'm sure that some specifications are read by some people in this way. If you're working on something of significant importance (especially commercial importance) and there's a core standard, you probably approach it with this degree of care and time, because there is a lot on the line. However, I don't think that this is common. In practice I believe that most people read most specifications rather differently; they read them as if they were references.
People very rarely read references front to back, much less take notes and reconcile confusions. Instead, they perhaps skim your overview and then when they have a question they open up the reference (the specification), go to the specific section for their issue, and try to read as little as possible in order to get an answer. Perhaps they'll skim some amount of things around the section just in case. People do this for a straightforward reason: they don't want to spend the time to read the entire thing carefully, especially when they have a specific question.
(If it's not too long and is written decently well, people may read your entire specification once, casually and with some skimming, just to get a broad understanding of it. But they're unlikely to read it closely with lots of care, because that's too much work, and then when they wind up with further questions they're going to flip over to treating the specification as a reference and trying to read as little as possible.)
The corollary to this is that in a specification that you want to be implemented unambiguously, it's important that each part or section is either complete in itself or clearly incomplete in a way that actively forces people to go follow cross-references. If you write a section so that it looks complete but is actually modified in an important way by another section, you can probably expect a fair number of the specification's readers to not realize this; they will just assume that it's complete and then they won't remember, notice, or find your qualifications elsewhere.
(This includes sections that are quietly ambiguous as written but have that ambiguity resolved by another section. When this happens, readers are basically invited to assume that they know what you mean and to make up their own answers. This is a great way to wind up with implementations that don't do what you intended.)
Why one git fetch default configuration bit is probably okay
I've recently been reading the git fetch manpage reasonably carefully as part of trying to understand what I'm doing with limited fetches. If you do this, you'll run across an interesting piece of information about the <refspec> argument, including in its form as the fetch = setting for remotes.
The basic syntax is '<src>:<dst>', and the standard version that is created by any git clone gives you:
fetch = +refs/heads/*:refs/remotes/origin/*
You might wonder about that + at the start, and I certainly did. Well, it's special magic. To quote the documentation:
The remote ref that matches <src> is fetched, and if <dst> is not empty string, the local ref that matches it is fast-forwarded using <src>. If the optional plus + is used, the local ref is updated even if it does not result in a fast-forward update.
(Emphasis mine.)
When I read this my eyebrows went up, because it sounded dangerous. There are certainly lots of complicated processes around 'git pull' if it detects that it can't fast-forward what it's just fetched, so allowing non-fast-forward fetches (and by default) certainly sounded like maybe it was something I wanted to turn off. So I tried to think carefully about what's going on here, and as a result I now believe that this configuration is mostly harmless and probably what you want.
The big thing is that this is not about what happens with your local branch, eg master or rel-1.8. This is about your repo's copy of the remote branch, for example origin/master or origin/rel-1.8.
And it is not even about the branch, because branches are really 'refs', names that point to specific commits. git fetch maintains refs (here under refs/remotes/origin) for every branch that you're copying from the remote, and one of the things that it does when you fetch updates is update these refs. This lets the rest of Git use them and do things like merge or fast-forward remote updates into the local branch that tracks the remote one.
So git fetch's documentation is talking about what it does to these remote-branch refs if the branch on the remote has been rebased or rewound so that it is no longer a more recent version of what you have from your last update of the remote. With the + included in the <refspec>, git fetch always updates your repo's ref for the remote branch to match whatever the remote has; basically it overwrites whatever ref you used to have with the new ref from the remote. After a fetch, your origin/master or origin/rel-1.8 will always be the same as the remote's, even if the remote rebased, rewound, or did other weird things. You can then go on to fix up your local branch in a variety of ways.
(To be technical your origin/master will be the same as origin's master, but you get the idea here.)
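To make the overwrite behavior concrete, here is a minimal sketch that can be run in a scratch directory. It is an illustration rather than a definitive test: the repository names upstream and clone and the 'demo' identity are made up, and it assumes a reasonably recent git with the -C option. It builds a tiny upstream, clones it, rewrites the upstream's history, and then fetches with the default refspec that git clone set up.

```shell
#!/bin/sh
# Sketch: a fetch with the default '+' refspec follows rewritten upstream history.
# 'upstream', 'clone', and the demo identity are invented for this illustration.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A scratch upstream repository with two commits on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"
git -C upstream commit -q --allow-empty -m "second"

# Clone it and record the refspec that git clone configured.
git clone -q upstream clone
default_refspec=$(git -C clone config --get remote.origin.fetch)
before=$(git -C clone rev-parse origin/master)

# Rewrite upstream history: drop "second" and commit something else instead.
git -C upstream reset -q --hard HEAD~1
git -C upstream commit -q --allow-empty -m "rewritten"
upstream_tip=$(git -C upstream rev-parse master)

# Fetch with the default refspec; origin/master is overwritten to match upstream.
git -C clone fetch -q origin
after=$(git -C clone rev-parse origin/master)

echo "refspec:            $default_refspec"
echo "origin/master was:  $before"
echo "origin/master now:  $after"
echo "upstream master is: $upstream_tip"
```

The last two hashes printed should be identical, and different from the first one, even though the new upstream tip is not a descendant of the old one.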
This makes the + a reasonable default, because it means that 'git fetch' will reliably mirror even a remote that is rebasing and otherwise having its history rewritten and its branches changed around. Without the +, 'git fetch' might transfer the new and revised commits and trees from your remote but it wouldn't give you any convenient reference to them that you could use to look at them, integrate them, or just reset your local branch to their new state.
(Without the '+', 'git fetch' won't update your repo's remote-branch refs when the update isn't a fast-forward. I don't know if it writes the new ref information anywhere, perhaps to .git/FETCH_HEAD, or if it just throws it away, possibly after printing out commit hashes.)
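One way to find out is to try it in a scratch repository. The following is a minimal sketch under stated assumptions, not an authoritative description of git's behavior: the repository names upstream and clone and the 'demo' identity are made up, and the refspec without the leading + is set by hand. The script records what git fetch prints, its exit status, and the resulting ref instead of assuming them, and it only prints .git/FETCH_HEAD if that file turns out to exist.

```shell
#!/bin/sh
# Sketch: what a fetch WITHOUT '+' does after upstream history is rewritten.
# 'upstream', 'clone', and the demo identity are invented for this illustration.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A scratch upstream repository with two commits on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"
git -C upstream commit -q --allow-empty -m "second"

# Clone it, then replace the default refspec with one lacking the leading '+'.
git clone -q upstream clone
git -C clone config remote.origin.fetch 'refs/heads/*:refs/remotes/origin/*'
before=$(git -C clone rev-parse origin/master)

# Rewrite upstream history so the next fetch cannot be a fast-forward.
git -C upstream reset -q --hard HEAD~1
git -C upstream commit -q --allow-empty -m "rewritten"
upstream_tip=$(git -C upstream rev-parse master)

# Run the fetch, capturing its output and exit status instead of aborting.
if git -C clone fetch origin >fetch.log 2>&1; then status=0; else status=$?; fi
after=$(git -C clone rev-parse origin/master)

echo "--- git fetch output ---"
cat fetch.log
echo "--- exit status: $status ---"
echo "origin/master before: $before"
echo "origin/master after:  $after"
echo "upstream master is:   $upstream_tip"

# Look at FETCH_HEAD only if git created it; no assumption about its contents.
if [ -f clone/.git/FETCH_HEAD ]; then
    echo "--- FETCH_HEAD ---"
    cat clone/.git/FETCH_HEAD
fi
```

When I reason through it, the ref should be left where it was and the fetch should report the update as rejected with a non-zero exit status, but running the script against your own version of git is the real answer.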
Sidebar: When I can imagine not using a '+'
The one thing that using a '+' does is that it sort of allows a remote to effectively delete past history out of your local repo, something that's not normally possible in a DVCS and potentially not desirable. It doesn't do this directly, but it starts the process indirectly and it certainly makes the old history somewhat annoying to get at.
Git doesn't let a remote directly delete commits, trees, and other objects. But unreferenced items in your repo are slowly garbage-collected after a while, and when you update your remote-branch refs after a non-ff fetch, the old commits that the pre-fetch refs pointed to start becoming more and more unreachable. I believe they live on in the reflog for a while, but you have to know that they're missing and to look.
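As a rough illustration of looking in the reflog, here is a minimal sketch for a scratch directory. The names upstream, clone, and rescued and the 'demo' identity are made up, and the script assumes that the clone keeps reflogs for its remote-branch refs; it sets core.logAllRefUpdates explicitly so that it doesn't depend on a default. It does one ordinary fetch first so that the reflog has an entry for the old tip, then rewrites the upstream, fetches again, and asks the reflog for the previous value of origin/master.

```shell
#!/bin/sh
# Sketch: finding the pre-rewrite tip of origin/master through its reflog.
# 'upstream', 'clone', 'rescued', and the demo identity are invented here.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A scratch upstream with one commit, and a clone of it.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"
git clone -q upstream clone

# Assumption made explicit: keep reflogs for ref updates in the clone.
git -C clone config core.logAllRefUpdates true

# Upstream moves forward normally; this fetch is a plain fast-forward.
git -C upstream commit -q --allow-empty -m "second"
git -C clone fetch -q origin
old_tip=$(git -C clone rev-parse origin/master)

# Upstream rewrites history; the default '+' refspec overwrites origin/master.
git -C upstream reset -q --hard HEAD~1
git -C upstream commit -q --allow-empty -m "rewritten"
git -C clone fetch -q origin
new_tip=$(git -C clone rev-parse origin/master)

# The reflog remembers the previous value of the ref.
git -C clone reflog show origin/master
from_reflog=$(git -C clone rev-parse 'origin/master@{1}')

# The old commit is still in the object store; a branch keeps it reachable.
git -C clone cat-file -e "$old_tip"
git -C clone branch rescued "$from_reflog"
rescued=$(git -C clone rev-parse rescued)

echo "old tip:      $old_tip"
echo "from reflog:  $from_reflog"
echo "new tip:      $new_tip"
```

Giving the old commit a branch name is what stops it from eventually being garbage-collected; the reflog entry on its own will expire at some point.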
If you want to be absolutely sure that you notice any funny business going on in an upstream remote that is not supposed to modify its public history this way, not using '+' will probably help. I'm not sure if it's the easiest way to do this, though, because I don't know what 'git fetch' does when it detects a non-ff fetch like this.
(Hopefully git fetch complains loudly instead of failing silently.)