2015-10-18
In Go, unsafe.Pointer is a built in type in the compiler
Here is something that I didn't fully grasp until I did some digging in the Go compiler as part of writing a recent entry:
Despite being in a package,
unsafe.Pointer is really a built in type in the Go compiler, just like map, chan, string, and so on.
While there is an unsafe package and it looks somewhat superficially
like reflect (which is a real package with real code), this is
an illusion. All parts of unsafe are implemented at compile time
inside the compiler and as part of this unsafe.Pointer is a built
in type, much like uintptr (or more to the point, something complex
like chan, map, or slices). One consequence of this is that
nothing involving an unsafe.Pointer (such as how it interacts
with escape analysis) can be understood or predicted by thinking
about it at the Go level through analogies to regular pointers or
the like. unsafe.Pointer is not just a little bit magical; it's
a lot magical.
(See eg TUNSAFEPTR in go.go.
The unsafe functions are interpreted at compile time in unsafe.go,
which turns all of them into literals of type uintptr.)
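One way to see this compile-time behaviour from the Go side is that the results of the unsafe functions can be used where only constants are allowed. What follows is a minimal sketch of my own, not something taken from the compiler source; the pair type and the names pairSize and buf are made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair is a hypothetical struct used only for this illustration.
type pair struct {
	a int64
	b int64
}

// This is a legal constant declaration only because the compiler
// evaluates unsafe.Sizeof itself; a call to an ordinary function
// could not appear in a const declaration.
const pairSize = unsafe.Sizeof(pair{})

// Array lengths must be constants too, so this also only compiles
// because the size is known at compile time.
var buf [unsafe.Sizeof(pair{})]byte

func main() {
	fmt.Println(pairSize, len(buf)) // prints "16 16"
}
```

If unsafe.Sizeof were a real function in a real package, both declarations would be rejected by the compiler.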
I imagine a large reason that unsafe.Pointer is not simply Pointer
is so that people are forced to explicitly import unsafe in order
to use it. This probably both avoids some casual use and makes it
easier to find (or check for) potential dangerous things; all you
have to do is scan things looking for imports of unsafe.
(In addition, 'go tool compile' accepts a -u argument, which
among its effects disables use of unsafe.)
(Perhaps this has been obvious to other people, but it wasn't to
me, especially given that reflect seems to involve at most a tiny
little bit of special compiler magic; in Go 1.5, it's a lot of real
Go code in a real package.)
2015-10-16
Inside a Go 'terrible hack' in the reflect package
Recently John Allspaw tweeted about some 'terrible hack' comments in the source code of various important projects. One of them was Go, which made me curious about the context. Although I can't exactly match Allspaw's version of the comment, I think it's this comment in src/reflect/value.go, which I'm going to quote in full:
func ValueOf(i interface{}) Value {
	[...]
	// TODO(rsc): Eliminate this terrible hack.
	// In the call to unpackEface, i.typ doesn't escape,
	// and i.word is an integer. So it looks like
	// i doesn't escape. But really it does,
	// because i.word is actually a pointer.
	escapes(i)
	return unpackEface(i)
}
In a nutshell, what's going on here is that the compiler is being too smart about escape analysis and the 'hack' code here is defeating that smartness in order to avoid memory errors.
Per Russ Cox's explanation on how interfaces are implemented, an interface{} value is
represented in memory as essentially two pointers, one to the
underlying type and one to the actual value. unpackEface() magically
turns this into a reflect.Value, which has exactly this
information (plus some internal stuff). Unfortunately it does so
in a way that causes the compiler's escape analysis to think that
nothing from the 'i' argument outlives ('escapes') unpackEface(),
which would normally mean that the compiler thinks 'i' doesn't
outlive ValueOf() either.
So let's imagine that you write:
type astruct struct { ... }

func toval() reflect.Value {
	var danger astruct
	return reflect.ValueOf(&danger)
}
Without the hack, escape analysis could tell the Go compiler that
&danger doesn't escape reflect.ValueOf(), which would make
danger safe to allocate on the stack, where it would get (implicitly)
destroyed when toval() returns. Unfortunately the Value returned
by toval() actually refers to this now-destroyed stack memory.
Whoops. By explicitly defeating escape analysis, ValueOf() forces
danger to be allocated in the heap where it will outlive toval()
and thus avoid this bug.
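A runnable version of this scenario might look like the following. This is only a sketch: the original leaves the contents of astruct elided, so the N field and the value 42 here are my own stand-ins for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// astruct stands in for the elided struct in the text; the N field
// is a hypothetical addition so that there is something to read back.
type astruct struct {
	N int
}

// toval returns a reflect.Value that wraps a pointer to a local
// variable, so danger has to outlive this function.
func toval() reflect.Value {
	danger := astruct{N: 42}
	return reflect.ValueOf(&danger)
}

func main() {
	v := toval()
	// Follow the pointer and read the field after toval() has returned.
	fmt.Println(v.Elem().Field(0).Int()) // prints "42"
}
```

Building this with 'go build -gcflags=-m' asks the compiler to print its escape analysis decisions, which should include a report that danger has been moved to the heap (the exact wording of the messages varies between Go versions).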
(You might wonder if Go garbage collection has similar problems and
the answer is apparently 'no', although the details are well beyond
both me and the scope of this entry. See this golang-nuts thread
on garbage collection and unsafe.Pointer.)
A Go compiler that was less smart about escape analysis wouldn't have this problem; as you can see, the compiler has to reason through several layers of code to go wrong. But escape analysis is an important optimization for a language like Go, so the compiler's authors have clearly worked hard at it.
(If the Go compiler is doing not just cross function but cross package escape analysis (which it certainly looks like), I have to say that I'm impressed by how thorough it's being.)
Sidebar: How escapes() works
Before I looked at it, I expected escapes() to involve some deep
magic to tell the compiler to go away. The reality
is more prosaic (and humbling for me in my flights of imagination):
var dummy struct {
	b bool
	x interface{}
}

func escapes(x interface{}) {
	if dummy.b {
		dummy.x = x
	}
}
In theory a sufficiently smart compiler could detect that dummy
is not exported from reflect and is not touched inside it, so
dummy.b is always false and escapes() always does nothing and
so x does not escape it. In practice I suspect that the Go
compiler will never get that perversely smart for various reasons.
2015-10-13
Why I've come to really like git
I mentioned on Twitter that I've completely flipped my view of git (and Mercurial) around, to the point where I actively like using git and want to be using git for things. This goes beyond where I used to be, where I simply felt that git was the right thing to learn and use going forward for relatively pragmatic reasons. Part of this is undoubtedly just increasing familiarity with git (it's hard not to feel good about something you use a lot), but I can also point to specific things that I really like about git.
The first is git rebase and everything around it, both when dealing
with other people's projects and when
working on my own stuff. Rebasing (and
things it enables) has clearly made my life
easier and nicer, especially when carrying my own modifications on
top of other people's software. Interactive rebasing also makes it
easy to manipulate my commits in general to do things like shuffle
the order or squash several commits together. Perhaps there are
other tools for this, but I already need to know rebasing, so I've
opted to keep my life simple.
The second turned out to be git's index. There are two primary
things I love about the index: it lets me see definitely and exactly
what I'll be committing before I do it via 'git diff --cached'
(which I use so much I have an alias for it), and it easily lets
me make and check selective commits. Of course in theory I shouldn't
need to make selective commits because I should be working selectively
to start with. In practice, no, I don't naturally wind up working
that way so it's great to be able to methodically untangle the
resulting messy work tree into a series of neat commits by staging
things through the index.
(That the index is explicit is very important here for me, because
it means I can stage things into the index, see the diff, and then
say 'oops, no, I left something out' or 'oops, no, I put too much
in, let's try that again'. An on the fly single pass 'select and
commit' model demands that I get it right the first time with little
margin for error. With the index I can abort right up to the point
where I complete 'git commit' and I haven't lost any of my prep
work so far.)
The third thing is 'git grep', or more specifically the fact that
the git people have somehow made it amazingly fast. 'git grep'
is clearly much faster at searching repos (especially big repos)
for things than normal 'grep -r', 'find | grep', and so on.
Since a bunch of what I do with other people's repos is fish through
them trying to find things, this is really great for me; I can search
the Linux kernel repo, the Illumos repo, and so on fast enough to make
it a casual thing to do. By contrast, finding things in the Mozilla
Mercurial repo is always a comparatively slow pain.
(Mercurial has 'hg grep', but it does something completely
different. What it does is useful but something that I want much
less often.)
Although I can't point to anything big in particular, in general I've
wound up feeling that git makes it easier (and possible) to manipulate
my repos in crazy ways if I really need to. I suppose 'git
filter-branch' is the poster child for this (although the feature
I wound up caring about has been mostly wrapped up as 'git subtree
split'), but I've also used things like changing the upstream
of branches. Basically it feels like if
git can possibly support something, it will somehow and I can make
things work.
(I may discover additional nice things about git in the future, but this is my current list of things that really affect me when I work with a git repo versus eg a Mercurial repo.)