2015-11-27
A thought on the apparent popularity of new static typing languages
It all started with Elben Shira's The End of Dynamic Languages, which sparked Have Static Languages Won? (via) from Maxime Chevalier-Boisvert. In that I read the following:
Like [Elben Shira], I've noticed that despite the fact that there have been an enormous number of new programming languages coming out recently, the overwhelming majority of them are statically typed. [...]
I'm an outside bystander on all of this, but it strikes me that one possible contributing reason for lots of people creating new statically typed languages is that there is a significant body of academic research that has not yet made it into a popular, mainline programming language. Here I'm thinking primarily of sophisticated type systems and type inference. As long as this situation exists, people tempted to create languages have a clear void that their new modern statically typed language might possibly fill (at least in theory).
(And then they can mingle in some degree of immutability and this and that from other academic research. There's a lot of academic work on statically typed languages that hasn't gotten into popular languages yet. There's also a bunch of people who are grumpy about this lack of popularity, which is another crucial ingredient for creating new languages; see, for example, all of the people who are unhappy at Go for having such a simple and 'primitive' type system in the face of much more powerful ones being out there.)
While I'm not fully in touch with the dynamic language world either, my impression is that there is no similar reserve of well-established academic research on them that has yet to be reflected in popular dynamic languages. Without that, if you're going to create a new dynamic language today it's not clear how you're going to make it significantly different from the existing popular dynamic languages. And if you can't make it very different from the existing good options, what's the point? Certainly you're not likely to get much traction with a slightly different version of Python, JavaScript, or the like.
(Arguably you're not very likely to get significant traction for an advanced statically typed language if so many other ones before you have not been hits, but that's somewhat different in that hope springs eternal. It's the same impulse that keeps people writing new Lisp-like languages that they want to be popular.)
PS: I could be totally wrong on this in that maybe there's a pile of good academic research on dynamic languages that's begging to be implemented and made popular. I'd actually like that; it'd mean we have the prospect of significantly better and nicer dynamic languages.
2015-11-21
What I think I want out of autocompletion in GNU Emacs (for Go coding)
I mentioned a while back that I had set up autocompletion in GNU Emacs for Go, using gocode and the auto-complete Emacs package. I also mentioned that I wasn't sure if I liked autocompletion and was going to stick with it. Well, the verdict is in for now; I found it too annoying and I wound up turning it off. However, I still kind of miss it. Thinking about what I miss and what made me hate it enough to turn it off has led me to what I think I want out of autocompletion.
Why I turned autocompletion off is that it kept stealing my keystrokes (in order to do the wrong autocompletion); cursor keys, the return key, and I think even sometimes the space bar. I type fast and I type ahead, so I absolutely, utterly hate having the sequence of what I'm typing be derailed because autocompletion decided to grab a cursor motion or a return or whatever. Unless I go out of my way, I want what I type at the keyboard to actually be what shows up in the file I'm editing. At the same time, the prompting and information that autocompletion gave me was genuinely useful; it was a great way to not have to remember the full names of things in Go packages and so on.
Given that I liked the information display, I don't want all of (auto)completion to be deferred until I use a special key sequence like C-M-i. If I spent a lot of time in GNU Emacs I might be able to train myself to hit that by reflex, but with my more casual use it'd just ensure that I mostly never used completion at all. But I don't want any actual completing of things to happen until I hit a key to start it (and once I hit the key, it's fine if autocompletion steals my cursor keys and return key and so on).
So in short what I want from autocompletion is immediate information on possible completions coupled with deferred actual completion until I take some active step to start the completion process. This is fairly similar to the completion model I'm happy with in Unix shells, where nothing starts getting filled in until you hit TAB.
(Deferring only actual completion doesn't appear to be possible in auto-complete. I can't entirely blame the package, because what I'm calling an information display is what it thinks of as a completion menu and completion prompt.)
Part of my irritation with autocompletion is specific to the Go
autocompletion mode provided by gocode. For instance, in Go I
don't want to have completion happen when I'm typing in language
keywords like package and func; I find it both distracting and
not useful. Completion is for things that I might have to look up;
if I'm typing a keyword, that is not the case.
(This completion of keywords is especially irritating because it's
blind to context. If I start typing 'pa' on a new line in a function
body, I'll still get offered 'package' as a possible completion
despite that clearly not being correct or even valid. Gocode is context
aware in general, in that it does things like offer local variables as
completions.)
PS: The fact that part of my issues is with gocode itself suggests that even switching to vim wouldn't entirely help.
2015-11-15
I should remember that I can cast things in Go
I think of Go as a strongly typed language.
My broad and somewhat reflexive view of strongly typed languages is
that they mostly don't allow you to cast things around because most
casts will require expensive data conversion and the language wants
you to do that explicitly, with your own code. Go even sticks to this
view; you can cast numbers around (because it's too useful) and you
can go between string and []byte (because it's a core operation),
and that's mostly it.
(Then there's interfaces, which expose some tricks. Interface casting involves a bunch of potentially expensive magic, but it's a core feature of Go so it's an exception to the 'no expensive operations via casts' rule of thumb.)
However, there is an important practical exception to this, which
comes about because of another thing that Go encourages: lots of
named types that you derive from fundamental types. Rather than using,
say, int, for all sorts of different things in your code, everyone
knows that you should instead create various specific types:
type Phase int
type Action int

type OneMap map[string]string
type TwoMap map[string]string
This way you can never accidentally use a Phase when the actual
function, field, or whatever is supposed to be an Action, or pass
a OneMap function a TwoMap, and so on. Go's strong typing will
force them to be separate (even if this is sometimes irritating,
for example if you're dealing with cgo).
These derived types can be cast to each other and to their underlying type. This is not just if they're numbers; any derived type can be cast around like this, provided that the underlying 'real' types are the same (per the Conversions section of the language spec).
(At a mechanical level it's easy to see why this is okay; since the two derived types have exactly the same memory layout, you don't have to do expensive conversion to generate a type-safe result.)
Now, ordinarily you still don't want to cast a OneMap to a TwoMap
(or to a map[string]string). But there is one special case that
matters to me, and that's if I want to do the same operation on
both sorts of maps. Since I actually can cast them around, I don't
need to write two duplicated blocks of (type-specific) code to do
the same operation. Instead I can write one, perhaps one that's
generic to the map[string]string type, and simply call it for
both cases through casts. This is not the only way to create common
code for a generic operation but it's probably the easiest one to
add on the fly without a bunch of code refactoring.
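To make this concrete, here's a minimal sketch of the idea (the mergeInto helper and the example function are made up for illustration; the type declarations just repeat the ones above):

type OneMap map[string]string
type TwoMap map[string]string

// mergeInto is written once, against the underlying map[string]string type.
func mergeInto(dst, src map[string]string) {
    for k, v := range src {
        dst[k] = v
    }
}

func example(one OneMap, two TwoMap) {
    // All of these conversions are legal because the underlying types are
    // identical, and none of them copy or convert the map data.
    mergeInto(map[string]string(one), map[string]string(two))
    alias := OneMap(two)
    _ = alias
}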
So this is why I need to remember that casting types, even complex types, is something that I can do in Go. It's been kind of a reflexive blind spot in my Go code in the past, but hopefully writing this will avoid it in the future.
2015-10-18
In Go, unsafe.Pointer is a built in type in the compiler
Here is something that I didn't fully grasp and understand until I did some digging in the Go compiler as part of writing a recent entry:
Despite being in a package, unsafe.Pointer is really a built in type in the Go compiler, just like map, chan, string, and so on.
While there is an unsafe package and it looks somewhat superficially
like reflect (which is a real package with real code), this is
an illusion. All parts of unsafe are implemented at compile time
inside the compiler and as part of this unsafe.Pointer is a built
in type, much like uintptr (or more to the point, something complex
like chan, map, or slices). One consequence of this is that
nothing involving an unsafe.Pointer (such as how it interacts
with escape analysis) can be understood or predicted by thinking
about it at the Go level through analogies to regular pointers or
the like. unsafe.Pointer is not just a little bit magical; it's
a lot magical.
(See eg TUNSAFEPTR in go.go.
The unsafe functions are interpreted at compile time in unsafe.go,
which turns all of them into literals of type uintptr.)
I imagine a large reason that unsafe.Pointer is not simply Pointer
is so that people are forced to explicitly import unsafe in order
to use it. This probably both avoids some casual use and makes it
easier to find (or check for) potential dangerous things; all you
have to do is scan things looking for imports of unsafe.
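As a tiny illustration of what a use of unsafe.Pointer looks like (this is my own toy example, not anything from the compiler or the standard library):

package main

import (
    "fmt"
    "math"
    "unsafe"
)

func main() {
    f := math.Pi
    // Reinterpret the bits of a float64 as a uint64 by converting the
    // pointer through unsafe.Pointer; this is essentially what
    // math.Float64bits does.
    u := *(*uint64)(unsafe.Pointer(&f))
    fmt.Printf("%#016x\n", u)
}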
(There's also that 'go tool compile' accepts a -u argument,
which disables use of unsafe as part of its effects.)
(Perhaps this has been obvious to other people, but it wasn't to
me, especially given that reflect seems to involve at most a tiny
little bit of special compiler magic; in Go 1.5, it's a lot of real
Go code in a real package.)
2015-10-16
Inside a Go 'terrible hack' in the reflect package
Recently John Allspaw tweeted about some 'terrible hack' comments in the source code of various important projects. One of them was Go, which made me curious about the context. Although I can't exactly match Allspaw's version of the comment, I think it's this comment in src/reflect/value.go, which I'm going to quote in full:
func ValueOf(i interface{}) Value {
[...]
// TODO(rsc): Eliminate this terrible hack.
// In the call to unpackEface, i.typ doesn't escape,
// and i.word is an integer. So it looks like
// i doesn't escape. But really it does,
// because i.word is actually a pointer.
escapes(i)
return unpackEface(i)
}
In a nutshell, what's going on here is that the compiler is being too smart about escape analysis and the 'hack' code here is defeating that smartness in order to avoid memory errors.
Per Russ Cox's explanation on how interfaces are implemented, an interface{} value is
represented in memory as essentially two pointers, one to the
underlying type and one to the actual value. unpackEface() magically
turns this into a reflect.Value, which has exactly this
information (plus some internal stuff). Unfortunately it does so
in a way that causes the compiler's escape analysis to think that
nothing from the 'i' argument outlives ('escapes') unpackEface(),
which would normally mean that the compiler thinks 'i' doesn't
outlive ValueOf() either.
So let's imagine that you write:
type astruct struct { ... }

func toval() reflect.Value {
    var danger astruct
    return reflect.ValueOf(&danger)
}
Without the hack, escape analysis could tell the Go compiler that
&danger doesn't escape reflect.ValueOf(), which would make
danger safe to allocate on the stack, where it would get (implicitly)
destroyed when toval() returns. Unfortunately the Value returned
by toval() actually refers to this now-destroyed stack memory.
Whoops. By explicitly defeating escape analysis, ValueOf() forces
danger to be allocated in the heap where it will outlive toval()
and thus avoid this bug.
(You might wonder if Go garbage collection has similar problems and
the answer is apparently 'no', although the details are well beyond
both me and the scope of this entry. See this golang-nuts thread
on garbage collection and unsafe.Pointer.)
A Go compiler that was less smart about escape analysis wouldn't have this problem; as you can see, the compiler has to reason through several layers of code to go wrong. But escape analysis is an important optimization for a language like Go so the compiler has clearly worked hard at it.
(If the Go compiler is doing not just cross function but cross package escape analysis (which it certainly looks like), I have to say that I'm impressed by how thorough it's being.)
Sidebar: How escapes() works
Before I looked at it, I expected escapes() to involve some deep
magic to tell the compiler to go away. The reality
is more prosaic (and humbling for me in my flights of imagination):
var dummy struct {
b bool
x interface{}
}
func escapes(x interface{}) {
if dummy.b {
dummy.x = x
}
}
In theory a sufficiently smart compiler could detect that dummy
is not exported from reflect and is not touched inside it, so
dummy.b is always false and escapes() always does nothing and
so x does not escape it. In practice I suspect that the Go
compiler will never get that perversely smart for various reasons.
2015-10-13
Why I've come to really like git
I mentioned on Twitter that I've completely flipped my view of git (and Mercurial) around, to the point where I actively like using git and want to be using git for things. This goes beyond where I used to be, where I simply felt that git was the right thing to learn and use going forward for relatively pragmatic reasons. Part of this is undoubtedly just increasing familiarity with git (it's hard not to feel good about something you use a lot), but I can also point to specific things that I really like about git.
The first is git rebase and everything around it, both when dealing
with other people's projects and when
working on my own stuff. Rebasing (and
things it enables) has clearly made my life
easier and nicer, especially when carrying my own modifications on
top of other people's software. Interactive rebasing also makes it
easy to manipulate my commits in general to do things like shuffle
the order or squash several commits together. Perhaps there are
other tools for this, but I already need to know rebasing, so I've
opted to keep my life simple.
The second turned out to be git's index. There are two primary
things I love about the index: it lets me see definitively and exactly
what I'll be committing before I do it via 'git diff --cached'
(which I use so much I have an alias for it), and it easily lets
me make and check selective commits. Of course in theory I shouldn't
need to make selective commits because I should be working selectively
to start with. In practice, no, I don't naturally wind up working
that way so it's great to be able to methodically untangle the
resulting messy work tree into a series of neat commits by staging
things through the index.
(That the index is explicit is very important here for me, because
it means I can stage things into the index, see the diff, and then
say 'oops, no, I left something out' or 'oops, no, I put too much
in, let's try that again'. An on the fly single pass 'select and
commit' model demands that I get it right the first time with little
margin for error. With the index I can abort right up to the point
where I complete 'git commit' and I haven't lost any of my prep
work so far.)
The third thing is 'git grep', or more specifically the fact that
the git people have somehow made it amazingly fast. 'git grep'
is clearly much faster at searching repos (especially big repos)
for things than normal 'grep -r', 'find | grep', and so on.
Since a bunch of what I do with other people's repos is fish through
them trying to find things, this is really great for me; I can search
the Linux kernel repo, the Illumos repo, and so on fast enough to make
it a casual thing to do. By contrast, finding things in the Mozilla
Mercurial repo is always a comparatively slow pain.
(Mercurial has 'hg grep', but it does something completely
different. What it does is useful but something that I want much
less often.)
Although I can't point to anything big in specific, in general I've
wound up feeling that git makes it easier (and possible) to manipulate
my repos in crazy ways if I really need to. I suppose 'git
filter-branch' is the poster child for this (although the feature
I wound up caring about has been mostly wrapped up as 'git subtree
split'), but I've also used things like changing the upstream
of branches. Basically it feels like if
git can possibly support something, it will somehow and I can make
things work.
(I may discover additional nice things about git in the future, but this is my current list of things that really affect me when I work with a git repo versus eg a Mercurial repo.)
2015-09-27
Wikitext not having errors creates a backtracking problem
In the past I've written about some pain points of parsing wikitext and called out how there aren't conventional parsing errors in running wikitext, just things that turn out to be plain text instead of complete wikitext markup. Some of the consequences of this may not be obvious, and in fact they weren't obvious to me until I tried to make an overly ambitious change to DWiki's markup-to-HTML conversion code.
The obvious problem that 'no errors' creates is that you will have to either accept closing off incomplete markup or do lookahead to verify that you seem to have a complete entity, or both. If your markup denotes links as '[[ ... ]]', you probably want to look ahead for a ']]' before you start processing a '[[' as a link. Unfortunately doing lookahead correctly is quite hard if your wikitext permits various sorts of nested constructs. Consider DWikiText, which also has '(( ... ))' to quote uninterpreted text in a monospaced font, and then parsing the following:
This example [[looks like ((it has an ending ]])) but it doesn't.
Purely textual lookahead for a ']]' gets fooled here. So let's assume we're going to get fooled sooner or later and handle this better. Rather than trying to rely on fallible lookahead, if we reach the end of a paragraph with an unclosed entity we'll go back to the start of the entity and turn it into plain text.
Unfortunately this has problems too, because something retroactively becoming plain text may change the meaning of other text after that point. Consider this contrived example:
Lorem ipsum ((this *should be emphasis* because the '((' isn't closed and thus is plain text.
If you start out parsing the (( as real, the *'s are plain text. But once the (( is just plain text, they should be creating italics for emphasis. To really retroactively change the (( to plain text, you may need to backtrack all text processing since then and redo it. And backtracking is something conventional parsing technology is generally not designed for; in fact, conventional parsing technology usually avoids it like the plague (along with aggressive lookahead).
(I think the lookahead situation gets somewhat better if you look ahead in the token stream instead of in plain text, but it's still not great. You're basically parsing ahead of your actual parse, and you'd better keep both in sync. Backtracking your actual parsing is probably better.)
All of this has caused me to feel that parsing running wikitext in a single pass is not the best way to do it. Instead I have a multi-pass approach in mind (and have for some time), although I'm not entirely convinced it's right either. I probably won't know unless (and until) I actually implement it, which is probably unlikely.
(An alternate approach would be to simply have backtracking in a conventional recursive descent parser; every time you hit a 'parse error', the appropriate construct being parsed would turn its start token into plain text and continue the parsing from there. Unfortunately this feels like it could be vulnerable to pathological behavior, which is a potential issue for a parser that may be handed user-controlled input in the form of eg comments.)
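As a toy sketch of that alternate 'turn the start token back into plain text and carry on' approach (this is my own illustration, not DWiki's actual code; it only knows about '(( ... ))' and '*emphasis*', so it dodges the nested lookahead problems above):

package main

import (
    "fmt"
    "strings"
)

// parseQuote consumes text after an opening '((' up to a closing '))'.
// ok is false if the paragraph ends first; the caller then backtracks by
// emitting '((' as literal text and re-parsing from just after it.
func parseQuote(text string, i int) (out string, next int, ok bool) {
    end := strings.Index(text[i:], "))")
    if end < 0 {
        return "", i, false
    }
    return "<code>" + text[i:i+end] + "</code>", i + end + 2, true
}

func render(text string) string {
    var b strings.Builder
    for i := 0; i < len(text); {
        switch {
        case strings.HasPrefix(text[i:], "(("):
            if out, next, ok := parseQuote(text, i+2); ok {
                b.WriteString(out)
                i = next
                continue
            }
            // Backtrack: the '((' never closes, so it's plain text and
            // everything after it gets interpreted as markup again.
            b.WriteString("((")
            i += 2
        case text[i] == '*':
            if end := strings.IndexByte(text[i+1:], '*'); end >= 0 {
                b.WriteString("<em>" + text[i+1:i+1+end] + "</em>")
                i += end + 2
                continue
            }
            b.WriteByte('*')
            i++
        default:
            b.WriteByte(text[i])
            i++
        }
    }
    return b.String()
}

func main() {
    fmt.Println(render("Lorem ipsum ((this *should be emphasis* because the '((' isn't closed."))
}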
PS: How I stubbed my toe on this issue was basically trying to do this sort of 'convert back to plain text' for general unclosed font changes in DWikiText. When I did this outside of a limited context, it blew up in my face.
2015-09-22
The Go 'rolling errors' pattern, in function call form
One of the small annoyances of Go's explicit error returns is that the basic approach of checking error returns at every step is annoying when all the error handling is actually the same. You wind up with the classic annoying pattern of, say:
s.f1, err = strconv.ParseUint(fields[1], 10, 64)
if err != nil {
return nil, err
}
s.f2, err = strconv.ParseUint(fields[2], 10, 64)
if err != nil {
return nil, err
}
[... repeat ...]
Of course, any good lazy programmer who is put into this starting situation is going to come up with a way to aggregate that error handling together. Go programmers are no exception, which has led to what I'll call a generic 'rolling errors' set of patterns. The basic pattern, as laid out in Rob Pike's Go blog entry Errors are values, is that as you do a sequence of operations you keep an internal marker of whether errors have occurred; at the end of processing, you check it and handle any error then.
Rob Pike's examples all use auxiliary storage for this internal marker (in one example, in a closure). I'm a lazy person so I tend to externalize this auxiliary storage as an extra function argument, which makes the whole thing look like this:
func getInt(field string, e error) (uint64, error) {
i, err := strconv.ParseUint(field, 10, 64)
if err != nil {
return i, err
}
return i, e
}
func .... {
[...]
var err error
s.f1, err = getInt(fields[1], err)
s.f2, err = getInt(fields[2], err)
s.f3, err = getInt(fields[3], err)
if err != nil {
return nil, err
}
[...]
}
This example code does bring up something you may want to think about in 'rolling errors' handling, which is what operations you want to do once you hit an error and which error you want to return. Sometimes the answer is clearly 'stop doing operations and return the first error'; other times, as with this code, you may decide that any of the errors is okay to return and it's simpler if the code keeps on doing operations (it may even be better).
(In retrospect I could have made this code just as simple while still stopping on the first error, but it didn't occur to me when I put this into a real program. In this case these error conditions are never expected to happen, since I'm parsing what should be numeric fields that are in a system generated file.)
As an obvious corollary, this 'rolling errors' pattern doesn't
require using error itself. You can use it with any running or
accumulated status indicator, including a simple boolean.
(Sometimes you don't need the entire infrastructure of error to
signal problems. If this seems crazy, consider the case of subtracting
two accumulating counters from each other to get a delta over a
time interval where a counter might roll over and make this delta
invalid. You generally don't need details or an error message here,
you just want to know if the counter rolled over or not and thus
whether or not you want to disregard this delta.)
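As a minimal sketch of that counter case with a plain bool as the rolling status indicator (the Stats type and its fields are invented for illustration):

type Stats struct {
    packets, bytes, errors uint64
}

// delta returns cur - prev, marking the whole batch invalid if the counter
// went backwards (ie it rolled over or was reset).
func delta(cur, prev uint64, ok bool) (uint64, bool) {
    if cur < prev {
        return 0, false
    }
    return cur - prev, ok
}

func deltas(cur, prev Stats) (Stats, bool) {
    var d Stats
    ok := true
    d.packets, ok = delta(cur.packets, prev.packets, ok)
    d.bytes, ok = delta(cur.bytes, prev.bytes, ok)
    d.errors, ok = delta(cur.errors, prev.errors, ok)
    return d, ok
}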
2015-09-14
A caution about cgo's error returns for errno
Go's cgo system for calling C functions offers a very convenient feature. As the documentation puts it:
Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error [...]
Reading this, you may be tempted to write more or less standard Go error-handling code like the following:
kcid, err := C.kstat_chain_update(t.kc)
if err != nil {
return err
}
This code is a potential mistake. Unless the documentation for the
C function you're calling says so explicitly, there is no guarantee
that errno is zero on success. If the function returns success
but errno is non-zero, cgo will dutifully generate a non-nil
error return from it and then your Go code will bail out with an
error that isn't.
This is not cgo's fault. Cgo has no magic knowledge of what C
function return values are and aren't errors, so all it can do is
exactly what it said it was going to do; if errno is non-zero,
you get an error version of it. This is just a C API issue (that
ultimately comes about because errno is both an implicit return
and global state). You'd never write code like this in Go, where
'only return non-nil error on actual errors' is well established,
but we're stuck with the C API that we actually have instead of the
Go-like one we'd like. So we have to deal with it, which means
checking return values explicitly.
(In this case the real 'there has been an error' marker is a kcid
return value of -1. I actually hit an irregular test failure when
my code was just checking err, which is how I re-stubbed my toe
on this particular C API issue.)
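For contrast, a sketch of the check done the right way around (this only loosely mirrors my real code; the point is that the return value decides whether there's an error, not errno):

kcid, err := C.kstat_chain_update(t.kc)
if kcid == -1 {
    // Only now is errno (and thus err) meaningful.
    return err
}
// On success kcid is 0 (no change) or the new chain ID; err is ignored
// because errno may be non-zero even though the call succeeded.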
PS: the ultimate cause of this is that C code often doesn't explicitly
set errno to zero on success but instead leaves it alone, which
means errno can wind up set from whatever internal system call or
library routine last failed and set it. There are many possibilities
for how this can happen; a classical one is seeing ENOTTY from
something checking to see if the file descriptor it is writing to
is a TTY and so should be in line-buffered mode.
(In my case I saw EAGAIN, which I believe was the OmniOS kernel telling kstat_chain_update()
that the buffer it had been given wasn't large enough, please try
again with a bigger one.)
2015-09-09
Some notes on my experience using Go's cgo system
Cgo is the Go compiler suite's bridge to C. I recently used it to write a Go package that gives you access to Solaris kstat kernel statistics, so I want to write down some notes on the whole thing before they fall out of my memory. On the whole, using cgo was a pleasant enough experience. I don't have very much experience in FFIs so I can't say how cgo compares to others, but cgo makes C and Go seem like a reasonably natural fit. It often really does seem like you're just using another Go package.
(With that said, there's a lot more use of unsafe.Pointer() than
you'll find almost anywhere else.)
At the mechanical level, the most annoying thing to deal with was
C unions. Go has no equivalent and cgo basically leaves you on your
own to read or set union fields. I wound up just writing some trivial
C functions to extract union fields for me and then had my Go code
call them, rather than wrestle with casts and unsafe.Pointer()
and so on in Go code; the C functions were both short and less error
prone for me to write.
A C function was also my solution to needing to do pointer arithmetic.
In C, a common approach is to define a field as 'struct whatever
*ptr;' and then say it actually points to an array of those
structs, with the length of the array given by some other field.
You access the elements of the array by doing things like incrementing
ptr or indexing off it. Well, in Go that doesn't work; if you
want to increment ptr to the next struct, you're going to have
to throw in explicit C.sizeof invocations and so on. I decided
it was simpler to do it in C instead:
kstat_named_t *get_nth_named(kstat_t *ks, uint_t n) {
kstat_named_t *knp;
if (!ks || !ks->ks_data || n >= ks->ks_ndata)
return NULL;
knp = (kstat_named_t *)ks->ks_data;
return knp + n;
}
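On the Go side, using this helper then looks something like the following sketch (it assumes the C function above is in the cgo preamble; ks is a *C.kstat_t for a named-type kstat and error handling is elided):

for i := C.uint_t(0); i < ks.ks_ndata; i++ {
    knp := C.get_nth_named(ks, i)
    if knp == nil {
        break
    }
    // knp is a *C.kstat_named_t; its name field is a C char array.
    name := C.GoString(&knp.name[0])
    _ = name
}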
Typecasts are another one of the irritations of cgo. Cgo makes every
C type into a Go type, and boy does a lot of C turn out to have a
lot of different integer types. In C they mostly convert into each
other without explicit casts; in Go they are all fully separate
types and you must explicitly cast them around in order to interact
with each other and with native Go integer types. This can get
especially annoying in things like for loop indexing, because if
you write 'for i := 0; i < CFIELD; i++' the compiler will object
that i is a different type than CFIELD. This resulted in a for
loop that looks like this:
for i := C.uint_t(0); i < k.ksp.ks_ndata; i++ {
....
}
(I wrote more about the mechanics in getting C-compatible structs
in Go, copying memory into Go structs, and cgo's string functions explained.)
At the design level, my biggest problem was handling C memory lifetime issues correctly. Part of this was figuring out where the C library had to be using dynamic allocation (and when it got freed), and part of it was working out what it was safe for Go structures to hold references to and when those references might become invalid because of some call I made to the C library API. Working this out is vital because of the impact of coupling Go and C memory lifetimes together, plus these memory lifetime issues are likely to have an effect on your package API. What operations can callers do or not do after others? What precautions do you need to take inside your package to try to avoid dereferencing now-free C memory if callers get the lifetime rules wrong? What things can you not expose because there's no way to guard against 'use after free' errors? And so on.
(runtime.SetFinalizer()
can help with this by letting you clean up C memory when Go memory
is going away, but it's not a complete cure.)
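As a minimal and entirely hypothetical sketch of what this looks like (the Token type and Open function are illustrative only, not the real package API; assume the usual cgo preamble with <kstat.h> and an import of runtime):

type Token struct {
    kc *C.kstat_ctl_t
}

func Open() (*Token, error) {
    kc, err := C.kstat_open()
    if kc == nil {
        return nil, err
    }
    t := &Token{kc: kc}
    // When the Go-side Token becomes garbage, close the kstat chain and
    // free the C-side memory. This is not a complete cure: anything still
    // holding raw C pointers obtained through t is now dangling.
    runtime.SetFinalizer(t, func(t *Token) {
        C.kstat_close(t.kc)
    })
    return t, nil
}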
Not all uses of cgo will run into memory lifetime problems. Many are probably self-contained, where all of your interaction with C code is inside one function and when it returns you're done and can free up everything.