Wandering Thoughts archives

2015-11-27

A thought on the apparent popularity of new static typing languages

It all started with Elben Shira's The End of Dynamic Languages, which sparked Have Static Languages Won? (via) from Maxime Chevalier-Boisvert. In that I read the following:

Like [Elben Shira], I've noticed that despite the fact that there have been an enormous number of new programming languages coming out recently, the overwhelming majority of them are statically typed. [...]

I'm an outside bystander on all of this, but it strikes me that one possible contributing reason for lots of people creating new statically typed languages is that there is a significant body of academic research that has not yet made it into a popular, mainline programming language. Here I'm thinking primarily of sophisticated type systems and type inference. As long as this situation exists, people tempted to create languages have a clear void that their new modern statically typed language might possibly fill (at least in theory).

(And then they can mix in some degree of immutability and this and that from other academic research. There's a lot of academic work on statically typed languages that hasn't gotten into popular languages yet. There's also a bunch of people who are grumpy about this lack of popularity, which is another crucial ingredient for creating new languages; see, for example, all of the people who are unhappy with Go for having such a simple and 'primitive' type system in the face of much more powerful ones being out there.)

While I'm not fully in touch with the dynamic language world either, my impression is that there is no similar reserve of well-established academic research on them that has yet to be reflected in popular dynamic languages. Without that, if you're going to create a new dynamic language today it's not clear how you're going to make it significantly different from the existing popular dynamic languages. And if you can't make it very different from the existing good options, what's the point? Certainly you're not likely to get much traction with a slightly different version of Python, JavaScript, or the like.

(Arguably you're not very likely to get significant traction for an advanced statically typed language if so many other ones before you have not been hits, but that's somewhat different in that hope springs eternal. It's the same impulse that keeps people writing new Lisp-like languages that they hope will become popular.)

PS: I could be totally wrong on this in that maybe there's a pile of good academic research on dynamic languages that's begging to be implemented and made popular. I'd actually like that; it'd mean we have the prospect of significantly better and nicer dynamic languages.

NewStaticLanguagePopularity written at 01:55:37

2015-11-21

What I think I want out of autocompletion in GNU Emacs (for Go coding)

I mentioned a while back that I had set up autocompletion in GNU Emacs for Go, using gocode and the auto-complete Emacs package. I also mentioned that I wasn't sure if I liked autocompletion and was going to stick with it. Well, the verdict is in for now; I found it too annoying and I wound up turning it off. However, I still kind of miss it. Thinking about what I miss and what made me hate it enough to turn it off has led me to what I think I want out of autocompletion.

Why I turned autocompletion off is that it kept stealing my keystrokes (in order to do the wrong autocompletion): cursor keys, the return key, and I think sometimes even the space bar. I type fast and I type ahead, so I absolutely, utterly hate having the sequence of what I'm typing be derailed because autocompletion decided to grab a cursor motion or a return or whatever. Unless I go out of my way, I want what I type at the keyboard to actually be what shows up in the file I'm editing. At the same time, the prompting and information that autocompletion gave me was genuinely useful; it was a great way to not have to remember the full names of things in Go packages and so on.

Given that I liked the information display, I don't want all of (auto)completion to be deferred until I use a special key sequence like C-M-i. If I spent a lot of time in GNU Emacs I might be able to train myself to hit that by reflex, but with my more casual use it'd just ensure that I mostly never used completion at all. But I don't want any actual completing of things to happen until I hit a key to start it (and once I hit the key, it's fine if autocompletion steals my cursor keys and return key and so on).

So in short what I want from autocompletion is immediate information on possible completions coupled with deferred actual completion until I take some active step to start the completion process. This is fairly similar to the completion model I'm happy with in Unix shells, where nothing starts getting filled in until you hit TAB.

(Deferring only actual completion doesn't appear to be possible in auto-complete. I can't entirely blame the package, because what I'm calling an information display is what it thinks of as a completion menu and completion prompt.)

Part of my irritation with autocompletion is specific to the Go autocompletion mode provided by gocode. For instance, in Go I don't want to have completion happen when I'm typing in language keywords like package and func; I find it both distracting and not useful. Completion is for things that I might have to look up; if I'm typing a keyword, that is not the case.

(This completion of keywords is especially irritating because it's blind to context. If I start typing 'pa' on a new line in a function body, I'll still get offered 'package' as a possible completion despite that clearly not being correct or even valid. Gocode is context aware in general, in that it does things like offer local variables as completions.)

PS: That some of my issues are with gocode itself suggests that even switching to vim wouldn't entirely help.

EmacsAutocompletionWant written at 03:25:34

2015-11-15

I should remember that I can cast things in Go

I think of Go as a strongly typed language. My broad and somewhat reflexive view of strongly typed languages is that they mostly don't allow you to cast things around because most casts will require expensive data conversion and the language wants you to do that explicitly, with your own code. Go even sticks to this view; you can cast numbers around (because it's too useful) and you can go between string and []byte (because it's a core operation), and that's mostly it.
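
To illustrate those two cases in code (just a fragment, not a complete program):

f := float64(65)       // numeric types convert to each other with explicit casts
b := []byte("hello")   // string to []byte
s := string(b)         // and back again
fmt.Println(f, s)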

(Then there's interfaces, which expose some tricks. Interface casting involves a bunch of potentially expensive magic, but it's a core feature of Go so it's an exception to the 'no expensive operations via casts' rule of thumb.)

However, there is an important practical exception to this, which comes about because of another thing that Go encourages: lots of named types that you derive from fundamental types. Rather than using, say, int, for all sorts of different things in your code, everyone knows that you should instead create various specific types:

type Phase int
type Action int

type OneMap map[string]string
type TwoMap map[string]string

This way you can never accidentally use a Phase when the actual function, field, or whatever is supposed to be an Action, or pass a OneMap function a TwoMap, and so on. Go's strong typing will force them to be separate (even if this is sometimes irritating, for example if you're dealing with cgo).

These derived types can be cast to each other and to their underlying type. This is not just if they're numbers; any derived type can be cast around like this, provided that the underlying 'real' types are the same (per the Conversions section of the language spec).

(At a mechanical level it's easy to see why this is okay; since the two derived types have exactly the same memory layout, you don't have to do expensive conversion to generate a type-safe result.)

Now, ordinarily you still don't want to cast a OneMap to a TwoMap (or to a map[string]string). But there is one special case that matters to me, and that's if I want to do the same operation on both sorts of maps. Since I actually can cast them around, I don't need to write two duplicated blocks of (type-specific) code to do the same operation. Instead I can write one, perhaps one that's generic to the map[string]string type, and simply call it for both cases through casts. This is not the only way to create common code for a generic operation but it's probably the easiest one to add on the fly without a bunch of code refactoring.
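
To make this concrete, here's a small sketch; mergeInto and example are made up for illustration, but the casts are exactly what I mean:

func mergeInto(dst, src map[string]string) {
    // works on the underlying type, so it serves both OneMap and TwoMap
    for k, v := range src {
        dst[k] = v
    }
}

func example(one OneMap, two TwoMap) {
    // both named types can be cast to their common underlying type
    mergeInto(map[string]string(one), map[string]string(two))
}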

So this is why I need to remember that casting types, even complex types, is something that I can do in Go. It's been kind of a reflexive blind spot in my Go code in the past, but hopefully writing this will avoid it in the future.

GoHasCasts written at 01:41:01

2015-10-18

In Go, unsafe.Pointer is a built-in type in the compiler

Here is something that I didn't fully grasp and understand until I did some digging in the Go compiler as part of writing a recent entry:

Despite being in a package, unsafe.Pointer is really a built-in type in the Go compiler, just like map, chan, string, and so on.

While there is an unsafe package and it looks somewhat superficially like reflect (which is a real package with real code), this is an illusion. All parts of unsafe are implemented at compile time inside the compiler, and as part of this unsafe.Pointer is a built-in type, much like uintptr (or more to the point, something complex like chan, map, or slices). One consequence of this is that nothing involving an unsafe.Pointer (such as how it interacts with escape analysis) can be understood or predicted by thinking about it at the Go level through analogies to regular pointers or the like. unsafe.Pointer is not just a little bit magical; it's a lot magical.

(See eg TUNSAFEPTR in go.go. The unsafe functions are interpreted at compile time in unsafe.go, which turns all of them into literals of type uintptr.)

I imagine a large reason that unsafe.Pointer is not simply Pointer is so that people are forced to explicitly import unsafe in order to use it. This probably both avoids some casual use and makes it easier to find (or check for) potentially dangerous things; all you have to do is scan things looking for imports of unsafe.
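
As a small illustration, consider the classic bit-reinterpretation trick; anything like it has to import unsafe, which makes such code easy to spot:

import "unsafe"

// reinterpret the bits of a float64 as a uint64; this is essentially how
// math.Float64bits is implemented, and it can't be written without unsafe.
func float64bits(f float64) uint64 {
    return *(*uint64)(unsafe.Pointer(&f))
}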

(There's also that 'go tool compile' accepts a -u argument, which disables use of unsafe as part of its effects.)

(Perhaps this has been obvious to other people, but it wasn't to me, especially given that reflect seems to involve at most a tiny little bit of special compiler magic; in Go 1.5, it's a lot of real Go code in a real package.)

GoUnsafePointerBuiltin written at 02:19:07

2015-10-16

Inside a Go 'terrible hack' in the reflect package

Recently John Allspaw tweeted about some 'terrible hack' comments in the source code of various important projects. One of them was Go, which made me curious about the context. Although I can't exactly match Allspaw's version of the comment, I think it's this comment in src/reflect/value.go, which I'm going to quote in full:

func ValueOf(i interface{}) Value {
    [...]
    // TODO(rsc): Eliminate this terrible hack.
    // In the call to unpackEface, i.typ doesn't escape,
    // and i.word is an integer.  So it looks like
    // i doesn't escape.  But really it does,
    // because i.word is actually a pointer.
    escapes(i)

    return unpackEface(i)
}

In a nutshell, what's going on here is that the compiler is being too smart about escape analysis and the 'hack' code here is defeating that smartness in order to avoid memory errors.

Per Russ Cox's explanation on how interfaces are implemented, an interface{} value is represented in memory as essentially two pointers, one to the underlying type and one to the actual value. unpackEface() magically turns this into a reflect.Value, which has exactly this information (plus some internal stuff). Unfortunately it does so in a way that causes the compiler's escape analysis to think that nothing from the 'i' argument outlives ('escapes') unpackEface(), which would normally mean that the compiler thinks 'i' doesn't outlive ValueOf() either.
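
For reference, that two-pointer representation looks roughly like this (a sketch, not the exact reflect source; reflect has an internal struct much like it, which is where the typ and word names in the quoted comment come from):

// a rough sketch of the memory layout of an interface{} value
type emptyInterface struct {
    typ  unsafe.Pointer // the underlying type's information
    word unsafe.Pointer // the value itself (really a pointer to it)
}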

So let's imagine that you write:

type astruct struct { ... }
func toval() reflect.Value {
   var danger astruct
   return reflect.ValueOf(&danger)
}

Without the hack, escape analysis could tell the Go compiler that &danger doesn't escape reflect.ValueOf(), which would make danger safe to allocate on the stack, where it would get (implicitly) destroyed when toval() returns. Unfortunately the Value returned by toval() actually refers to this now-destroyed stack memory. Whoops. By explicitly defeating escape analysis, ValueOf() forces danger to be allocated in the heap where it will outlive toval() and thus avoid this bug.

(You might wonder if Go garbage collection has similar problems and the answer is apparently 'no', although the details are well beyond both me and the scope of this entry. See this golang-nuts thread on garbage collection and unsafe.Pointer.)

A Go compiler that was less smart about escape analysis wouldn't have this problem; as you can see, the compiler has to reason through several layers of code to go wrong. But escape analysis is an important optimization for a language like Go so the compiler has clearly worked hard at it.

(If the Go compiler is doing not just cross function but cross package escape analysis (which it certainly looks like), I have to say that I'm impressed by how thorough it's being.)

Sidebar: How escapes() works

Before I looked at it, I expected escapes() to involve some deep magic to tell the compiler to go away. The reality is more prosaic (and humbling for me in my flights of imagination):

var dummy struct {
    b bool
    x interface{}
}

func escapes(x interface{}) {
    if dummy.b {
       dummy.x = x
    }
}

In theory a sufficiently smart compiler could detect that dummy is not exported from reflect and is not touched inside it, so dummy.b is always false and escapes() always does nothing and so x does not escape it. In practice I suspect that the Go compiler will never get that perversely smart for various reasons.

GoReflectEscapeHack written at 02:54:40

2015-10-13

Why I've come to really like git

I mentioned on Twitter that I've completely flipped my view of git (and Mercurial) around, to the point where I actively like using git and want to be using git for things. This goes beyond where I used to be, where I simply felt that git was the right thing to learn and use going forward for relatively pragmatic reasons. Part of this is undoubtedly just increasing familiarity with git (it's hard not to feel good about something you use a lot), but I can also point to specific things that I really like about git.

The first is git rebase and everything around it, both when dealing with other people's projects and when working on my own stuff. Rebasing (and things it enables) has clearly made my life easier and nicer, especially when carrying my own modifications on top of other people's software. Interactive rebasing also makes it easy to manipulate my commits in general to do things like shuffle the order or squash several commits together. Perhaps there are other tools for this, but I already need to know rebasing, so I've opted to keep my life simple.

The second turned out to be git's index. There are two primary things I love about the index: it lets me see definitively and exactly what I'll be committing before I do it via 'git diff --cached' (which I use so much I have an alias for it), and it easily lets me make and check selective commits. Of course in theory I shouldn't need to make selective commits because I should be working selectively to start with. In practice, no, I don't naturally wind up working that way, so it's great to be able to methodically untangle the resulting messy work tree into a series of neat commits by staging things through the index.

(That the index is explicit is very important here for me, because it means I can stage things into the index, see the diff, and then say 'oops, no, I left something out' or 'oops, no, I put too much in, let's try that again'. An on the fly single pass 'select and commit' model demands that I get it right the first time with little margin for error. With the index I can abort right up to the point where I complete 'git commit' and I haven't lost any of my prep work so far.)

The third thing is 'git grep', or more specifically the fact that the git people have somehow made it amazingly fast. 'git grep' is clearly much faster at searching repos (especially big repos) for things than normal 'grep -r', 'find | grep', and so on. Since a bunch of what I do with other people's repos is fish through them trying to find things, this is really great for me; I can search the Linux kernel repo, the Illumos repo, and so on fast enough to make it a casual thing to do. By contrast, finding things in the Mozilla Mercurial repo is always a comparatively slow pain.

(Mercurial has 'hg grep', but it does something completely different. What it does is useful but something that I want much less often.)

Although I can't point to anything big in particular, in general I've wound up feeling that git makes it easier (and possible) to manipulate my repos in crazy ways if I really need to. I suppose 'git filter-branch' is the poster child for this (although the feature I wound up caring about has been mostly wrapped up as 'git subtree split'), but I've also used things like changing the upstream of branches. Basically it feels like if git can possibly support something, it will somehow, and I can make things work.

(I may discover additional nice things about git in the future, but this is my current list of things that really affect me when I work with a git repo versus eg a Mercurial repo.)

WhyILikeGit written at 01:42:22

2015-09-27

Wikitext not having errors creates a backtracking problem

In the past I've written about some pain points of parsing wikitext and called out how there aren't conventional parsing errors in running wikitext, just things that turn out to be plain text instead of complete wikitext markup. Some of the consequences of this may not be obvious, and in fact they weren't obvious to me until I tried to make an overly ambitious change to DWiki's markup to HTML conversion code.

The obvious problem that 'no errors' creates is that you will have to either accept closing off incomplete markup or do lookahead to verify that you seem to have a complete entity, or both. If your markup denotes links as '[[ ... ]]', you probably want to look ahead for a ']]' before you start processing a '[[' as a link. Unfortunately doing lookahead correctly is quite hard if your wikitext permits various sorts of nested constructs. Consider DWikiText, which also has '(( ... ))' to quote uninterpreted text in a monospaced font, and then parsing the following:

This example [[looks like ((it has an ending ]])) but it doesn't.

Purely textual lookahead for a ']]' gets fooled here. So let's assume we're going to get fooled sooner or later and handle this better. Rather than trying to rely on fallible lookahead, if we reach the end of a paragraph with an unclosed entity we'll go back to the start of the entity and turn it into plain text.

Unfortunately this has problems too, because something retroactively becoming plain text may change the meaning of other text after that point. Consider this contrived example:

Lorem ipsum ((this *should be emphasis* because the '((' isn't closed and thus is plain text.

If you start out parsing the (( as real, the *'s are plain text. But once the (( is just plain text, they should be creating italics for emphasis. To really retroactively change the (( to plain text, you may need to backtrack all text processing since then and redo it. And backtracking is something conventional parsing technology is generally not designed for; in fact, conventional parsing technology usually avoids it like the plague (along with aggressive lookahead).
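
In code form, the backtracking approach looks something like this (a sketch in Go with hypothetical parser methods, since that's the language of the other code in these entries; it isn't DWiki's actual code):

// a minimal sketch of retroactive backtracking for '(( ... ))'
func (p *parser) parseQuoted() {
    mark := p.save()           // remember where the '((' started
    if p.parseUntil("))") {
        return                 // found a real close; the span stands
    }
    // no '))' before the end of the paragraph: rewind, demote the
    // opener to plain text, and re-parse what follows so that things
    // like '*emphasis*' now get their normal interpretation.
    p.restore(mark)
    p.emitText("((")
    p.parseInline()
}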

(I think the lookahead situation gets somewhat better if you look ahead in the token stream instead of in plain text, but it's still not great. You're basically parsing ahead of your actual parse, and you'd better keep both in sync. Backtracking your actual parsing is probably better.)

All of this has caused me to feel that parsing running wikitext in a single pass is not the best way to do it. Instead I have a multi-pass approach in mind (and have for some time), although I'm not entirely convinced it's right either. I probably won't know unless (and until) I actually implement it, which is unlikely.

(An alternate approach would be to simply have backtracking in a conventional recursive descent parser; every time you hit a 'parse error', the appropriate construct being parsed would turn its start token into plain text and continue the parsing from there. Unfortunately this feels like it could be vulnerable to pathological behavior, which is a potential issue for a parser that may be handed user-controlled input in the form of eg comments.)

PS: How I stubbed my toe on this issue was basically trying to do this sort of 'convert back to plain text' for general unclosed font changes in DWikiText. When I did this outside of a limited context, it blew up in my face.

WikitextNoErrorsBacktracking written at 02:10:30

2015-09-22

The Go 'rolling errors' pattern, in function call form

One of the small annoyances of Go's explicit error returns is that the basic approach of checking error returns at every step is annoying when all the error handling is actually the same. You wind up with the classic annoying pattern of, say:

s.f1, err = strconv.ParseUint(fields[1], 10, 64)
if err != nil {
   return nil, err
}
s.f2, err = strconv.ParseUint(fields[2], 10, 64)
if err != nil {
   return nil, err
}
[... repeat ...]

Of course, any good lazy programmer who is put into this starting situation is going to come up with a way to aggregate that error handling together. Go programmers are no exception, which has led to what I'll call a generic 'rolling errors' set of patterns. The basic pattern, as laid out in Rob Pike's Go blog entry Errors are values, is that as you do a sequence of operations you keep an internal marker of whether errors have occurred; at the end of processing, you check it and handle any error then.

Rob Pike's examples all use auxiliary storage for this internal marker (in one example, in a closure). I'm a lazy person so I tend to externalize this auxiliary storage as an extra function argument, which makes the whole thing look like this:

// getInt parses field and folds in the accumulated error: if this parse
// fails its error takes over, otherwise the earlier error e (if any) is
// passed through unchanged.
func getInt(field string, e error) (uint64, error) {
   i, err := strconv.ParseUint(field, 10, 64)
   if err != nil {
      return i, err
   }
   return i, e
}

func .... {
   [...]

   var err error
   s.f1, err = getInt(fields[1], err)
   s.f2, err = getInt(fields[2], err)
   s.f3, err = getInt(fields[3], err)

   if err != nil {
      return nil, err
   }
   [...]
}

This example code does bring up something you may want to think about in 'rolling errors' handling, which is what operations you want to do once you hit an error and which error you want to return. Sometimes the answer is clearly 'stop doing operations and return the first error'; other times, as with this code, you may decide that any of the errors is okay to return and it's simpler if the code keeps on doing operations (it may even be better).

(In retrospect I could have made this code just as simple while still stopping on the first error, but it didn't occur to me when I put this into a real program. In this case these error conditions are never expected to happen, since I'm parsing what should be numeric fields that are in a system generated file.)
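
For concreteness, the stop-on-first-error version of getInt is just as short:

// once e is non-nil we skip parsing entirely, so the first error wins
func getInt(field string, e error) (uint64, error) {
    if e != nil {
        return 0, e
    }
    return strconv.ParseUint(field, 10, 64)
}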

As an obvious corollary, this 'rolling errors' pattern doesn't require using error itself. You can use it with any running or accumulated status indicator, including a simple boolean.

(Sometimes you don't need the entire infrastructure of error to signal problems. If this seems crazy, consider the case of subtracting two accumulating counters from each other to get a delta over a time interval where a counter might roll over and make this delta invalid. You generally don't need details or an error message here, you just want to know if the counter rolled over or not and thus whether or not you want to disregard this delta.)
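
A sketch of what this looks like with a plain bool, using the counter delta example (all of the names here are made up):

// delta returns cur - prev and clears ok if the counter rolled over
func delta(cur, prev uint64, ok bool) (uint64, bool) {
    if cur < prev {
        return 0, false
    }
    return cur - prev, ok
}

// usage:
//    ok := true
//    dpkts, ok := delta(cur.pkts, prev.pkts, ok)
//    dbytes, ok := delta(cur.bytes, prev.bytes, ok)
//    if !ok { ... disregard this whole interval ... }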

GoRollingErrors written at 00:23:46

2015-09-14

A caution about cgo's error returns for errno

Go's cgo system for calling C functions offers a very convenient feature. As the documentation puts it:

Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error [...]

Reading this, you may be tempted to write more or less standard Go error-handling code like the following:

kcid, err := C.kstat_chain_update(t.kc)
if err != nil {
   return err
}

This code is a potential mistake. Unless the documentation for the C function you're calling says so explicitly, there is no guarantee that errno is zero on success. If the function returns success but errno is non-zero, cgo will dutifully generate a non-nil error return from it and then your Go code will bail out with an error that isn't actually one.

This is not cgo's fault. Cgo has no magic knowledge of what C function return values are and aren't errors, so all it can do is exactly what it said it was going to do; if errno is non zero, you get an error version of it. This is just a C API issue (that ultimately comes about because errno is both an implicit return and global state). You'd never write code like this in Go, where 'only return non-nil error on actual errors' is well established, but we're stuck with the C API that we actually have instead of the Go-like one we'd like. So we have to deal with it, which means checking return values explicitly.

(In this case the real 'there has been an error' marker is a kcid return value of -1. I actually hit an irregular test failure when my code was just checking err, which is how I re-stubbed my toe on this particular C API issue.)
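
In code form, the version that checks the real error marker looks something like this (the -1 convention is kstat's, as mentioned above):

kcid, err := C.kstat_chain_update(t.kc)
if kcid == -1 {
    // only now does the errno-derived err actually mean anything
    return err
}
// on success err may well be non-nil; it gets ignored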

PS: the ultimate cause of this is that C code often doesn't explicitly set errno to zero on success but instead leaves it alone, which means errno can wind up set from whatever internal system call or library routine last failed. There are many possibilities for how this can happen; a classic one is seeing ENOTTY from something checking to see if the file descriptor it is writing to is a TTY and so should be in line-buffered mode.

(In my case I saw EAGAIN, which I believe was the OmniOS kernel telling kstat_chain_update() that the buffer it had been given wasn't large enough, please try again with a bigger one.)

GoCgoErrorReturns written at 23:45:13

2015-09-09

Some notes on my experience using Go's cgo system

Cgo is the Go compiler suite's bridge to C. I recently used it to write a Go package that gives you access to Solaris kstat kernel statistics, so I want to write down some notes on the whole thing before they fall out of my memory. On the whole, using cgo was a pleasant enough experience. I don't have very much experience in FFIs so I can't say how cgo compares to others, but cgo makes C and Go seem like a reasonably natural fit. It often really does seem like you're just using another Go package.

(With that said, there's a lot more use of unsafe.Pointer() than you'll find almost anywhere else.)

At the mechanical level, the most annoying thing to deal with was C unions. Go has no equivalent and cgo basically leaves you on your own to read or set union fields. I wound up just writing some trivial C functions to extract union fields for me and then had my Go code call them, rather than wrestle with casts and unsafe.Pointer() and so on in Go code; the C functions were both short and less error prone for me to write.

A C function was also my solution to needing to do pointer arithmetic. In C, a common approach is to define a field as 'struct whatever *ptr;' and then say it actually points to an array of those structs, with the length of the array given by some other field. You access the elements of the array by doing things like incrementing ptr or indexing off it. Well, in Go that doesn't work; if you want to increment ptr to the next struct, you're going to have to throw in explicit C.sizeof invocations and so on. I decided it was simpler to do it in C instead:

/* return a pointer to the n'th kstat_named_t in ks's data array, or NULL */
kstat_named_t *get_nth_named(kstat_t *ks, uint_t n) {
   kstat_named_t *knp;
   if (!ks || !ks->ks_data || n >= ks->ks_ndata)
       return NULL;
   knp = (kstat_named_t *)ks->ks_data;
   return knp + n;
}

Typecasts are another one of the irritations of cgo. Cgo makes every C type into a Go type, and boy does a lot of C turn out to have a lot of different integer types. In C they mostly convert into each other without explicit casts; in Go they are all fully separate types and you must explicitly cast them around in order to interact with each other and with native Go integer types. This can get especially annoying in things like for loop indexing, because if you write 'for i := 0; i < CFIELD; i++' the compiler will object that i is a different type than CFIELD. This resulted in a for loop that looks like this:

for i := C.uint_t(0); i < k.ksp.ks_ndata; i++ {
   ....
}

(I wrote more about the mechanics in getting C-compatible structs in Go, copying memory into Go structs, and cgo's string functions explained.)

At the design level, my biggest problem was handling C memory lifetime issues correctly. Part of this was figuring out where the C library had to be using dynamic allocation (and when it got freed), and part of it was working out what it was safe for Go structures to hold references to and when those references might become invalid because of some call I made to the C library API. Working this out is vital because of the impact of coupling Go and C memory lifetimes together, plus these memory lifetime issues are likely to have an effect on your package API. What operations can callers do or not do after others? What precautions do you need to take inside your package to try to avoid dereferencing now-free C memory if callers get the lifetime rules wrong? What things can you not expose because there's no way to guard against 'use after free' errors? And so on.

(runtime.SetFinalizer() can help with this by letting you clean up C memory when Go memory is going away, but it's not a complete cure.)
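
As a sketch of what that looks like (Token and its kc field are hypothetical stand-ins for whatever Go-side handle wraps the C resource):

tok := &Token{kc: kc}
runtime.SetFinalizer(tok, func(t *Token) {
    // free the C-side kstat_ctl_t if the caller never got around to it
    if t.kc != nil {
        C.kstat_close(t.kc)
        t.kc = nil
    }
})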

Not all uses of cgo will run into memory lifetime problems. Many are probably self-contained, where all of your interaction with C code is inside one function and when it returns you're done and can free up everything.

GoCgoExperienceNotes written at 02:03:53

