Wandering Thoughts archives

2014-06-22

Things I like about Go

Writing a reasonably sized Go program has wound up giving me a bigger exposure to Go, reinforcing some of my existing views on it and giving me new ones. Today I want to talk about some nice things that I think Go does, things that make it easier to write code in than I initially thought it might be. This is an incomplete list and I'm sure I'll find or think of other things in the future.

  • := type inference means that much of my code is free of any particular mention of actual types. Among other things this makes it much less of a pain to change the type of something since far fewer spots will need updating.

    The one nit I have is that I really wish there were some form of return value type inference. As it stands, every Go function needs to explicitly declare its return type, which can cause a bunch of heartburn if you change a type that gets returned up through a call stack.
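    As a small sketch of both halves of this (the names here are made up for illustration): the := sites never name a type, while the function's return types must still be spelled out.

```go
package main

import "fmt"

// makePair must still declare its return types explicitly; there is
// no return value type inference to lean on.
func makePair() (int, string) {
	return 1, "two"
}

func main() {
	// None of these declarations mention a type by name; n, s, and f
	// are inferred as int, string, and float64.
	n := 42
	s := "hello"
	f := 3.5

	// If makePair's return types change, the := here needs no updating;
	// only makePair's own declaration does.
	a, b := makePair()
	fmt.Println(n, s, f, a, b)
}
```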

  • Coming from Python, it's great that strings are generally a really lightweight thing that you can take subsets of any time you want to. My Go code makes new strings all the time and in Python all of them would involve significant memory copying. Byte arrays and array slices in general have similar properties.

    Strings have a bad side but the bad side is subtle and doesn't really apply until you're caring about memory usage (the same applies for slices of byte arrays).

    (If you look at Go from a C perspective instead of a Python one, strings are a massive productivity accelerator. Even if you already have a good buffered-strings library you like, simply dealing with it involves a lot of hassle that Go's built in good strings save you from.)
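    A minimal sketch of what this cheapness looks like in practice: slicing a string builds a new string header over the same underlying bytes, with no copying of the text itself.

```go
package main

import "fmt"

func main() {
	s := "hello, world"
	// Each slice expression creates a new string value that shares
	// the underlying bytes of s; nothing is copied.
	head := s[:5]
	tail := s[7:]
	fmt.Println(head, tail)
}
```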

  • That the . operator can be applied equally to pointers and values, instead of needing C's . versus -> depending on whether you have an actual struct or a pointer to it. This especially helps with embedded structs and struct pointers, which Go encourages you to use. It also makes pointer-receiving method functions just a bit nicer.

    (. obscures this so much that I recently forgot that my code had a pointer instead of an actual struct and took the address of a pointer. Go's static typing promptly told me that I was trying to assign a **Rule to something that took a *Rule.)
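    For illustration (this Rule type is hypothetical), the same selector syntax works on a value and on a pointer:

```go
package main

import "fmt"

// Rule is a hypothetical type standing in for whatever your code uses.
type Rule struct {
	Name string
}

func main() {
	v := Rule{Name: "direct"}
	p := &Rule{Name: "via pointer"}
	// The same . selector works on both; Go automatically dereferences
	// the pointer, so there is no separate -> operator as in C.
	fmt.Println(v.Name, p.Name)
}
```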

  • The testing package, especially advanced aspects of it like test coverage. Again this is a cool thing from a C perspective more than a Python one, although Go goes out of its way to make it easy to test things.

  • gofmt, which doesn't just create a standard style but also saves me work. I can write a sloppily formatted list of constants or whatever and then just gofmt the file (well, the buffer, in GNU Emacs) and magically everything has been spaced out, lined up right, and so on with no effort on my part.

    (I could pick nits with some of gofmt's choices but it's not worth it. There's no point fighting Go's city hall here, which is of course sort of the point of gofmt.)

  • The fmt %v, %+v, and %#v verbs, which provide basic data structure introspection for debugging purposes. Even from a Python perspective this is nice; in Python you have to go out of your way to provide a useful debugging representation of a struct equivalent while Go just does it for you.

  • That the Go people are perfectly prepared to resort to brute force instead of (excessive) cleverness, and that when you read the code in, eg, the standard packages you get inspired to do the same.

    As an example, I'll take how the standard Go templating system for both text/template and html/template does line numbers for errors. Rather than carefully tracking line numbers as it parses its way through a block of text, it simply exposes the absolute position in the block text. If it needs to know the line number of an absolute position, it just counts how many \n's there are between the start of the text and the position. This is a marvelously simple brute force approach to the problem and sure, it's inefficient, but generally you don't care about that in a lexer because the only time you want the line number is on a parse error and those are generally both rare and not performance critical.

    (I trivially extended the approach so that my parser error messages give both the line number and the character position within the line, something that would have taken me a lot of additional work if I'd tried to be clever and efficient.)
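    A sketch of the brute force approach, including the line-plus-column extension (the function name here is mine):

```go
package main

import (
	"fmt"
	"strings"
)

// lineAndCol turns an absolute position in text into a 1-based line
// number and character position within that line, by brute force:
// count the newlines before pos, then find where the last one was.
func lineAndCol(text string, pos int) (line, col int) {
	line = strings.Count(text[:pos], "\n") + 1
	// LastIndex returns -1 when pos is on the first line, which
	// conveniently makes col 1-based there too.
	col = pos - strings.LastIndex(text[:pos], "\n")
	return line, col
}

func main() {
	text := "first\nsecond\nthird\n"
	line, col := lineAndCol(text, 8) // the 'c' in "second"
	fmt.Println(line, col)           // 2 3
}
```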

I like interfaces but I haven't done enough clever things with them yet to have much to say. They do let me do almost all of the duck typing that I want to do in practice.

The 'switch { case <condition>: ... case <condition2>: ... }' idiom is something that I find alternately nice and perhaps questionable code style. To put it one way, it offers many opportunities to be quite clever.
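A minimal example of the idiom: an expressionless switch is an if/else if chain in disguise, where the first true case wins.

```go
package main

import "fmt"

func classify(n int) string {
	// switch with no expression: each case is an arbitrary condition.
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

func main() {
	fmt.Println(classify(-3), classify(0), classify(7))
}
```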

(This entry is a bit scattershot because I'm a bit flat today.)

Sidebar: My view on Go strings, Unicode, and UTF-8

On the one hand it's kind of annoying that Go doesn't really have a 'Unicode string' type, which would represent strings as arrays of runes so that 'ustr[N]' was always a Unicode codepoint instead of possibly part of a UTF-8 sequence the way it is today with Go strings. Python offers such a type in its Unicode strings and they have a number of nice properties.

On the other hand, such a type is almost inevitably a lie given the presence of combining characters in Unicode. My understanding is that even with normalization there are codepoint sequences that represent a single logical character, ie there are not precomposed versions of all possible decomposed character combinations. I may be wrong here, but at least the issue is a swamp (and at least requires you to normalize your Unicode text, which is not necessarily a trivial operation).

GoThingsILike written at 23:57:20

2014-06-19

Some notes on Go's godoc and what it formats how

I spent part of today banging my head against Go's godoc tool, so here are some notes about what it formats how. In general godoc is frustratingly under-documented and the available documentation is scattered around hither and yon.

As lots of things will tell you, you write documentation for an entire package by putting a comment block at the start of one file in the package. Like other documentation comments, this comment must end right before the package line. If you accidentally leave a blank line between the end of your documentation comment and the package line, godoc will silently ignore your comment. Speaking from personal experience, this can be very puzzling and frustrating if you don't realize what's going on.

(This is handy behavior if you want to add explanatory internal use comments in some of your source files; just put a blank line between the comments and the package line and godoc will skip them all.)
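To make the rule concrete, here is the shape that works (the package name is made up); adding even one blank line before the package clause would make godoc silently drop the comment:

```go
// Package demo frobnicates widgets. This comment ends on the line
// immediately before the package clause, so godoc picks it up as the
// package documentation.
package demo
```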

In its HTML rendition of your package or command documentation, godoc will turn some lines into headings (and in some versions of godoc you then get an index to them at the top of the page). As what documentation there is notes, this is:

[...] a span that consists of a single line, is followed by another paragraph span, begins with a capital letter, and contains no punctuation is formatted as a heading.

That a heading must be followed by another regular paragraph span means that you can't put a heading immediately before an indented section that is a preformatted block. For example, if you're describing command options you can't have an 'Options' heading immediately before a preformatted block covering all of the options. You need to shim in some sort of sentence or remark.

When rendering your documentation as plain text, godoc will indent your indented sections somewhat further than you did in the original comment, especially if you used /* ... */ to write a big section of what is basically plain text (this seems to be a common approach for large multi-line blocks of package or command documentation). Word-wrap your indented sections with generous right margins to compensate for this.

Godoc appears to sometimes do odd things if more than one file in a package has what godoc considers to be package level comments; when I did this by accident I got a confusing jumble of text that was not what I expected. So I think the rule here is 'don't even think about doing that' (which is what Godoc: documenting Go code says to do). Sadly it's easy (for me) to have accidents where you have a big block of internal explanation comments at the start of a file and then forget to leave a blank line between them and the package line.

I agree with the recommended approach of putting all your Godoc documentation in a separate doc.go file once it gets large enough. I'm using /* ... */ and basically treating it as a plain text file.

Sidebar: Additional trivia

Based on how Godoc behaves on the standard packages, it will ignore at least some copyright declaration blocks at the top of your file. The doc package variables suggest that this might be relatively generic but it's clearly undocumented so I'm not sure counting on this is wise. If I needed to have copyright notices I'd put them in a comment below the package line.

See for example the go vet doc.go file for an example of this.

GodocNotes written at 03:54:22

2014-06-13

Undoing an errant 'git commit --amend'

Suppose, not entirely hypothetically, that late some evening you commit a change and somewhat later that evening notice that you left something out of it. Sleepily confident that you haven't pushed your first commit you fix the omission with 'git commit --amend', start updating your copy of the repo at work only to have it throw you into a merge, and discover that not only did you propagate the first commit, you pushed it to the public Github repo. Oops. Now you need to reverse or undo the 'git commit --amend' (and discard the update that you pulled into your repo copy at work).

I don't know if the procedure I followed to fix my oops is the completely right one, but it worked and repaired things. Here is what I did:

  • Use 'git reflog' to look at recent changes in your repo (cf). If you literally committed and then amended, the top two commits are the relevant ones. For me, I saw:
    ae22c04 HEAD@{0}: commit (amend): ....
    9cae7d1 HEAD@{1}: commit: ...

    This is the amended commit and then the original one. We want to get back to the original commit (which is currently orphaned, cf).

  • Get your repo back to the original commit. For me:
    git reset "HEAD@{1}"

    Various Internet sources suggest using 'git reset --soft' instead. If I understand Git right, this will leave your amended version of the commit sitting in the git index ready to be immediately committed again. I personally would rather roll all the way back to the stage where I have to 'git add' things again, which is what the command here did.

  • Inspect the output of 'git diff', then make a real (non-amended) commit with 'git add' and 'git commit' as usual.

  • Push the new commit to Github et al.

(There is probably some way to get Github and your other repos to accept your new amended commit and forget the old one, but I don't know it offhand, I didn't feel like working it out at the time, and I shouldn't assume that no one has pulled a copy from my Github repo.)

To clean up my work repo, I aborted the in-flight merge with 'git merge --abort' and then re-pulled the master with 'git pull'. Presumably the work repo now has an orphaned copy of the amended commit just as my home repo does, but I don't really care. If I was really curious I could probably find it with some git command, especially as I know (part of) its hash.

(The work and home repos are exact mirrors of each other and I'm only making changes in one at any given point, in this case in the home repo. I have to say that git handily beats rsync for this, although I have to overcome my twitch about possibly committing incomplete work.)

PS: The right search terms to find this stuff on the Internet are apparently [undo git commit amend]. Searching for [reverse git commit amend] mostly got me discussions about how to use 'git commit --amend'.

PPS: There may well be better ways to do this in Git (and if so, feel free to mention them in the comments). I'm not entirely hep to Git yet, partly because I want my repos to have a straight linear history if at all possible. Someday I may start doing rebases off private branches to get this, but not so far.

(This is the kind of entry that I write because I'm going to make this mistake again someday.)

UndoGitCommitAmend written at 00:59:02

2014-06-11

Some thoughts on testing parsers

I've been writing a parser lately. Since I am not completely crazy, this means that I've been writing tests for it too. This has naturally left me with some thoughts about testing one's parsers in ways that don't completely drive you up the wall.

(The disclaimer is that I've never tried to write a parser for a serious language. All of mine have been for relatively modest DSLs.)

First off, you really want to be able to serialize your parsed syntax tree to some canonical string representation, or at least something very close to this. In my biased opinion, building and comparing complex objects in tests is for the birds; serializing a complex tree to a string gives you a clean way of comparing parse results with what you expected. The serialization should obviously be unambiguous, even if this sprays it with more parentheses than anyone sane would normally use. The serialization format doesn't have to be complete (you might drop some extraneous information), but it helps if it comes close.

(Your test serialization format doesn't have to be your debugging dump format, of course. Sometimes languages push you towards doing this because they provide a single convenient way to stringify an object, but you can always resist and also write a nicer display format with nice tree indentation and so on.)

Once I have a serialization format, it's also proven useful to make it one that is valid input for the parser. Among other things, this lets you test what happens when you round-trip the serialization output through the parser again. Generally you want to wind up with the same output. In the past, different output has been a sign that, for example, I was letting unnecessary AST nodes sneak into my parses.

(To detect this sort of stuff you want every AST node in the tree to leave its fingerprints somewhere in the serialization output.)
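As a minimal sketch of this (the node types are invented), a String() method that emits a fully parenthesized form gives you an unambiguous serialization where every node leaves fingerprints:

```go
package main

import "fmt"

// Node is a minimal expression AST: numbers and binary operations.
type Node interface {
	String() string
}

type Num struct {
	Value int
}

func (n Num) String() string { return fmt.Sprintf("%d", n.Value) }

type BinOp struct {
	Op          string
	Left, Right Node
}

// String emits a fully parenthesized form, with more parentheses than
// anyone sane would normally write but completely unambiguous. An
// unnecessary extra AST node would show up as an extra pair of parens,
// and the output is itself valid input for a parser of this language,
// so parses can be round-tripped.
func (b BinOp) String() string {
	return fmt.Sprintf("(%s %s %s)", b.Left, b.Op, b.Right)
}

func main() {
	tree := BinOp{"+", Num{1}, BinOp{"*", Num{2}, Num{3}}}
	fmt.Println(tree) // (1 + (2 * 3))
}
```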

I suspect that this is only really feasible with relatively simple languages and parsers. Sophisticated parse tree transformations are probably going to make it too hard to go back to anything that really resembles language input.

Beyond that, it's useful to be able to directly invoke specific subsections of your parser, for example the bits dealing with expressions. Direct access makes it much easier to test these subsections; you can just write the appropriate input text without having to wrap it up in a complete and fully syntactically valid program, function, or whatever.

(Parsing less also makes for smaller parse trees and smaller serialized representations, which makes it easier to see things.)

Unfortunately this can wind up in conflict with the most natural way to code your parser. It can be easiest for the internal sub-pieces of the parser to assume a fair amount about the environment they're running in and want a lot of stuff set up already.

(I've been fortunate to write my parsers so far in languages where my tests could be extremely intimate with the insides of the parser, so they can at least in theory reach through all of this. The resulting tests are of course a bit fragile; if I change the internal details of how the bits of the parser interact, the tests are going to explode.)

PS: I suspect that people who write parsers more seriously have developed a bunch of techniques and tricks for good testing. As always, I don't feel quite motivated enough to spend a bunch of time trying to find and digest their stockpiles of information for what is a hobby project where I feel most motivated when I get stuff running instead of writing a lot of test cases.

ParserTestingThoughts written at 02:47:57

2014-06-09

A challenge in new languages: learning to design good APIs

Working on a new Go project has once again illustrated to me that one of the subtle and time-consuming parts of learning a new language is learning how to design good APIs for your code. As is traditional, I've learned this by throwing together a number of APIs and then realizing that they're clumsy and awkward in various ways.

Every language has a collection of API idioms ranging from the small scale to the large (and often some more idioms on how to document your APIs), and with them another collection of anti-idioms, things to avoid. Some of these are obvious from the design of the language (eg when the language makes something terribly awkward), but generally many of them are not. One example is creating a C API that involves calling a function with a large struct as a parameter (instead of a pointer to it) or having a function return such a large struct. This is generally going to be seen as a bad idea due to the memory copying involved, but C doesn't make this explicit and indeed it sort of looks natural.

It's been my experience that learning these API idioms just takes time and writing code in the language (although 'how to design good X APIs' documentation helps, if you bother to read it). It's through writing code that I discover what's surprisingly awkward despite the initial appeal, what doesn't work, what works gracefully, and often why all sorts of existing code has the API that it does. One inevitable consequence of this is API churn in your own code as you discover that some of your early clever ideas weren't so clever after all.

(Writing tests for your code helps with this because tests use your APIs, both external and internal, and so can expose awkward ones. My experience is that this isn't perfect by any means; tests always feel a bit awkward to me anyways and not all of my APIs are used by tests.)

By the way, I recommend writing API documentation in some form as a way of smoking out awkward APIs. When I find myself putting cautions into the comments or struggling to explain something in a way that makes it sound sensible, I know I have a problem. This is not a cure-all since it basically just hits the generally obvious stuff.

(I also find that sometimes I know that something is awkward and probably wrong but I don't yet know enough to figure out how to make it better. And sometimes it turns out that there doesn't seem to be a better way.)

LearningAPIDesign written at 01:36:53

2014-06-07

Some ways to do sleazy duck typing in Go (from a Python perspective)

Normal duck typing in Go is straightforward; you define an interface type (if necessary) and then create some full implementations of it. I can think of a number of ways to enable code reuse in the style of inheritance, none of which I'm going to ramble on about because I've never done this. But sometimes in Python we do what I'll call sleazy duck typing, where we aren't actually a duck but we need to look enough like one to fool some other code. In Go, there are at least two ways to do this.

Let's call the first way incomplete fakery. In incomplete fakery you don't actually implement a working version of the interface; instead you do only as much as you need and then stub out the rest. Usually you hang this around a struct, often a struct embedding some other interface type that provides as much as possible of the functionality you actually need. The following is an example of faking a net.Conn:

import (
    "bufio"
    "bytes"
    "io"
    "net"
    "strings"
    "testing"
    "time"
)

type faker struct {
    io.ReadWriter
}

func (f faker) Close() error                     { return nil }
func (f faker) LocalAddr() net.Addr              { return nil }
func (f faker) SetDeadline(time.Time) error      { return nil }
func (f faker) SetReadDeadline(time.Time) error  { return nil }
func (f faker) SetWriteDeadline(time.Time) error { return nil }
func (f faker) RemoteAddr() net.Addr {
    a, _ := net.ResolveTCPAddr("tcp", "127.10.10.100:56789")
    return a
}

func TestSomething(t *testing.T) {
    var outbuf bytes.Buffer
    writer := bufio.NewWriter(&outbuf)
    reader := bufio.NewReader(strings.NewReader(clientInput))
    cxn := &faker{ReadWriter: bufio.NewReadWriter(reader, writer)}

    res := RequiresANetConn(cxn)
    writer.Flush()
    written := outbuf.String()

    ....
}

(This is adapted from the net/smtp tests in the standard Go packages.)

Because much of the implementation is non-functional, incomplete fakery is necessarily specific to the code that you're feeding the resulting duck to; you need to know what it needs and what it doesn't care about. As a result, incomplete fakery is probably most often used in tests (as is the case here, where it's simulating a network connection in a way that lets us easily gather the output written to the 'network').

The second way is what I'll call interposition. In interposition you wrap a real and functional implementation of the interface you need with code that alters its behavior in some way that you find useful. My wrong way to do Go logging entry shows an example of interposition where I put my own code on top of an io.Writer's Write() in order to mutate its behavior to add a prefix to everything written. Interposition takes full advantage of Go's ability to embed interface types in structs and to have the methods on those types transparently shine through the struct (except for anything that you deliberately preempt).

Because it's wrapping a real implementation, in theory interposition works with any code. However, if you've mutated the actual behavior of the real duck too much some code may explode; for example, if you wrote chunks of a line to my interposed io.Writer the resulting output would have the prefix sprayed all over it. Interposition used for sleazy purposes usually has some embedded assumptions about just how the resulting duck is really going to be used.

(Note that not all uses of interposition in Go are sleazy duck typing; it's a situational thing. Composing interfaces this way is one of Go's useful and powerful features and Go makes it much easier and less of a hack than it is in, eg, Python (since I started out by mentioning that language).)

There are probably other ways to do sleazy duck typing in Go. These are just the two that I've figured out and used so far.

GoSleazyDuckTyping written at 00:15:49

2014-06-03

My just-used Go logging idiom and why it is in fact wrong

I've just gotten through writing a Go package and command in which I rolled my own logging. I know, I should probably master the standard log package. But it seems to be a constant that in every new language I fiddle around with I wind up rolling my own logging for whatever reason. Fortunately this time I can not only tell you about my idiom but tell you why it's wrong.

As you might guess from my entry on how nil is sometimes not a nil, the fundamental model is to wrap a buffered io.Writer that points to the underlying logging resource (eg a file) with a interposed implementation that adds a prefix. This is:

type sLog struct {
    prefix []byte
    wr     *bufio.Writer
}

func (l *sLog) Write(b []byte) (n int, err error) {
    var buf []byte
    buf = append(buf, l.prefix...)
    buf = append(buf, b...)
    n, err = l.wr.Write(buf)
    if err == nil {
        err = l.wr.Flush()
    }
    return n, err
}

You then set up a sLog instance and pass it (well, a pointer to it) to something as an io.Writer. The thing doing the logging calls log.Write(...) exactly as if it was writing to a normal file or whatever, your code sticks your prefix on, and you're done. Extensions to things like timestamps are left as an exercise.

This is a superficially appealing approach; what could be more logical than a pipeline where we wrap an underlying Writer in something that sticks our additional information on? It feels perfectly Go-ish, with a clever use of interfaces and creating our own implementation of one. And the use of a buffered writer should help avoid the ever-popular issue of partial lines stomping over each other from multiple sources, as we flush only full lines out.

(This code is safer than it looks because wr is not shared between different sLog instances; only the underlying io.Writer behind it is. This is tricky and should be documented. A shared wr would leave us counting on any guarantees bufio makes about multiple concurrent Write()s to a shared bufio not interleaving their output.)

The big problem here is that the semantics are not so much wrong as misleading. My code here implicitly assumes that Write() is only called to write whole lines. This is how you want to (and must) write unbuffered log lines to avoid output interleaving, but Write() in general has broader semantics than 'write whole lines'. A more specifically restricted interface, such as some of the ones exposed in the standard log package, would make it clear that you should only be writing whole lines to the log instead of feeding it random chunks of partial output.

(I know enough to not log partial lines, but this issue is at least tricky. An interface that only accepts a full line of output makes it obvious what you have to do to use it. And interfaces should be created to say what they mean.)

PS: Even this code is not absolutely safe against concurrent writes; it's assuming that the underlying io.Writer that bufio will call when we flush the output is itself basically atomic. For absolute safety we would have to add another layer that serializes writes to that io.Writer. Today's moral is that safe logging in the face of concurrency is a pain in the rear that is best left to other people's code (provided that you can trust it).

GoLoggingWrongIdiom written at 03:20:37

