2013-02-21
Go: using type assertions to safely reach through interface types
To start with, suppose that you have a Go net.Conn value, call it conn, that you want to do a shutdown() (for writing) on if possible. Some but not all specific concrete net
connection types make this available as a .CloseWrite() method (eg
it's available for TCP sockets but not for UDP ones), but net.Conn is
an interface type and it doesn't include a .CloseWrite() method so
you can't directly call conn.CloseWrite().
(In Go's software engineering view of the world this is a sensible
choice. net.Conn is the set of interfaces that all connections can
support. If you included .CloseWrite() in the interface anyways you
would force some connections, eg UDP sockets, to implement a do-nothing
or always-error version of the method and then people would write Go
code that blindly called .CloseWrite() and expected it to always
work.)
So sometimes conn will be of a concrete type that supports this (and
sometimes it won't be). You want to somehow call .CloseWrite() if it's
supported by your particular value (well, the particular concrete type
of your particular value). In Python we would do this either with a
hasattr() check or just by calling obj.CloseWrite() and catching
AttributeError, but we're in Go and Go does things differently.
If you're a certain sort of beginning Go programmer coming from Python, you grind your teeth in
irritation, look up just what concrete types support .CloseWrite(),
and write the following brute force code using a type switch:
func shutdownWrite(conn net.Conn) {
    switch i := conn.(type) {
    case *net.TCPConn:
        i.CloseWrite()
    case *net.UnixConn:
        i.CloseWrite()
    }
}
(Then this code doesn't compile under Go 1.0 because net.UnixConn
doesn't implement .CloseWrite() in Go 1.0.)
What this code is doing in its brute force way is changing the type of
conn into something where we know that we can call .CloseWrite()
and where the Go compiler will let us do so. The compiler won't let
us directly call conn.CloseWrite() because .CloseWrite() is not
part of the net.Conn interface, but it will let us call, say,
net.TCPConn.CloseWrite(), because it is part of net.TCPConn's public
methods. So if conn is actually a net.TCPConn value (well, a pointer
to it) we can convert its type through this type switch and then make
the call. Unfortunately this code has the great drawback that it has to specifically know which of the concrete types that sit behind net.Conn do and don't implement .CloseWrite(). This is bad for various reasons.
(I am mangling some Go details here in the interests of nominal clarity.)
The experienced Go programmers in the audience are shaking their heads
sadly right now, because there is a more general and typesafe way to do
this. We just need to say what we actually mean. First we need a type
that will let us call .CloseWrite(); this has to be an interface type
because we need to convert conn to it (somehow).
type Closer interface {
    CloseWrite() error
}
(It's important to get the argument and return types exactly right even if you're going to ignore the return value.)
Now we need to coerce conn to having that type if and only if this is
possible; if we blindly coerce conn to this type (in one of a number
of ways) we will get a runtime error when we're handed a net.Conn
with a concrete type that lacks a .CloseWrite() method. In Go, this
safe coercion is done with the two-result form of a type assertion:
func shutdownWrite(conn net.Conn) {
    v, ok := conn.(Closer)
    if ok {
        v.CloseWrite()
    }
}
(We can't just call conn.CloseWrite() after the coercion because we
haven't changed the type of conn itself, we've just manufactured
another variable, v, that has the right type.)
This is both typesafe and general. Any conn value of a concrete
type that implements .CloseWrite() will work and it will work
transparently, while if conn is of a concrete type that doesn't
implement .CloseWrite() there are no runtime panics; all of this is
exactly what we want. The same technique can be used in exactly the same
way to reach through any interface type to get access to any (public)
methods on the underlying concrete types; set up an interface type with
the methods you want, try coercing, and then call things appropriately.
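As a quick illustration of the general pattern, here's a sketch of my own (not code from this entry) that reaches through io.Writer to call Flush() on writers that actually have it, such as *bufio.Writer; the flusher interface and flushIfPossible function are made-up names for the example.

package main

import (
    "bufio"
    "io"
    "os"
)

// flusher is a locally-defined interface holding just the method we
// want to reach for; any concrete type with a matching Flush() error
// method satisfies it automatically.
type flusher interface {
    Flush() error
}

// flushIfPossible flushes w only if its concrete type supports it.
func flushIfPossible(w io.Writer) {
    if f, ok := w.(flusher); ok {
        f.Flush()
    }
}

func main() {
    bw := bufio.NewWriter(os.Stdout)
    io.WriteString(bw, "hello\n")
    flushIfPossible(bw)        // *bufio.Writer has Flush(), so this flushes
    flushIfPossible(os.Stdout) // *os.File has no Flush(), so nothing happens
}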
(I actually like this typesafe conversion and method access better than the Python equivalent because it feels less hacky and more a direct expression of what I want.)
I think that it follows that any type switch code of the first form, one where you just call the same routine (or a few routines) on the new types, is a danger sign of doing things the wrong way. You probably want to use interface type conversion instead.
(Had I read the right bit of Effective Go carefully I might have seen this right away, but Effective Go doesn't quite address this directly. All of this is probably obvious to experienced Go programmers.)
Update: there are several good ideas and improvements (and things I didn't know or realize) in the golang reddit comments on this entry.
Some notes on my first experience with Go
I've finally wound up writing my first Go program. The program is a Go version of what seems to have turned into my standard language test program, namely a netcat-like program that takes standard input, sends it off to somewhere over the network, and writes to standard out what it gets back from the network. Partly because Go made it easy and partly due to an excess of new thing enthusiasm the program grew far beyond my initial basic specifications.
(I'm somewhat bemused that a netcat-like program really has become a standard program I write in new languages and to try out things like new buffering libraries. It's actually not a bad test.)
On the whole the experience was quite pleasant. The specific need I
had is something I normally would have handled with a Python program
and writing my Go program was not particularly much more work and
bookkeeping than the Python equivalent would have been (it took much
longer to write because I was semi-learning Go as I went and I already
know Python). The code has reasonably few variable declarations and most
of them are non-annoying; Go's := idiom really helps with this since
it means that in many circumstances you don't have to declare a variable
or specifically name its type.
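(For illustration, a trivial example of the idiom; this isn't code from my program.)

package main

import "fmt"

func main() {
    // := declares the variable and infers its type from the right-hand
    // side, so there is no separate 'var n int' line.
    n := 42
    s := fmt.Sprintf("%d bottles", n)
    fmt.Println(s)
}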
One important thing I wish I'd known at the start is that you should ignore most everything the Go documentation overview pages tell you about what to read. Effective Go is in practice the quick guide to Go for C programmers, or at least for C programmers who have some general idea about Go to start with, and is the closest thing Go has to Python's excellent tutorial. The language reference is overly detailed and too hard to read for learning, and the interactivity of the beginning tutorial makes it completely unsuitable for quick starts.
One of the reasons that I got as far as I did as fast as I did is that
Go's networking library has a relatively high-level view of the world.
There is no Python equivalent of Go's net.Dial() or net.Listen()
APIs, at least not in the standard library; the existence of both
of them made handling an absurdly wide variety of network protocols
basically trivial (along with a bunch of complexity of hostname and port
number lookups). On the flipside this API is not complete (especially
in Go 1.0) and has a number of really annoying omissions. This is
especially frustrating since I have the (Go) source for the net
package and can see perfectly well that what I want access to already
exists in the package; it's just not exported and (unlike Python) you
can't fish into a package to grab stuff yourself.
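To give a sense of how high-level this is, here's a minimal sketch of my own (not the actual program): the network type is just a string, and the same pair of calls covers TCP, Unix sockets, and so on, including the hostname and port lookup.

package main

import (
    "fmt"
    "net"
)

func main() {
    // One call does the name and port lookup and the connect.
    conn, err := net.Dial("tcp", "www.example.com:80")
    if err != nil {
        fmt.Println("dial failed:", err)
        return
    }
    defer conn.Close()

    // Listening is equally terse.
    ln, err := net.Listen("tcp", "localhost:0")
    if err != nil {
        fmt.Println("listen failed:", err)
        return
    }
    defer ln.Close()
    fmt.Println("listening on", ln.Addr())
}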
My code wound up using goroutines and channels, although in a relatively basic way. Designing program flow in terms of channels definitely took several attempts before I had everything sorted out cleanly; earlier versions of the code had all sorts of oddities before I sorted out exactly what I wanted and how to express that in channel data flows. My broad takeaway from this experience is that it's very important to think carefully about what you want to do before you start eagerly designing a complex network of channels and goroutines. It was easy for me to get distracted by the latter and miss an obvious, relatively simple solution that was under my nose.
My feelings about channels and goroutines are mixed. On the one hand I think that using them simplified the logic of my code (and made it much easier to support TLS), even if it took a while to sort out that logic. On the other hand having to use goroutines is responsible for a serious wart in one aspect of the program, a wart I see no way around; the wart arises because there's no way for outside code to force a goroutine blocked in IO to gracefully abort that IO (this is a fundamental issue with channels).
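To make the wart concrete, here's the usual shape of the pattern as a hedged sketch (readerLoop is a made-up name, not code from my program): a goroutine reads from the connection and forwards data over a channel, and once it is blocked inside Read() nothing you send on a channel can interrupt it; the blunt workaround is to close the connection out from under it so that Read returns an error.

// readerLoop forwards data from conn to out so that a select loop
// elsewhere can multiplex it with other events. While it is blocked in
// conn.Read() it cannot be told to stop via a channel; closing conn
// from outside (which makes Read return an error) is the escape hatch.
func readerLoop(conn net.Conn, out chan<- []byte) {
    for {
        buf := make([]byte, 4096)
        n, err := conn.Read(buf)
        if n > 0 {
            out <- buf[:n]
        }
        if err != nil {
            close(out)
            return
        }
    }
}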
This is rambling long enough as it is, so I think that I will save my language disagreements for another day. Well, except to say that I think that the standard Go package for parsing arguments and argument flags handles command line options utterly the wrong way and I need to get a real argument parsing package before I write another Go command.
(Go's standard flag package apparently follows some argument parsing
standard that Google likes. It is pretty non-Unixy while looking just
enough like normal Unix argument handling to fool you.)
2013-02-19
The source of C's dependency hell for linking
C famously has a dependency hell problem for linking (both static and dynamic, although the static linking one is often more tractable). This is the problem both of what libraries you need (including what libraries are needed by the libraries that you need) and in what order you need them; it often results in people cramming ever-increasing numbers of libraries into their compiler command lines in the hopes that one of those libraries satisfies things.
As I alluded to in a comment on this entry, the root of that dependency hell is that C has only a single global namespace. With a single global namespace there is no explicit 'import' operation; global names can come from anywhere and appear from everywhere (in fact this is abused as a feature, where you can override or preempt a library routine). One way to put it is that in C, all global names from outside the current file are late-binding and scopeless. They can only be fully resolved or declared invalid at link time when the final binary is built. This leads naturally to libraries that themselves depend on and use global names which come from, well, somewhere, no one knows exactly where until link time.
(Global names often must be declared but this declaration is itself without scope or origin. There are many unfortunate things that result from this, including the potential mismatches between declarations and actual reality.)
This is in stark contrast to a compiled language with a package system and explicit imports (such as Go). In those languages, names are always within the scope of a package and a competently implemented compiler environment reliably knows the dependencies (both direct and transitive) of a piece of code; it knows what packages the code has imported and used names from, and it knows what packages those packages need, and so on. It may not be able to find them on the filesystem, but it can at least tell you that this code needs the compiled forms of the following N packages. It can even throw in version numbers (or something more comprehensive) if it wants to.
My memory is that Plan 9 made some attempts to change this for C. If I remember right, Plan 9 basically moved to a model where there was one header file per library and each header file contained a pragma to tell the compiler what the library was. Of course this is not ANSI-compatible in the least but I don't think the Plan 9 people considered this much of a problem.
In theory the library dependency problem can be dealt with; at the time you build a library (static or dynamic) you can 'link' everything as far as resolving all of the global names that the library needs, then note down where they all came from. In practice traditional Unix static libraries have never had this information and aren't built in ways that create it (a traditional static library is just an archive of object files). I think that some dynamic library formats have attempted to include this sort of dependency information where available as a hint to various parties.
(And of course a C compiler environment could add support for a Plan 9 like pragma to say 'the stuff from this header file comes from this library' and then embed the resulting hint in the generated object files and so on. But I don't think anyone has. My cynical side suspects that it's just not considered an important problem.)
2013-02-10
A little irritating (but understandable) limitation on Go interfaces
I was all set to write an entry about how you could use Go-style interfaces to create what I called
easy, type-safe conversions from a variety of types to something you
wanted. The sketch of the problem and the idea: suppose you want to create an API that accepts arguments in multiple forms, for example something in uncompiled string form or in a compiled, efficient form. The usual way to implement this is with a type switch in your functions ('if the argument is a string, convert it to ...'), but this is annoying and limited.
Interfaces to the rescue, in theory: define a 'Converter' interface
with a single 'ToMyThing()' method, make your API take Converter
arguments (and your functions then call arg.ToMyThing()), and
define a new ToMyThing() method on strings, integers, and whatever
else you want to accept.
(The ToMyThing() method for your actual type does nothing and just
returns itself.)
People who know Go are shaking their heads sadly right about now. Here, let me tell you why:
prog.go:9: cannot define new methods on non-local type string
Well, oops. So much for that.
If you understand how Go's types and interfaces are implemented, this
makes sense. One of the parts of the type description for every concrete
Go type is a static, fixed array of methods (with various information
including their name and a function pointer); this is built as the type
is compiled. What this error message really means is 'the method array
for string has already been built, you can't add entries to it now'.
It's not hard to see why allowing this would massively complicate Go's life. The big reason is that entries in the method array are sorted by name (for good reason). Adding a name to it after the type has been compiled means re-sorting the array, changing the index position of entries, and then finding and changing all references to now-invalid index positions in already compiled code. In practice you would want to defer the index position resolution until link time (as a form of link time relocation) and I'm not sure that's even possible in object formats like ELF. Certainly it would add a lot more complexity to the whole process.
(Actually it's even worse; you would have to defer building the method
table entirely until link time, since you might have to merge together
string method definitions from all over your code base.)
You would also open up the possibility of weird link-time errors.
For example, suppose that two separate bits of Go code both
independently decide to add (different) Convert() methods to
string. This pretty much has to be an error, but it's something
you can only detect and report at link time. Worse, those bits of
Go code work independently; they only fail when you combine them
into a single program. This is not a recipe for good software
engineering and I'm
not at all surprised that Go left this out.