2010-10-27
An extra problem with not documenting things in open source modules
Programmers are famous for not doing a great job of documenting the interfaces of their projects (whether these are libraries with APIs, protocols, programming languages, or whatever). The problems that this causes for people who are trying to use your work are well known, but it's recently struck me that open source projects in particular have an additional issue.
In the absence of documentation, people working with open source projects can always read the source to find out how to do something and to discover the interface that the project hasn't written down yet. But when you do so, you're left with a question: is what you've found something that you should actually be using?
There are at least three possibilities. First, you could have found the right answer, the correct interface to use for what you want to do, and it just hasn't been documented yet. Second, you could have found an unsupported and unstable internal interface or an implementation detail. Third, you could have found an experimental interface that the project authors haven't finalized yet (and thus haven't documented).
(I'm going to assume here that the name of the interface and other details in the code give an outsider no obvious clues. Remember that the people searching through the guts of your library probably don't have deep experience with the code base and your conventions.)
The quality of documentation matters here. The more clearly patchy and incomplete the documentation, the less guidance it can give you about which possibility you've hit; conversely, the more complete the documentation, the more you can infer from something not being documented. How the documentation is created also matters; people can infer one thing from entirely hand-written documentation and another from documentation that is at least partly extracted automatically from source code.
Of course, an explicit mention of this issue in your documentation helps to some degree, but make sure that it's believable and useful. Saying 'anything not mentioned here is internal and unsupported' is pretty pointless if your documentation is clearly incomplete, because you give people the choice of either ignoring your warning or ignoring your project entirely. (You lose either way.)
(I don't think that this issue is completely avoided in systems where you have to explicitly mark things to be publicly visible, just reduced somewhat. Of course you could always modify your build system so that basic interface documentation is automatically extracted and only things with such documentation are marked as publicly visible.)
2010-10-24
Why I'm interested in Go
These days, my substantial programming takes place in one of two languages. I use Python if I'm dealing with something that doesn't have to run fast or use minimal memory, and I use C on the occasions when Python doesn't fit. I like C, but it is a very sparse and unforgiving language when compared to Python and this translates to slower and more annoying development most of the time; there are a lot of low-level details that I have to worry about when I just want to bang out some code that runs fast(er).
(This annoyance leads me to use Python for everything that it can even barely be made to fit.)
There are times when C is the right answer, but there are also a lot of times when what I want is in the middle between C and Python, problems where Python is too heavyweight and bare C is too low level. Part of why I'm interested in Go is that it seems to be the most promising candidate to fit in this niche on Unix. It promises relatively fast runtime and relatively little memory usage while still giving me garbage collection, convenient strings, hashes, and arrays, and a decent set of support modules. For me, this makes Go the attractive choice for writing various system level programs like non-trivial network daemons.
(There are probably C libraries for everything that Go's standard packages do, but I'd have to go find them and that leads to the selection problem. And in general, syntax matters, and Go has better syntax than C plus libraries.)
Another part of why I'm interested in Go is that it comes from a group of people (call them the Plan 9 crowd) that have created a whole bunch of interesting, good ideas that I've found attractive in the past. Sometimes the results are too purist for my tastes, but they've pretty much always been worth a look. And the way I look at languages is by actually using them for real work, so I have gone and dabbled in Go; ideally I would like to use Go to write a relatively substantial program that I'll actually use.
(It has to be a personal program, since I won't write production programs in obscure languages that my co-workers have never heard of and that don't (yet) come packaged on our Unix systems.)
Sidebar: why other candidates are out
Java is, right now, not a particularly great language to write Unix system programs in for at least two reasons. First, Java programs generally start slowly for the same reason that Python programs start slowly; they have to load and start the interpreter before they actually start the program. Second, my impression is that the JVM does not provide good access to Unix system facilities.
(These drawbacks apply to any number of interesting languages built on top of the JVM, and in general to any interpreter-based language. Slow startup is not an issue for long-running programs, but I don't write that many of them.)
The D language struck me as sort of interesting when I first heard about it but it doesn't seem to have cohered into a useful system (rather the reverse, in fact), and I'm not really interested in languages that aren't open source because there's very little chance that they will become popular on my platform of choice.
C++ has many of C's problems as far as language features go; it just has somewhat nicer syntax once I find libraries and packages and so on that do what I want. And it doesn't have native garbage collection, which is one of the great programming speed accelerators.
2010-10-23
My issues with Go's net package
I would like to like the Go language, but right now I can't get past my annoyance at its net package. I have several network-related things that I would like to do with Go, and the net package's current state is getting in the way of all of them.
The fundamental problem is that Go's net package is both incomplete and what I'll call sealed. The easiest way for me to explain this is to contrast it with Python's socket support.
Both Go and Python represent network connections with language-level objects that have an abstract interface; underlying both of them are OS level file descriptors. Go and Python both have operations to create new network connections and so on, which return their high-level objects. The difference between Go and Python is that in Python you can both get at the OS level file descriptor behind a network connection and create a new network connection object from an OS level file descriptor, and in Go you cannot.
There are two limitations this imposes. First, you cannot create network connections from file descriptors that you obtain or inherit, and in turn this means that you cannot write an inetd-started network daemon in Go (at least not without reimplementing a chunk of the net package yourself). As it happens, some version of this is one of the things that I would like to do in Go.
Second, this means that you can't do things that need the file descriptor itself. On the one hand this is sort of fair (the network connection object doesn't want the state of the socket changed behind its back), but on the other hand this is where the incompleteness of the net package comes in, because it doesn't have anywhere near all of the functionality that you'd want. The instance that is directly annoying me right now is that the net package does not implement shutdown() on network connection objects, which I need for one of my standard learning exercises.
One of the reasons that this makes me somewhat unhappy with Go the language is that it argues for two things at once: that Go is an opinionated language environment and that the opinions of its creators are not mine. For example, one of the things that you can't do without access to the file descriptors is write a multiple connection select() or poll() based server. But I suspect that if you raised this argument with the Go people, they would tell you that you should use goroutines and channels to implement this.
(I am far from convinced that goroutines and channels are the right answer for some of the things you can do easily in a poll() based system, but that's another entry.)
PS: I am deliberately simplifying the Go situation when I talk about 'network connection objects'. What you deal with is actually a Go interface that is implemented by a number of different concrete types, one for each of the different major network connection types (like 'tcp' and 'udp').
2010-10-03
An API mistake Unix has made several times
Unix has generally had decent APIs, but every so often Unix people have been a bit too concerned with minimalism and with storing data as efficiently as possible. Several times this has led to string-related APIs that practically beg code to make mistakes, because of how they specify string termination.
Here is an abstracted example. Consider an API where you use the following structure:
#include <stdint.h>

struct dirent {
    uint16_t ino;
    char name[14];
};
You will notice that there is nothing explicit to tell you how long the name is. Instead, the API has a rule: the string in name is null-terminated unless it is exactly 14 characters long. This maximizes the length of the name that you can store in the structure, at the cost of complicating code that has to get the name out.
You can guess what happens next. Many people who write code that has to deal with this structure simply use strcpy() to copy the name to their own string, instead of the more complicated version that also handles the case of a 14-character name with no terminating null. The resulting programs work most of the time, because most of the time the name is shorter than 14 characters, but every so often they blow up oddly in what appear (to their users) to be unpredictable patterns. Over time, a semi-superstition evolves to the effect that '14-character names are bad, avoid them'.
(This is of course yet another example of having to be sure that something actually is a C string, as well as the fact that exceptions are hard for people to remember.)
I blame this partly on minimalism because one of the ways to deal with this would have been to make some accessor functions and tell people to always use them. Instead, the structures were simply exported to people directly and every programmer using them had to get the whole access dance correct. This has the minimalism of avoiding an 'unnecessary' and obvious function in the standard library, at the cost of having people get it wrong with reasonable frequency.
(Off the top of my head, I believe this mistake was made in at least the original V7 directory format and in some versions of utmp records.)
My meta-moral for this is: make things in your API that look like C strings actually be C strings. If people can treat them as C strings and have this work most of the time, a significant number of people will treat them as C strings regardless of what you say in your documentation. The corollary is that if you have things that are not C strings, you should consider actively frustrating attempts to use them as such by means like never null-terminating them. If you don't want to do this, make accessor functions that do it right and don't expose the raw structures.