Wandering Thoughts: Recent Entries

2013-05-15

Why I've so far been neglecting functional programming languages

Functional programming languages are in many ways the latest hotness and so for years I've been making off and on runs at things like yet another explanation of monads (which I think I sort of understand by now) and similar topics. Despite this, so far I've been almost completely uninterested in actually trying to write a functional program or exploring a FP language.

The big problem for me is that as far as I can tell, the kind of programs I usually work with are exactly the kind of programs that functional programming is stereotypically a bad fit with. The stereotype I've absorbed is that functional programming is quite a good fit for computation but not a good fit for IO, because IO intrinsically has side effects. Unfortunately most of what I write is all about IO and has little or no computation. Bashing a squarish peg into a roundish hole is unlikely to tell me anything particularly meaningful about how nice the language is to work in; what I really need is a roundish peg, a computational problem, and those are relatively scarce around here.

(It's possible that I'm not looking hard enough. For example, I do periodically want to do things like log analysis or event reassembly, where the original data could just as well be a predefined data structure in the program instead of processed from logfiles on disk. I suspect that a functional language would handle these fine, maybe better than ad-hoc hackery in awk, Python, or whatever. If I was really crazy I would try rewriting the logic in our ZFS spares handling system in an FP language to see if it got clearer; it's fundamentally a series of transformations of a tree and then some analysis of the result. The result might even be more testable.)

WhyNotFunctional written at 00:56:36

2013-05-13

My language irritations with Go (so far) and why I'm wrong about them

The great thing about an evolving language is that if you're slow enough about writing up your irritations with it, some of them can wind up fixed (or part fixed). So this list is somewhat shorter than it was when I originally wrote my first Go program, and none of the irritations are major. Also, I will reluctantly concede that Go has good engineering reasons for all of them.

My largest single irritation is that break acts on switch and select; I expected it to act only on any enclosing control structure, so that you could write something like:

for {
   select {
   case <-mchan:
      // message silently swallowed
   case <-schan:
      break
   }
}

Instead you have to invent a boolean loop condition or use a labeled break. I understand why Go does this; it enables you to exit early out of a switch or select case instead of having to wrap everything in ever increasing levels of nesting. This is likely especially important because Go uses explicit error checking (which would otherwise force those nested if blocks).
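(For what it's worth, here is a minimal sketch of how the loop above can be exited with Go's labeled break instead of a boolean flag. The names drain, demo, and loop are made up for illustration; only the shape of the for/select matters.)

```go
package main

import "fmt"

// drain swallows messages from mchan until schan delivers a value or is
// closed, then reports how many messages it discarded.
func drain(mchan <-chan string, schan <-chan struct{}) int {
	swallowed := 0
loop:
	for {
		select {
		case <-mchan:
			swallowed++ // message silently swallowed
		case <-schan:
			break loop // leaves the for loop, not just the select
		}
	}
	return swallowed
}

// demo feeds two messages through drain and then signals it to stop.
func demo() int {
	mchan := make(chan string)
	schan := make(chan struct{})
	done := make(chan int)
	go func() { done <- drain(mchan, schan) }()
	mchan <- "first"
	mchan <- "second"
	close(schan)
	return <-done
}

func main() {
	fmt.Println("swallowed:", demo())
}
```

A plain break in the schan case would only leave the select and the loop would go around again; the label is what makes it apply to the for.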

The issue that got partially fixed is Go's return requirements. When I wrote the original version of my program the natural form of one function was a big switch with a number of specific cases and then a default: to catch the rest; however, the original rules required a surplus return at the end of the function, which irritated me by forcing me to move the default case to the end of the function, obscuring the logic. The Go 1.1 changes make my particular case okay but I believe there remain cases where you need an unreachable ending return (or panic) to make the compiler happy.

You can make an argument that the original and current state of affairs are good software engineering. If the compiler did true reachability analysis it'd increase the number of cases where an innocent looking change to some part of the code would suddenly make the return coverage not be complete and thus produce potentially odd messages about missing returns. The current brute force rules protect against this and lead Go programmers to write in a certain sort of consistent style.
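(A small sketch of both sides of this, under the Go 1.1 rules. The functions classify and sign and the values they switch on are invented for the example; they are not from my program.)

```go
package main

import "fmt"

// classify is shaped like the function described above: one switch whose
// cases, including default, all end in return. Since Go 1.1 such a switch
// counts as a terminating statement, so no trailing return is needed.
func classify(code int) string {
	switch code {
	case 200:
		return "ok"
	case 404:
		return "not found"
	default:
		return "other"
	}
}

// sign covers every possible int, but the compiler does not reason about
// the conditions, so the function must still end in a terminating statement.
func sign(n int) int {
	if n < 0 {
		return -1
	}
	if n == 0 {
		return 0
	}
	if n > 0 {
		return 1
	}
	panic("unreachable") // without this line the compiler reports a missing return
}

func main() {
	fmt.Println(classify(404), sign(-7))
}
```

The second function is the kind of case where an unreachable return or panic is still required, because the rules look at the syntactic form of the final statement and not at which values can actually reach it.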

My final issue is my perennial one of being unable to cleanly cancel IO being done by goroutines, breaking them out of things so that they can see a death signal from outside. You can argue that this is a bug in the runtime, but the problem with this is that everything that calls an IO operation then needs to be aware of this particular error case (and catch it, and propagate it up the call stack in whatever way is appropriate). A good start to making it a bug in the runtime would be for the runtime to define a specific error for 'IO attempted on closed connection' and for absolutely everything to use it.

(As it stands, the net package doesn't even define a publicly visible error instance for this case, although it does define one internally. It's my personal view that this beautifully illustrates why this is a general language problem; while you can 'solve' it in code, it requires absolutely everyone to get it right and, well, they clearly don't.)

Again this is a software engineering tradeoff. Both the semantics and the runtime implementation of goroutines are undoubtedly vastly simplified because you don't have to worry about being able to signal or cancel a goroutine from outside itself. Outside of the program exiting, all of the interactions that a goroutine has with the outside world are initiated by the goroutine itself, on its own terms. This makes it much easier to reason about the effects of a goroutine, especially if it's careful not to use global state.
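(To make the problem concrete, here is a rough sketch of the one lever that outside code does have: closing the connection out from under the goroutine. It uses net.Pipe() as an in-memory stand-in for a real network connection, and the names reader and stopByClosing are invented for the example.)

```go
package main

import (
	"fmt"
	"net"
)

// reader blocks in Read until it fails and then hands the error back.
// While it is inside Read it cannot look at any "please stop" channel.
func reader(conn net.Conn, result chan<- error) {
	buf := make([]byte, 512)
	for {
		if _, err := conn.Read(buf); err != nil {
			result <- err
			return
		}
	}
}

// stopByClosing starts a reader and then closes its connection from the
// outside. It returns whatever error the reader saw as a result.
func stopByClosing() error {
	local, remote := net.Pipe() // in-memory stand-in for a real connection
	defer remote.Close()
	result := make(chan error, 1)
	go reader(local, result)
	local.Close()
	return <-result
}

func main() {
	fmt.Println("reader stopped with:", stopByClosing())
}
```

The goroutine does stop, but all it gets is an ordinary error from Read, which it (and everything above it in the call stack) then has to recognize as 'we were told to die' instead of as a real IO failure.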

GoLanguageIrritations written at 23:39:13

2013-04-15

Go's friction points for me (and a comparison to Python)

A commentator on my entry on Python's data structures problem asked in part:

So, what's next, if anything? I take it Go wasn't a revolution in the way the migration from C to Python was. [...]

This brings up the complex issue of my views on Go. Part of the issue is that Go has a bunch of friction points right now. Some of them are intrinsic in the language and some of them are simply artifacts of the current situation and will hopefully change.

(I wrote more about where I think Go fits into my programming back in GoInterest.)

In general I don't think that Go will ever be as fast to program in as Python is (in the sense of how long it takes me to write a program, not in how fast it runs). Go goes to a lot of work to reduce the amount of bureaucracy involved through various features, but Python is simply at a higher level in terms of eliminating make-work and as a result it's significantly more flexible and adaptable. The tradeoffs involved are sensible for both languages and their goals; as discussed Go has a strong emphasis on large scale software engineering and Python doesn't.

(To put it one way, Go is a great language for large scale software projects but I almost never write those. As a sysadmin I'm generally a small scale programmer.)

I'm going to split this into current and intrinsic friction points, then do this in point form to keep the size of this entry from exploding. First, the current friction points:

  • Go is not pervasively available in the way that things like Python, Perl, and awk are. This is especially true of current versions of the native Go toolchain, which is really what you want to be working with. This is merely a pain for personal development (I can always build the toolchain myself) but a relative killer for work programming in our environment.

    (To put it one way, 'first you download and build the compiler' does not make Go sound attractive to my co-workers.)

  • Go's standard library is limited and portions of it are crazy. This can be (somewhat) fixed with external packages but then I have to find them and evaluate them and so on, which is a hassle. It would be less of a hassle if people started making OS packages for various good add on Go packages, the way many Perl and Python add-on modules are only an apt-get or yum command away on most Linuxes.

    (Part of why this matters to me is that $GOPATH makes me grind my teeth. It strikes me as such a bad fit for working with multiple projects under version control that it's painful.)

  • The state of web frameworks for Go seems unclear right now. I especially care about form handling and validation, especially for database-backed forms (because this aspect is generally the largest pain in the rear to code by hand; it's what drove me to Django for my Python web app).

  • Debugging is less friendly with Go than with Python, because if you screw up in Python it will dump out a great big verbose stack backtrace; often this points me to exactly the mistake I made. Go is a lot terser and thus less helpful.

(There are also pragmatic issues with using Go in production.)

I thought that I had several intrinsic language issues but at this point all I can think of is the general extra annoyance of explicit error handling as opposed to Python's tacit exceptions. I understand why Go makes the choice it does but Python's exception-based approach is just plain convenient for quick coding and it means that you can write much less code (you can aggregate error checks and even skip writing explicit ones and your program will still abort on errors).
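(As an illustration of the extra code involved, here is a small sketch in which every fallible step needs its own check. The function sumFields and its inputs are invented for the example.)

```go
package main

import (
	"fmt"
	"strconv"
)

// sumFields parses two decimal fields and adds them. Each conversion can
// fail, and each failure has to be checked and passed up by hand.
func sumFields(a, b string) (int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, fmt.Errorf("first field: %v", err)
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, fmt.Errorf("second field: %v", err)
	}
	return x + y, nil
}

func main() {
	fmt.Println(sumFields("40", "2"))
	fmt.Println(sumFields("40", "two"))
}
```

The quick and dirty Python version would be a single int(a) + int(b), with a bad field simply aborting the program with a traceback.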

(I consider things like Go type assertions to be part of the general price paid for static typing. I can't really describe static typing as a friction point, although to be honest it sort of is.)

Also, as I've written before I maintain that Go's obsessive focus on goroutines with basically no support for select() et al is ultimately a mistake. Goroutines cannot do everything and there are real situations that they don't cope with (not unless you allow them to be canceled from outside while they are in nominally blocking routines).

(If I use Go more I may find some additional irritations. Python is a relatively featureful language as compared to Go, so I may find myself missing things like function decorators at some point.)

GoFrictionPoints written at 02:32:36

2013-04-03

How to make sysadmins unhappy with your project's downloads

I tweeted:

Dear every project that doesn't have an URL for their tarballs that is easily wget'able: ha ha. Very funny. Please stop. #sysadmin

Let me expand on this a bit. First, I'll give a pass to everyone who has access-restricted downloads; there is no good way to make them easily fetched. This is for everyone else, all of the various projects that have public downloads.

Here is the thing: sysadmins are not necessarily browsing your website on the machine where they actually want the source code. In fact it's almost certain that they aren't, since very few sysadmins run Firefox or Chrome on their servers. What sysadmins want to do is use 'Copy Link Location' on the (nominal) URL of your project's distribution tarball, open a connection to the server, type 'wget <pasted URL>' on it, and wind up with a sensibly named tarball (or zip file or whatever) of your source afterwards.

There are at least two ways that this goes wrong. Sadly I am going to have to pick on the Django web framework for the first one, because it inspired my tweet. The download URL for Django 1.5.1 is:

https://www.djangoproject.com/download/1.5.1/tarball/

If you feed this URL to wget, you do not get something called 'Django-1.5.1.tar.gz' but instead a file called 'index.html' (which is the gzip'd tarball that you want, just with the wrong name). This is because wget operates in a very straightforward way; it puts whatever it fetches in a file named after the last component (or index.html if the last component looked like a directory, as it did here). Wget does have an option to change this, --content-disposition, but I had to look it up in the manpage. Sysadmins do not appreciate being forced to look up (and then type) long options to wget to get your tarballs.

The fix for this is straightforward: your download URL should have a last component that is the name of the distribution tarball or applicable file. Then wget will do the right thing.
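(Here is a simplified model of that naming rule, sketched in Go. It is only an approximation of the behavior described above and not wget's actual logic; the function name localName and the example.org URL are made up.)

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// localName models the rule described above: use the last component of the
// URL path, or index.html when the path is empty or looks like a directory.
func localName(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Path == "" || strings.HasSuffix(u.Path, "/") {
		return "index.html", nil
	}
	return path.Base(u.Path), nil
}

func main() {
	for _, raw := range []string{
		"https://www.djangoproject.com/download/1.5.1/tarball/",
		"https://example.org/dist/Django-1.5.1.tar.gz", // hypothetical URL
	} {
		name, err := localName(raw)
		fmt.Println(name, err)
	}
}
```

Under this rule the only way for a download to get a sensible name by default is for the URL itself to end in that name.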

(Github does a variant of this. The stated URLs of a zipfile of a repo are things like <user>/<project>/archive/master.zip, but the fetched file is supposed to be called <project>-master.zip. Browsers that pay attention to the HTTP Content-Disposition header will save it under that file name; wget will at least use master.zip.)

The other really bad thing you can do is what Sourceforge at least used to do. The nominal 'download' links on Sourceforge projects didn't go directly to the files (despite appearing as if they did); instead they went to an interstitial HTML page that told you about mirrors and automatically started a download (I assume through the use of an HTML '<meta http-equiv="refresh" ...>' in the page). This is of course completely impossible to feed to wget, which doesn't interpret this HTML <meta> tag at all. You should not do this sort of trickery; your download links should actually be links to the files, not to any sort of interstitial experience. If you need to make people go through a mirror, do that with an HTTP redirect and put an explanation about it on your download page.
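(A minimal sketch of the redirect approach, written as a Go HTTP handler. The mirror host, the file path, and the names mirrorBase, download, and probe are all hypothetical; a real site would presumably pick a mirror per request.)

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// mirrorBase is a hypothetical mirror used only for this example.
const mirrorBase = "https://mirror.example.org"

// download answers a request for a file with a plain HTTP redirect to the
// same path on the mirror, so the file name stays the last URL component.
func download(w http.ResponseWriter, r *http.Request) {
	http.Redirect(w, r, mirrorBase+r.URL.Path, http.StatusFound)
}

// probe runs the handler in-process and reports the status code and the
// Location header that a client would see.
func probe(path string) (int, string) {
	rec := httptest.NewRecorder()
	download(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	fmt.Println(probe("/files/project-1.0.tar.gz"))
}
```

A redirect like this is something that wget and similar tools follow on their own, which is the whole point.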

WgetableDownloads written at 00:28:48

2013-03-25

My (current) view of using branches in VCSes

In a comment on this entry, Aristotle Pagaltzis asked:

(Though I admit I wonder why you do have fred-1 and fred-2 [source directories] rather than branches in your VCS.)

The simple answer is that my favorite way of changing branches is with cd. This is especially the case if I'm developing things in parallel and may well wind up throwing one of them away; for good reasons VCSes make totally deleting a branch much harder than a plain 'rm -rf'. I'll admit that part of my preference is because I haven't yet gotten around to mastering branching in either git or Mercurial (partly because I frankly don't entirely like it or need it yet).

(Some people will say that you should keep even experimental stuff that didn't work out in your VCS in case you ever want to go back to it later. This may work for them but it doesn't work for me; I want my VCS to be neater than that.)

But even without that I think cd is much easier than going back and forth between branches in the same directory hierarchy, especially if you're developing on both branches. With today's VCSes, flipping back and forth between branches generally wants you to have actually committed your work to both of them and this itself leads to messy history (or to a lot of redoing commits or the equivalent, as you commit only so that you can flip branches, then annul and overwrite the commit the next time). I also personally think that it is a cleaner and more natural model for (multi-)branch development, with the only downside being more disk space being used.

(And disk space is generally cheap unless you're dealing with huge repos. The two largest repos I have handy are Mozilla and the Linux kernel; Mozilla is 2 GB and Linux is 1.2 GB. That's not going to break the bank on modern machines.)

I understand why VCSes have branch-switching commands (they can't not have them, to put it one way) and the benefits of having multiple branches in the same repo (including things like being able to do easy diffs between branches). But it just doesn't fit into the way that I prefer to interact with VCSes and I like to keep my life simple.

MyVCSBranchingView written at 01:20:58

2013-03-18

The wrong way for a framework to lay out projects

I tweeted:

I admire Django's attempts to screw up my entire source repository structure, where by 'admire' I actually mean 'am madly hacking around'.

I'm sad to say that Django 1.4 is a great illustration of two bad things. First it is a great illustration of how not to lay out a project hierarchy and second it is a great illustration of how not to do a layout transition.

Up until Django 1.4, the more or less canonical source layout of a Django project looked something like this:

mysite/
    manage.py
    settings.py
    ...
    myapp/
        models.py
        ....

The manage.py file is basically the central hub for doing a lot of things with (and to) your project. In Django 1.4 they decided that manage.py should live at the top level, in a layout that now looks like this:

manage.py
mysite/
    settings.py
    ....

The problem with this layout is that it breaks the first rule of sane code layout: everything goes in a single directory hierarchy that is your VCS repo. Modern VCSes manage a directory hierarchy, so you really want to give them one. The Django 1.4 layout requires you to invent a container directory purely so that you can put manage.py in the same repo as mysite. This container repo has no sensible name the way mysite did (or alternately the only sensible name is once again 'mysite', so you have a mysite/mysite directory to confuse everyone). By contrast the old layout was perfect for VCSes (you made mysite the root of your VCS repo and everything was great).

The other problem is transitioning an existing project that has, of course, set itself up with the mysite directory being the VCS repo. In order to keep manage.py under VCS and to keep Django happy, you get to push absolutely everything else in your repo down a level in a massive (and completely artificial) rename changeset. Depending on how your VCS works this may well completely screw up VCS history and the ability to trace changes back over the discontinuity. Since I decline to do this to myself (and to our Django-based web app), I'm instead forced into very ugly hacks in manage.py to make it work where it is.

(Django people will say that Django is forced to do this because of Python module handling issues. My view is that it is a mistake to make things into modules in the first place when they are in fact not, and mysite is not a module in any meaningful sense.)

BadProjectLayout written at 21:07:40

2013-03-03

Why a netcat-like program is a good test of a language

When I talked about my first Go experience, I mentioned in passing that a netcat-like program is actually not a bad test program for a language (or for certain sorts of libraries in, eg C). Today I feel like explaining that.

To start with, it's not an empty and artificial challenge; a netcat-like program does something meaningful and practical (although it may not be necessary if you already have netcat). The problem itself touches many levels of a language and its library, since it has to interact with standard input and output, deal with command line arguments, look up hostnames and ports, make network connections and talk over them, and deal with buffering and byte input and output. It also involves some level of network concurrency, either through real concurrency (as in Go) or through the equivalent with select(), poll(), or the like. There are also some subtle and taxing aspects to the problem, such as shutdown(), that test whether the language (and library) designers were paying attention or thought it worthwhile to expose the entire underlying system API.

(In a low-level language like C you'll also wind up exploring things like memory allocation and any safe buffer handling libraries that are available. If you're working with select() et al you can also extend the problem to playing around with nonblocking IO, again if the language gives you access to this.)
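(To give a feel for the core of the problem, here is a rough Go sketch of the central relay. It is not my actual program. The names relay and demo are invented, and demo uses net.Pipe() with a toy in-process peer where a real program would get its connection from net.Dial() and use standard input and output.)

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"strings"
)

// relay is the heart of a netcat-like program: copy in to the network,
// signal end-of-input if possible, and copy whatever comes back to out.
func relay(conn net.Conn, in io.Reader, out io.Writer) error {
	sent := make(chan error, 1)
	go func() {
		_, err := io.Copy(conn, in)
		// Half-close (shutdown for writing) if this connection type offers it.
		if hc, ok := conn.(interface{ CloseWrite() error }); ok && err == nil {
			err = hc.CloseWrite()
		}
		sent <- err
	}()
	if _, err := io.Copy(out, conn); err != nil {
		return err
	}
	return <-sent
}

// demo runs relay against an in-process peer that reads five bytes,
// answers in upper case, and closes its end.
func demo() (string, error) {
	client, server := net.Pipe()
	go func() {
		defer server.Close()
		buf := make([]byte, 5)
		if _, err := io.ReadFull(server, buf); err == nil {
			server.Write(bytes.ToUpper(buf))
		}
	}()
	var out bytes.Buffer
	err := relay(client, strings.NewReader("hello"), &out)
	client.Close()
	return out.String(), err
}

func main() {
	fmt.Println(demo())
}
```

Even this much touches concurrency, buffering, end-of-file handling in both directions, and the question of whether the library lets you get at shutdown() at all.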

Of course there are many aspects to a language and its libraries beyond relatively low level networking, so this problem doesn't come anywhere near to exploring all of a language and its libraries. Still, I've found that it covers a lot of ground that's interesting to me personally and the whole experience is a good way of seeing what the language feels like.

Some people will want to write HTTP-based test programs instead because that's more directly relevant to them. I'm the kind of cynical person who wants to see the low-level plumbing in action too, partly because I think it's more revealing of the language's core attitudes. Since the web is so pervasive and important, my feeling is that everyone doing a new language environment is going to make sure they have good HTTP support (assuming they care about such usability at all). And if a language doesn't have either high-level HTTP support or good low-level networking support, well, that tells me a lot about its priorities.

NetcatGoodTest written at 23:03:16

Go: when I'd extend an interface versus making a new one

One of the reddit suggestions in response to my entry on using type assertions to reach through interfaces noted that you could embed one interface inside another one, effectively extending the interface that you embed, so my Closer interface could have been:

type ConnCloser interface {
    net.Conn
    CloseWrite() error
}

When I saw this my instinctive reaction was that this was wrong for my situation; since then I've spent some time thinking about why I feel that way. My conclusion is that I think I have good reasons but I may be wrong.

Simplifying, the dividing point for me is whether all of the values I'm dealing with would be instances of the new interface, for example if I was writing code that only dealt with TCP and Unix stream sockets. In that situation my life would be simpler if I immediately converted the net.Conn values into ConnCloser values and then had the rest of my code deal with the latter (freely calling .CloseWrite() when it wanted to). What I'm doing is converting net.Conn values into what they really are, which is values that have a wider interface.

But if not all of the values I'm dealing with are convertible and if I'm only doing the conversion in one spot (and only once), extending net.Conn doesn't feel like an accurate description of what I'm doing. I'm just fishing through it to see if I can call another routine and then immediately calling that routine. Using just an interface with CloseWrite() makes my actual intentions clear.

I'd feel different if I was passing the converted values around between functions or storing them in something. The issue here is that such functions don't really want to accept anything that simply has a CloseWrite() method with the right signature; they want to deal specifically with net.Conn values that also have that method. A bare Closer interface that only specifies a CloseWrite() method is too broad an allowance for what I actually mean and thus would be the wrong approach. (At this point I start waving my hands vaguely.)
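(Here is a small sketch of the difference between the two interfaces. The types halfConn and valve and the function describe are invented for the example; valve stands for any unrelated type that merely happens to have a CloseWrite() method.)

```go
package main

import (
	"fmt"
	"net"
)

// Closer is the bare interface: anything with CloseWrite() qualifies.
type Closer interface {
	CloseWrite() error
}

// ConnCloser is the embedded version: a full net.Conn plus CloseWrite().
type ConnCloser interface {
	net.Conn
	CloseWrite() error
}

// halfConn is a hypothetical connection type that supports half-close.
type halfConn struct {
	net.Conn
}

func (h halfConn) CloseWrite() error { return nil }

// valve is a hypothetical non-connection that happens to have the method.
type valve struct{}

func (valve) CloseWrite() error { return nil }

// describe reports which of the two interfaces a value satisfies.
func describe(v interface{}) string {
	_, isCloser := v.(Closer)
	_, isConnCloser := v.(ConnCloser)
	return fmt.Sprintf("Closer=%v ConnCloser=%v", isCloser, isConnCloser)
}

func main() {
	a, b := net.Pipe()
	defer a.Close()
	defer b.Close()
	fmt.Println(describe(halfConn{Conn: a})) // a connection with half-close
	fmt.Println(describe(valve{}))           // not a connection at all
	fmt.Println(describe(a))                 // a connection without half-close
}
```

A function that accepts a Closer would happily take a valve; one that accepts a ConnCloser would not, which is the narrower promise you want if the values are going to travel around your program.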

The more I think about it the less I'm sure what proper Go style should be here, and I have to admit that part of my feelings against ConnCloser are based purely on it having another line that doesn't do anything in my original situation (I'm often a terseness person).

GoEmbeddingInterfacesWhen written at 01:55:16

2013-02-21

Go: using type assertions to safely reach through interface types

To start with, suppose that you have a Go net.Conn value, call it conn, that you want to shutdown() (for writing) on if possible. Some but not all specific concrete net connection types make this available as a .CloseWrite() method (eg it's available for TCP sockets but not for UDP ones), but net.Conn is an interface type and it doesn't include a .CloseWrite() method so you can't directly call conn.CloseWrite().

(In Go's software engineering view of the world this is a sensible choice. net.Conn is the set of interfaces that all connections can support. If you included .CloseWrite() in the interface anyways you would force some connections, eg UDP sockets, to implement a do-nothing or always-error version of the method and then people would write Go code that blindly called .CloseWrite() and expected it to always work.)

So sometimes conn will be of a concrete type that supports this (and sometimes it won't be). You want to somehow call .CloseWrite() if it's supported by your particular value (well, the particular concrete type of your particular value). In Python we would do this either with a hasattr() check or just by calling obj.CloseWrite() and catching AttributeError, but we're in Go and Go does things differently.

If you're a certain sort of beginning Go programmer coming from Python, you grind your teeth in irritation, look up just what concrete types support .CloseWrite(), and write the following brute force code using a type switch:

func shutdownWrite(conn net.Conn) {
    switch i := conn.(type) {
    case *net.TCPConn:
        i.CloseWrite()
    case *net.UnixConn:
        i.CloseWrite()
    }
}

(Then this code doesn't compile under Go 1.0 because net.UnixConn doesn't implement .CloseWrite() in Go 1.0.)

What this code is doing in its brute force way is changing the type of conn into something where we know that we can call .CloseWrite() and where the Go compiler will let us do so. The compiler won't let us directly call conn.CloseWrite() because .CloseWrite() is not part of the net.Conn interface, but it will let us call, say, net.TCPConn.CloseWrite(), because it is part of net.TCPConn's public methods. So if conn is actually a net.TCPConn value (well, a pointer to it) we can convert its type through this type switch and then make the call. Unfortunately this code has the great drawback that it has to specifically know which concrete types that sit behind net.Conn do and don't implement .CloseWrite(). This is bad for various reasons.

(I am mangling some Go details here in the interests of nominal clarity.)

The experienced Go programmers in the audience are shaking their heads sadly right now, because there is a more general and typesafe way to do this. We just need to say what we actually mean. First we need a type that will let us call .CloseWrite(); this has to be an interface type because we need to convert conn to it (somehow).

type Closer interface {
    CloseWrite() error
}

(It's important to get the argument and return types exactly right even if you're going to ignore the return value.)

Now we need to coerce conn to having that type if and only if this is possible; if we blindly coerce conn to this type (in one of a number of ways) we will get a runtime error when we're handed a net.Conn with a concrete type that lacks a .CloseWrite() method. In Go, this safe coercion is done with the two-result form of a type assertion:

func shutdownWrite(conn net.Conn) {
    v, ok := conn.(Closer)
    if ok {
        v.CloseWrite()
    }
}

(We can't just call conn.CloseWrite() after the coercion because we haven't changed the type of conn itself, we've just manufactured another variable, v, that has the right type.)

This is both typesafe and general. Any conn value of a concrete type that implements .CloseWrite() will work and it will work transparently, while if conn is of a concrete type that doesn't implement .CloseWrite() there are no runtime panics; all of this is exactly what we want. The same technique can be used in exactly the same way to reach through any interface type to get access to any (public) methods on the underlying concrete types; set up an interface type with the methods you want, try coercing, and then call things appropriately.
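(As a sketch of the same technique on a different interface, here is code that reaches through an io.Writer to call Flush() when the concrete type has one. The names flusher, flushIfPossible, and demo are invented for the example; bufio.Writer and bytes.Buffer are the standard library types.)

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// flusher names the one method we hope the concrete type has.
type flusher interface {
	Flush() error
}

// flushIfPossible flushes w when its concrete type has Flush() error and
// does nothing otherwise. It reports whether a flush was attempted.
func flushIfPossible(w io.Writer) (bool, error) {
	if f, ok := w.(flusher); ok {
		return true, f.Flush()
	}
	return false, nil
}

// demo writes through a bufio.Writer, flushes it via the interface, and
// also tries a plain bytes.Buffer, which has no Flush method.
func demo() (string, bool, bool) {
	var sink bytes.Buffer
	bw := bufio.NewWriter(&sink)
	io.WriteString(bw, "buffered")
	flushedBuffered, _ := flushIfPossible(bw)
	flushedPlain, _ := flushIfPossible(&sink)
	return sink.String(), flushedBuffered, flushedPlain
}

func main() {
	fmt.Println(demo())
}
```

The caller never has to know which concrete writer types exist; it only states the method it wants and asks whether this particular value has it.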

(I actually like this typesafe conversion and method access better than the Python equivalent because it feels less hacky and more a direct expression of what I want.)

I think that it follows that any type switch code of the first form, one where you just call the same routine (or a few routines) on the new types, is a danger sign of doing things the wrong way. You probably want to use interface type conversion instead.

(Had I read the right bit of Effective Go carefully I might have seen this right away, but Effective Go doesn't quite address this directly. All of this is probably obvious to experienced Go programmers.)

Update: there are several good ideas and improvements (and things I didn't know or realize) in the golang reddit comments on this entry.

GoInterfacePunning written at 14:19:15

Some notes on my first experience with Go

I've finally wound up writing my first Go program. The program is a Go version of what seems to have turned into my standard language test program, namely a netcat-like program that takes standard input, sends it off to somewhere over the network, and writes to standard out what it gets back from the network. Partly because Go made it easy and partly due to an excess of new thing enthusiasm the program grew far beyond my initial basic specifications.

(I'm somewhat bemused but a netcat-like program really has become a standard program I write in new languages and to try out things like new buffering libraries. It's actually not a bad test.)

On the whole the experience was quite pleasant. The specific need I had is something I normally would have handled with a Python program and writing my Go program was not particularly much more work and bookkeeping than the Python equivalent would have been (it took much longer to write because I was semi-learning Go as I went and I already know Python). The code has reasonably few variable declarations and most of them are non-annoying; Go's := idiom really helps with this since it means that in many circumstances you don't have to declare a variable or specifically name its type.

One important thing I wish I'd known at the start is that you should ignore most everything the Go documentation overview page tells you about what to read. Effective Go is in practice the quick guide to Go for C programmers, or at least for C programmers who have some general idea about Go to start with, and is the closest thing Go has to Python's excellent tutorial. The language reference is overly detailed and too hard to read for learning and the interactivity of the beginning tutorial makes it completely unsuitable for quick starts.

One of the reasons that I got as far as I did as fast as I did is that Go's networking library has a relatively high-level view of the world. There is no Python equivalent of Go's net.Dial() or net.Listen() APIs, at least not in the standard library; the existence of both of them made handling an absurdly wide variety of network protocols basically trivial (along with a bunch of complexity of hostname and port number lookups). On the flipside this API is not complete (especially in Go 1.0) and has a number of really annoying omissions. This is especially frustrating since I have the (Go) source for the net package and can see perfectly well that what I want access to already exists in the package; it's just not exported and (unlike Python) you can't fish into a package to grab stuff yourself.
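(A rough sketch of what this high-level view buys you: the same code works for any stream network that net.Listen() and net.Dial() know about, because the network is just a string. The function roundTrip is invented for the example, and it assumes the loopback interface is usable.)

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// roundTrip starts a one-shot echo listener on (network, address), dials
// it, sends msg, and returns what came back.
func roundTrip(network, address, msg string) (string, error) {
	ln, err := net.Listen(network, address)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // echo until the client goes away
	}()

	// The listener tells us where it ended up, whatever the network is.
	conn, err := net.Dial(ln.Addr().Network(), ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	// Passing "unix" and a socket path instead should work the same way.
	fmt.Println(roundTrip("tcp", "127.0.0.1:0", "hello"))
}
```

Nothing in roundTrip deals with address families, sockaddr structures, or name lookups; that is all hidden behind the two strings.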

My code wound up using goroutines and channels, although in a relatively basic way. Designing program flow in terms of channels definitely took several attempts before I had everything sorted out cleanly; earlier versions of the code had all sorts of oddities before I sorted out exactly what I wanted and how to express that in channel data flows. My broad takeaway from this experience is that it's very important to think carefully about what you want to do before you start eagerly designing a complex network of channels and goroutines. It was easy for me to get distracted by the latter and miss an obvious, relatively simple solution that was under my nose.

My feelings about channels and goroutines are mixed. On the one hand I think that using them simplified the logic of my code (and made it much easier to support TLS), even if it took a while to sort out that logic. On the other hand having to use goroutines is responsible for a serious wart in one aspect of the program, a wart I see no way around; the wart arises because there's no way for outside code to force a goroutine blocked in IO to gracefully abort that IO (this is a fundamental issue with channels).

This is rambling long enough as it is, so I think that I will save my language disagreements for another day. Well, except to say that I think that the standard Go package for parsing arguments and argument flags handles command line options utterly the wrong way and I need to get a real argument parsing package before I write another Go command.

(Go's standard flag package apparently follows some argument parsing standard that Google likes. It is pretty non-Unixy while looking just enough like normal Unix argument handling to fool you.)
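(A small sketch of two flag package behaviors that differ from the usual getopt-style handling. I haven't spelled out above exactly which behaviors bother me, so take these as commonly noted examples only; the option names and the function parse are invented.)

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parse runs a small, hypothetical option set over args using the
// standard flag package and returns what it saw.
func parse(args []string) (verbose bool, port int, rest []string, err error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage messages out of the demo output
	fs.BoolVar(&verbose, "v", false, "verbose output")
	fs.IntVar(&port, "port", 0, "port number")
	err = fs.Parse(args)
	return verbose, port, fs.Args(), err
}

func main() {
	// Long options take a single dash, and parsing stops at the first
	// non-option argument, so the trailing -v is left unparsed.
	fmt.Println(parse([]string{"-port", "25", "host", "-v"}))
	// Clustered single-letter options are not understood.
	fmt.Println(parse([]string{"-vv"}))
}
```

In the first call -v ends up in the leftover arguments instead of turning on verbose; in the second, -vv is treated as one unknown option called 'vv' and parsing fails.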

GoFirstExperience written at 02:50:20

This is part of CSpace, and is written by ChrisSiebenmann.
Twitter: @thatcks
