Goroutines, network IO streams, and the resulting synchronization problem

December 9, 2015

These days, I use my Go take on a netcat-like program for all of my casual network (TCP) service testing. One of the things that makes it convenient is that it supports TLS, so I can just do things like 'call tls imap.cs imaps' instead of having to dabble in yet another netcat-like program. However, this TLS support has an important and unfortunate limit right now, which is that it only supports protocols where you do TLS from the start. A certain number of protocols support a during-connection upgrade to TLS, generally via some form of what gets called STARTTLS; the one most relevant to me is SMTP.

I would like to support STARTTLS in call, but this exposes a structural issue. Call is a netcat-like program and these are naturally asynchronous in that copying from standard input to the network is (and often should be) completely decoupled from copying from the network to standard output. Call implements this in the straightforward Go way by using two goroutines, one for each copy. This works quite well normally (and is the only real way to support arbitrary protocols), but the moment you add STARTTLS you have a problem.
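To make the two-goroutine structure concrete, here is a minimal sketch of it. This is not call's actual code; the names copyBoth and loopback are made up for illustration, and loopback talks to an in-memory echo peer (via net.Pipe) so that the example runs without a real server.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"strings"
)

// copyBoth runs the two independent copies, one goroutine per direction.
// It returns once the from-network side ends (the peer closed).
func copyBoth(conn net.Conn, in io.Reader, out io.Writer) error {
	go func() {
		io.Copy(conn, in) // to-network side
	}()
	_, err := io.Copy(out, conn) // from-network side
	return err
}

// loopback (hypothetical helper) pushes input through copyBoth against
// an in-memory peer that echoes it back and then closes.
func loopback(input string) (string, error) {
	client, server := net.Pipe()
	defer client.Close()

	go func() {
		defer server.Close()
		buf := make([]byte, len(input))
		if _, err := io.ReadFull(server, buf); err != nil {
			return
		}
		server.Write(buf)
	}()

	var out bytes.Buffer
	err := copyBoth(client, strings.NewReader(input), &out)
	return out.String(), err
}

func main() {
	got, err := loopback("EHLO example\r\n")
	fmt.Printf("%q %v\n", got, err)
}
```

Neither direction knows anything about the other here, which is exactly what makes this simple and exactly what STARTTLS breaks.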

The problem is that STARTTLS intrinsically requires the input and the output to synchronize (and really you need them to merge). The moment you send STARTTLS, the previously independent to-network and from-network streams must be synchronized and yoked together as TLS exchanges messages with the server at the other end of the connection. In a program based around poll() or select(), this is easy; you can trivially shift from asynchronous IO streams back to synchronous ones while you set up TLS, and then go back to asynchronous streams over the TLS connection. But in a goroutine-based system, it's far from clear how to achieve this (especially efficiently), because there are two problems.

The first problem is simply establishing synchronization between the sending and receiving sides in some reasonably elegant way. One approach is some sort of explicit variable for 'starting TLS' that is written by the sending side and checked at every IO by the receiving side. In theory another approach is to switch to a model with more goroutines, with one for every IO source and sink and a central manager of some sort. The explicit variable seems not very Go-y, while the central manager raises the issue of how to keep slow IO output to either the network or stdout from blocking the other direction.

The second problem is that mere synchronization is not enough, because there is no way to wake up on incoming network IO without consuming some of it. This matters because when we send our STARTTLS, we want to immediately switch network input handling from its normal mode to 'bring up TLS' mode, without reading any of the pending network input. If we cannot switch right away, we will wake up having .Read() some amount of the STARTTLS response, which must then be handled somehow.
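Here is a rough sketch of the explicit-variable approach that shows why the flag alone is not enough. Everything in it is an assumption made for illustration: the startingTLS flag, the leftover channel, and the stand-in server (an in-memory net.Pipe peer that answers with a plaintext '220' line). The receiving goroutine can only check the flag after its Read() returns, so by then it is already holding the start of the STARTTLS response.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"sync/atomic"
)

// receiver is the from-network goroutine. After every Read it checks the
// (hypothetical) startingTLS flag; once the flag is set it stops printing
// and hands over the bytes it has already consumed.
func receiver(conn io.Reader, out io.Writer, startingTLS *int32, leftover chan<- []byte) {
	defer close(leftover)
	buf := make([]byte, 4096)
	for {
		n, err := conn.Read(buf)
		if atomic.LoadInt32(startingTLS) != 0 {
			// Too late to not read this: the bytes are already ours.
			leftover <- append([]byte(nil), buf[:n]...)
			return
		}
		if n > 0 {
			out.Write(buf[:n])
		}
		if err != nil {
			return
		}
	}
}

// demo returns what the receiver handed over and what it printed.
func demo() (handedOver, printed string) {
	client, server := net.Pipe()
	defer client.Close()

	// Stand-in server: read one command, answer in plaintext.
	go func() {
		defer server.Close()
		cmd := make([]byte, 64)
		server.Read(cmd)
		server.Write([]byte("220 Ready to start TLS\r\n"))
	}()

	var startingTLS int32
	var out bytes.Buffer
	leftover := make(chan []byte)
	go receiver(client, &out, &startingTLS, leftover)

	// Sending side: raise the flag first, then send the command.
	atomic.StoreInt32(&startingTLS, 1)
	client.Write([]byte("STARTTLS\r\n"))

	handedOver = string(<-leftover)
	return handedOver, out.String()
}

func main() {
	handedOver, printed := demo()
	fmt.Printf("handed over: %q, printed: %q\n", handedOver, printed)
}
```

The response never reaches standard output, but it also is not sitting unread on the connection any more; whatever sets up TLS has to take it from the receiver. This sketch also quietly assumes that nothing else arrives from the server between setting the flag and sending the command, which a real program could not rely on.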

(Hopefully all STARTTLS-enabled protocols have a server response in plaintext that comes before any TLS protocol messages. If the first thing we may get back from the network in response to our STARTTLS is part of the server TLS handshake, I'd have to somehow relay that initial network data to tls.Client() as (fake) network input. The potential headaches there are big.)
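If it did come to relaying already-read data, one possible shape is a small net.Conn wrapper whose Read() returns the saved bytes first and then falls through to the real connection. This is only a sketch; prefixConn, newPrefixConn and startTLS are hypothetical names, and the demo uses an in-memory net.Pipe peer rather than a real TLS server, so startTLS is shown but never called.

```go
package main

import (
	"bytes"
	"crypto/tls"
	"fmt"
	"io"
	"net"
)

// prefixConn is a net.Conn whose reads first return bytes that were
// already consumed from the connection, then continue with live data.
type prefixConn struct {
	net.Conn
	r io.Reader
}

func newPrefixConn(c net.Conn, already []byte) net.Conn {
	return &prefixConn{
		Conn: c,
		r:    io.MultiReader(bytes.NewReader(already), c),
	}
}

// Read overrides the embedded Conn's Read; writes, deadlines and Close
// still go straight to the underlying connection.
func (p *prefixConn) Read(b []byte) (int, error) {
	return p.r.Read(b)
}

// startTLS shows where the wrapper would plug in. The handshake itself
// happens later, on first use or on an explicit Handshake() call.
func startTLS(c net.Conn, already []byte, serverName string) *tls.Conn {
	return tls.Client(newPrefixConn(c, already), &tls.Config{ServerName: serverName})
}

// demo checks the read ordering against an in-memory peer.
func demo() (string, error) {
	client, server := net.Pipe()
	defer client.Close()

	go func() {
		defer server.Close()
		server.Write([]byte("live"))
	}()

	pc := newPrefixConn(client, []byte("early "))
	data, err := io.ReadAll(pc)
	return string(data), err
}

func main() {
	got, err := demo()
	fmt.Printf("%q %v\n", got, err)
}
```

The wrapper is the easy part. The headache is upstream of it: knowing exactly which of the bytes you have read belong to the plaintext response and which belong to TLS.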

I don't have any answers here and I don't know what the best solution is; I just have my usual annoyance that Go forces asynchronous (network) IO into a purely goroutine-based architecture.

(The clear pragmatic answer is to not even try to support STARTTLS in call and to leave that to other tools, at least until I wind up with a burning need for it. But that kind of annoys me.)

