2018-08-14
Go's net package doesn't have opaque errors, just undocumented ones
I continue to be irritated by how opaque important Go errors are. I should not have to do string comparisons to discover that my network connection failed due to 'host is unreachable'.
The standard library net package has a general error type that's returned from most network operations. If you read through the package documentation straightforwardly, as I did in this tweet, you will likely conclude that the only reasonable way to see if your net.Dial() call to something has failed because your Unix is reporting 'no route to host' is to perform a string match against the string value of the error you get back.
(You want to do that string match against net.OpError.Err, since that's what gets you the constant error string without varying bits like the remote host and port you're trying to connect to.)
As I discovered when I started digging into things in the process of writing a different version of this entry, things are somewhat more structured under the hood. In fact the error that you get back from net.Dial() is likely to be made up entirely of officially exported types, and you can do a more precise check than string comparisons (at least on Unix), but you have to reach through several layers to see what is going on. It goes like this:
- net.Dial() is probably returning a *net.OpError, which wraps another error that is stored in its .Err field.
- If you have a connection failure (or some other specific OS-level error), the *net.OpError.Err value is probably an *os.SyscallError. This is itself a wrapper around an underlying error, in .Err (and the syscall that failed is in .Syscall; you could verify that it's "connect").
- This underlying error is probably a syscall.Errno (a plain value, not a pointer), which can be compared against the various E* errno constants that are also defined in syscall. Here, I'd want to check for EHOSTUNREACH.
So we have a syscall.Errno inside an *os.SyscallError inside a *net.OpError. This wrapping sequence is not documented and thus not covered by any compatibility guarantees (neither is the string comparison, of course). Since all of these .Err fields are declared as type error instead of concrete types, unwrapping the whole nesting requires a bunch of checked type casts.
If I was doing this regularly, I would probably bother to write a function to check 'is this errno <X>', or perhaps a list of errnos. As a one-off check, I don't feel particularly guilty about doing the string check even now that I know it's possible to get the specific details if you dig hard enough. Pragmatically it works just as well, it's probably just as reliable, and it's easier.
(You still need to do a checked type cast to *net.OpError, but that's as far as you need to go. If you don't even want to bother with that, you could just string-ify the whole error and then use strings.HasSuffix(). For my purposes I wanted to check some other parts of the *net.OpError, so I needed the type cast anyway.)
In my view, the general shape of this sequence of wrapped errors should be explicitly documented. Like it or not, the relative specifics of network errors are something that people care about in the real world, so they are going to go digging for this information one way or another, and I at least assume that Go would prefer we unwrap things to check explicitly rather than just string-ifying errors and matching strings. If there are cautions about future compatibility or present variations in behavior, document them explicitly so that people writing Go programs know what to look out for.
(Like it or not, the actual behavior of things creates a de facto standard, especially if you don't warn people away. Without better information, people will code to what the dominant implementation actually does, with various consequences if this ever changes.)
Our problem with HTTPS and user-created content
We have a departmental web server, where people can host their personal pages (eg) and pages for their research groups and so on, including user-run web servers behind reverse proxies. In other words, this web server has a lot of content, created by a lot of people, and essentially none of it is under our control. These days, in one sense this presents us with a bit of a problem.
Our departmental web server supports HTTPS (and has for years). Recent browser developments are clearly pushing websites from HTTP to HTTPS, even if perhaps not as much as has been heralded, and so it would be good if we were to actively switch over. But, well, there's an obvious problem for us, and the name of that problem is mixed content. A not insignificant number of pages on our web server refer to resources like CSS stylesheets using explicit HTTP URLs (either local ones or external ones), and so would and do break if loaded over HTTPS, where browsers generally block mixed content.
We are obviously not going to break user web pages just because the Internet would now kind of like to see us using HTTPS instead of HTTP; if we even proposed doing that, the users would get very angry at us. Nor is it feasible to get users to audit and change all of their pages to eliminate mixed content problems (and from the perspectives of many users, it would be make-work). The somewhat unfortunate conclusion is that we will never be able to do a general HTTP to HTTPS upgrade on our departmental web server, including things like setting HSTS. Some of the web server's content will always be in the long tail of content that will never migrate to HTTPS and will continue to be HTTP content for years to come.
(Yes, CSP has upgrade-insecure-requests, but that only helps when the upgraded HTTPS URL actually works, which we can ensure for local resources but not for external ones.)
Probably this issue is confronting anyone with significant amounts of user-created content, especially in situations where people wrote raw HTML, CSS, and so on. I suspect that a lot of these sites will stay HTTPS-optional for plenty of time to come.
(Our users can use a .htaccess to force HTTP to HTTPS redirection for their own content, although I don't expect very many people to ever do that. I have set this up for my pages, partly just to make sure that it worked properly, but I'm not exactly a typical person here.)
(This elaborates on an old tweet of mine, and I covered the 'visual noise' bit in this entry.)