The problem with indentation in programming languages

February 23, 2014

You may have heard that Apple has just released a really important security update to fix SSL (aka TLS) certificate verification. The best description of the bug that I've seen comes from Adam Langley in his post 'Apple's SSL/TLS bug'. The core of the bug is this code, extracted from the function:

...
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
...
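To make the consequence concrete, here is a minimal C sketch that mirrors the shape of the excerpt. It assumes hypothetical stand-in functions: step_ok() and step_final() replace the SSLHashSHA1 calls and are not Apple's code. The unconditional second goto jumps to fail while err still holds 0 from the last successful call. As a result, the final check never runs and the function reports success. In Apple's code, the checks skipped this way included the actual signature verification.

```c
/* Hypothetical stand-ins for the hash calls: 0 means success.
   step_final() deliberately fails so we can see whether it is ever checked. */
static int step_ok(void) { return 0; }
static int step_final(void) { return -1; }

/* Same shape as the excerpt, including the duplicated goto. */
static int verify_buggy(void)
{
    int err;

    if ((err = step_ok()) != 0)
        goto fail;
    if ((err = step_ok()) != 0)
        goto fail;
        goto fail;                   /* not governed by the if: always runs */
    if ((err = step_final()) != 0)   /* never reached */
        goto fail;

fail:
    return err;                      /* still 0 here, i.e. "success" */
}

/* The same code with the duplicated line removed. */
static int verify_fixed(void)
{
    int err;

    if ((err = step_ok()) != 0)
        goto fail;
    if ((err = step_ok()) != 0)
        goto fail;
    if ((err = step_final()) != 0)
        goto fail;

fail:
    return err;
}
```

verify_buggy() returns 0 even though step_final() would have failed. verify_fixed() reaches the last check and returns -1.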

This points to the core problem with indentation. As we should all know, computer languages must communicate with two separate audiences: they're meaningful both to computers (which must execute them) and to programmers (who must read and modify them). The issue here is simple:

Indentation is semantically meaningful to people but is semantically meaningless to (most) compilers et al.

When I say that indentation is semantically meaningful to people, I mean that people read things into indentation. When we read code, we generally assume that the indentation matches the program's block structure, and thus that we can infer block structure from indentation level. This is a shorthand, and like all shorthands it can either be clearly wrong (and thus discarded, which slows us down) or be overridden if we read closely. But I think that history has shown that people can and will miss mismatches between indentation and the actual flow of control in a program, and will miss bugs as a result.

(See for example Dave Jones' replies in this Twitter conversation.)

I can think of two reasonably workable solutions to this; call them the Python solution and the Go solution. The Python solution is to make indentation semantically meaningful to the language as well, so that the programmer's view and the computer's view of the code match up. The problem is that this seems to be widely unpopular with programmers (for no good reason). The Go solution is to create a canonical formatting style for the language and then strongly encourage everyone to routinely convert their code to it. If you run the Go equivalent of this code through gofmt, it will correct the mis-indentation and thus give you a better chance of spotting the problem. Make it a rule that all code must be canonicalized through gofmt before a change is committed, and there you go.
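Part of what protects Go code is that Go's grammar requires braces around every if body, so there is no brace-less form to mis-indent. The following minimal C sketch shows the same fragment written under that rule. It assumes hypothetical check functions (check_a, check_b, check_final) rather than the real hash calls. The accidentally duplicated goto now sits inside the braces, so it only runs when the check has already failed, and the result is unaffected.

```c
/* Hypothetical checks: 0 means success, and the final one fails. */
static int check_a(void) { return 0; }
static int check_b(void) { return 0; }
static int check_final(void) { return -1; }

/* Every if body is braced, as Go requires. */
static int verify_braced(void)
{
    int err;

    if ((err = check_a()) != 0) {
        goto fail;
    }
    if ((err = check_b()) != 0) {
        goto fail;
        goto fail;   /* duplicated line: odd, but only reachable on failure */
    }
    if ((err = check_final()) != 0) {   /* still reached */
        goto fail;
    }

fail:
    return err;
}
```

verify_braced() returns -1, so the failing final check is still reported. If the duplicate had instead landed after the closing brace, canonical formatting would put it flush with the if statements, where it is much easier to notice.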

(I don't think that 'produce warnings about mismatched indentation' is a winning solution, although I might be wrong. At the least, I don't think any of the major C compilers have added such a warning. My gut feeling is that warnings are going to be less effective than other measures.)

My personal opinion is that some variant of the Python solution is the right one. I don't object if you still want explicit block delimiters for various reasons, but I think it should be a fatal error if the indentation does not match the block structure. Add a tool like gofmt if you want to let people write sloppy code and then have it fixed automatically before they feed it to the real language environment.
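As a rough illustration of the kind of check involved, here is a toy C heuristic. find_misleading_indent is a hypothetical name, not an existing tool. The check only understands single-line, brace-less if statements and assumes space-only indentation. It flags a line that sits at the same indentation as a brace-less if body without being part of that body. A real version of the fatal error proposed above would need an actual parser. GCC 6 and later also ship a -Wmisleading-indentation warning, enabled by -Wall, that targets this pattern.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count leading spaces; this sketch assumes space-only indentation. */
static size_t indent_of(const char *line)
{
    size_t n = 0;
    while (line[n] == ' ')
        n++;
    return n;
}

static int is_blank(const char *line)
{
    for (; *line != '\0'; line++)
        if (!isspace((unsigned char)*line))
            return 0;
    return 1;
}

/* True for "if (...)" whose last non-space character is ')': an if with
   no opening brace, so only the next statement is its body. */
static int is_braceless_if(const char *line)
{
    const char *p = line + indent_of(line);
    size_t len = strlen(p);

    if (strncmp(p, "if", 2) != 0 || (p[2] != ' ' && p[2] != '('))
        return 0;
    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;
    return len > 0 && p[len - 1] == ')';
}

/* Returns the 1-based number of the first line indented like a brace-less
   if body when it is not part of that body, or 0 if none is found. */
size_t find_misleading_indent(const char *const *lines, size_t n)
{
    for (size_t i = 0; i + 2 < n; i++) {
        if (!is_braceless_if(lines[i]))
            continue;
        size_t if_indent = indent_of(lines[i]);
        size_t body_indent = indent_of(lines[i + 1]);
        const char *next = lines[i + 2];

        if (body_indent > if_indent && !is_blank(next) &&
            indent_of(next) == body_indent)
            return i + 3;
    }
    return 0;
}
```

When given the seven code lines of the excerpt (without the '...' markers), the function returns 5, which is the second 'goto fail;'. With that line removed, it returns 0. Requiring the following line to match the body's indentation exactly avoids flagging nested brace-less ifs and continuation lines.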

PS: Yes, mismatched indentation is not the only problem with Apple's code here; two gotos in a row ought to look at least a little bit odd regardless of their indentation (and even if they were in the same block context, e.g. if they were both inside an 'if (..) { ... }').
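Spotting this doesn't depend on indentation at all. The following toy C sketch compares adjacent lines after trimming whitespace. find_repeated_goto is a hypothetical helper, not an existing lint.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Point at the first non-space character and report the trimmed length. */
static const char *trim(const char *s, size_t *len)
{
    while (isspace((unsigned char)*s))
        s++;
    *len = strlen(s);
    while (*len > 0 && isspace((unsigned char)s[*len - 1]))
        (*len)--;
    return s;
}

/* Returns the 1-based number of the second of two adjacent lines holding
   the same goto statement (indentation ignored), or 0 if there is none. */
size_t find_repeated_goto(const char *const *lines, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        size_t la, lb;
        const char *a = trim(lines[i - 1], &la);
        const char *b = trim(lines[i], &lb);

        if (la == lb && la > 5 && strncmp(a, "goto ", 5) == 0 &&
            memcmp(a, b, la) == 0)
            return i + 1;
    }
    return 0;
}
```

Since it ignores indentation, it would flag the same pair whether the two gotos were mis-indented as in Apple's code or both sat inside an 'if (..) { ... }' block.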

Last modified: Sun Feb 23 01:54:13 2014