The problem with indentation in programming languages

February 23, 2014

You may have heard that Apple has just released a really important security update to fix SSL (aka TLS) certificate verification. The best description of the bug I've seen comes from Adam Langley in his post "Apple's SSL/TLS bug". The core of the bug, extracted from the function SSLVerifySignedServerKeyExchange, is:

...
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
...

This points to the core problem with indentation. As we should all know, computer languages must communicate with two separate audiences: they're meaningful both to computers (which must execute them) and to programmers (who must read and modify them). The issue here is simple:

Indentation is semantically meaningful to people but is semantically meaningless to (most) compilers et al.

When I say that indentation is semantically meaningful to people, I mean that people read things into indentation. When we read code we generally assume that the indentation matches the program's block structure, and thus that we can infer block structure from indentation level. This is a shorthand, and like all shorthands it can be either clearly wrong (and thus discarded, slowing us down) or overridden if we read closely. But I think history has shown that people can and will miss mismatches between indentation and the actual flow of control, and will miss bugs as a result.

(See for example Dave Jones' replies in this twitter conversation.)

I can think of two reasonably workable solutions to this; call them the Python solution and the Go solution. The Python solution is to make indentation semantically meaningful to the language as well, so that the programmer's view and the computer's view of the code match up. The problem is that this seems to be widely unpopular with programmers (for no good reason). The Go solution is to create a canonical formatting style for the language and then strongly encourage everyone to routinely convert their code to it. If you run code like this through gofmt, it will correct the mis-indentation and thus give you a better chance of spotting the problem. Make it a rule that all code must be canonicalized through gofmt before a change is committed and there you go.

(I don't think that 'produce warnings about mismatching indentation' is a winning solution, although I might be wrong. At the least I don't think any of the major C compilers have added such a warning. My gut's view is that warnings are going to be less effective than other measures.)

My personal opinion is that some variant of the Python solution is the right one. I don't object if you want to still have explicit block delimiters for various reasons, but I think it should be a fatal error if the indentation does not match the block structure. Add a tool like gofmt if you want to allow people to write sloppy code and then have it fixed automatically before they feed it to the real language environment.

PS: Yes, mismatching indentation is not the only problem with Apple's code here. Two gotos in a row ought to look at least a little bit odd regardless of their indentation (and even if they were in the same block context, e.g. if they were both inside an 'if (..) { ... }').


Comments on this page:

From 31.151.153.51 at 2014-02-23 04:42:24:

Ahem, as Brian d Foy tweets:

Buggy Apple iOS code illustrates why Perl always requires braces around if() statements

https://twitter.com/briandfoy_perl/status/437358148010397696

If C had required that, this post about indentation would not have existed ;-)

By Guillaum at 2014-02-23 06:38:21:

31.151.153.51: I'm not sure that forcing the use of curly braces completely fixes the issue. Take for example this piece of code, which is usually a bug because of the semicolon:

for(...);
{
    indented code
}

The indentation is correct and the curly braces are there, but the code is still buggy. And it may be difficult for the compiler to point out the problem; look for example at this code:

for(i = 0; i < 100; f(i++));
{
    mutex_acquire_with_raii(mutex);
}

Do you want to acquire the mutex and apply "f" 100 times, or just one mutex acquisition and 100 calls to f?

The root cause is that curly braces in C serve two purposes, opening a scope and grouping (the "indent" of) statements, and this is highly confusing.

I have never used a bare "scope" in C/C++ for anything other than RAII, but because the scope syntax looks so close to the "indent" syntax, I keep missing the fact that RAII is being used when I do code reviews.

So why not express RAII with explicit syntax AND express the "indent" level with indentation? That would solve both issues at the same time.

By David Magda:

Quoting the entry: "The Go solution is to create a canonical formatting style for the language and then strongly encourage everyone to routinely convert their code to it."

Java also has an official (?) code convention guide.

For statements, they give:

The if-else class of statements should have the following form:

if (condition) {
   statements;
}

if (condition) {
   statements;
} else {
   statements;
}

if (condition) {
   statements;
} else if (condition) {
   statements;
} else {
   statements;
}

From 31.151.153.51 at 2014-02-23 10:48:40:

Guillaum, bugs will always exist, even with indentation (or are you guys suggesting Python/Go code has no bugs?)

;-)

Yay, someone else who likes indentation being significant! My only beef is that delimiters suck, so I do not like allowing them even if they are optional; plus, you shouldn't need them anyway (and of course Python doesn't have them).

By cks at 2014-02-24 11:15:20:

David Magda: My view is that an ordinary style guide isn't sufficient because people can and will deviate from it. Where Go wins is having a way to automatically reformat all code to the style, so you can simply say 'all code must be run through gofmt before being committed'. It's the automated no-hassle enforcement that makes it work, reinforced by social norms which themselves come about from using gofmt all the time across most or all of the Go ecosystem.

By DanielMartin at 2014-02-25 16:53:59:

JetBrains used this issue as an excuse for a bit of marketing:

https://twitter.com/appcode/status/437896886649757696

Written on 23 February 2014.
Last modified: Sun Feb 23 01:54:13 2014