Code alone can tell you the what but it cannot tell you why

August 14, 2016

It all started when John Arundel tweeted:

@hlship: I completely don't buy into "code is self documenting". Code needs docs to explain the why; code by itself is only how.

I think code should be so clear, simple, and straightforward as to need the fewest possible comments, ideally zero.

and then, in a followup tweet:

I'm not saying you shouldn't comment your code. I'm saying you should code so that you don't need to.

I very strongly believe that this is impossible. As @hlship says, code can say what it is doing but it cannot by itself tell you why it is necessary to do that thing, or why you don't want to do that thing in another way, and so on. To write code that does communicate that information, you must effectively embed documentation in the form of names for things (and then you must hope that everything is sufficiently clear to convey your meaning).

Here, let me give a concrete example. My Go SMTP server package contains the following little snippet at the start of the function that parses SMTP commands:

if !isall7bit([]byte(line)) {
   res.Err = "command contains non 7-bit ASCII"
   return res
}

I think that the what and how of this code are reasonably clear and don't need any comments. But the why is completely opaque. Are we rejecting lines with non-ASCII characters because we are being RFC-picky? Could we take this code out in order to have an SMTPUTF8-compatible server? If we wanted an SMTPUTF8-compatible server, what other changes would be required, if any?

As it happens, this snippet has an important 'why' attached to it. My comment in the actual source is not clear, but the reason for this check is that I later convert the entire line to upper case in order to make matching SMTP commands easier, and then use indexes into the upper-case version of the line to extract things from the original version of the line. Go considers all strings to be UTF-8 by default, so case conversion is done in Unicode, and Unicode case conversion can change the length of a string. When my code uses indexes from the upper-case string with the original string, it implicitly assumes that this doesn't happen.

(I also care about RFC compliance, which is a secondary reason.)
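To make the hazard concrete, here is a minimal sketch of what goes wrong. It is not code from the package; the lengths helper and the sample address are made up for illustration. It relies on the fact that U+017F (the long s) takes two bytes in UTF-8 while its upper-case form is a plain one-byte "S".

```go
package main

import (
	"fmt"
	"strings"
)

// lengths is a hypothetical helper for this illustration. It returns the
// byte length of s before and after upper-casing it.
func lengths(s string) (before, after int) {
	return len(s), len(strings.ToUpper(s))
}

func main() {
	// \u017f is LATIN SMALL LETTER LONG S: two bytes in UTF-8, but it
	// upper-cases to ASCII "S", which is one byte.
	line := "MAIL FROM:<\u017fam@example.org>"
	upper := strings.ToUpper(line)

	before, after := lengths(line)
	fmt.Println(before, after) // prints: 28 27

	// An index found in the upper-cased copy no longer lines up with
	// the same position in the original line.
	i := strings.Index(upper, "@")
	fmt.Printf("%q %q\n", upper[i:], line[i:])
	// prints: "@EXAMPLE.ORG>" "m@example.org>"
}
```

With a pure 7-bit ASCII line the two lengths are always equal, so indexes carry over safely, which is exactly what the check guarantees.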

Could you write code that did something similar to this check and was clear about the why? Perhaps. But I think it would require either weird function names or structuring the code differently, for example by upper-casing the line and then insisting that it had the same length as the original version.
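As a rough sketch of that second idea (again hypothetical, with upperSameLength being a name I've invented here rather than anything in the real package), the check could look something like this:

```go
package main

import (
	"fmt"
	"strings"
)

// upperSameLength is a hypothetical helper. It upper-cases line and
// reports whether the result has the same byte length as the original.
// Callers would only reuse indexes across the two strings when ok is true.
func upperSameLength(line string) (upper string, ok bool) {
	upper = strings.ToUpper(line)
	return upper, len(upper) == len(line)
}

func main() {
	lines := []string{
		"mail from:<a@example.org>",
		"mail from:<\u017fam@example.org>", // contains U+017F, the long s
	}
	for _, line := range lines {
		upper, ok := upperSameLength(line)
		if !ok {
			fmt.Println("rejected: upper-casing changed the line's length")
			continue
		}
		fmt.Println("ok:", upper)
	}
}
```

This does put the 'why' closer to the surface, but it is a weaker guarantee than the 7-bit check: in principle one character could shrink while another grows, leaving the total length unchanged and the indexes in between still misaligned. It also quietly lets through non-ASCII lines that happen to survive upper-casing, which is a different RFC stance.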

(The other option is to completely restructure the command matching code so that it works in a different way and doesn't care about this. Would that be better? Maybe. You might still want to be RFC picky here, instead of implicitly supporting SMTPUTF8.)

Would such code be 'simple' and 'straightforward'? I suspect not, although simplicity is at least partly in the eyes of the beholder. It would certainly have taken me longer to write than the current approach.

(None of this is new. It's quite similar to what I've written about writing comments in your configuration settings, about documenting why you don't do things, and about how procedures are not documentation, and undoubtedly in other entries too.)

Written on 14 August 2016.

Last modified: Sun Aug 14 00:26:34 2016