Really understanding diffs requires knowing their context too

March 3, 2019

A lot of things like to present changes in the form of diffs (generally unified diffs these days, fortunately). Some of the time when I'm reading these diffs in various contexts, I've felt that I was struggling to fully understand what I was seeing. Today, I had a realization about this that feels completely obvious in retrospect, namely that understanding diffs relies on implicit contextual knowledge of their surrounding code. Most of the time, to understand the change that a diff is making you need to mentally reconstruct an image of the code being modified, in much the same way that indirect manipulation UIs require you to construct mental context.

When you already know the code well, this is easy; you can readily 'see' both the before and the after version of the code. But if you don't really know the code and are reading it cold, this is a much harder thing to do. You don't have a clear mental image of what's going on and it's much harder to see and reason about the effects of the change, unless they're obviously and strictly local ones.

Unified diffs provide some obvious context, but generally not all of the function or other entity that the change is embedded in. Even when they do show everything, a diff is a compact encoding of a change and properly understanding the change requires imagining and to some extent understanding the before and after versions. I'm not sure there's any clear way to present this inline in text; I think the most comprehensible way of showing diffs will always be side by side (ideally in a graphical environment).
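To make the side by side idea concrete, here is a minimal sketch in Python that pairs up the before and after versions of a small function in two text columns. It uses the standard library's difflib; the side_by_side helper and the sample total() function are made up for illustration, and real tools such as kdiff3 do far more than this.

```python
import difflib
from itertools import zip_longest

def side_by_side(before, after, width=34):
    """Pair up lines of two versions, marking changed rows with '|'."""
    rows = []
    matcher = difflib.SequenceMatcher(a=before, b=after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        marker = " " if tag == "equal" else "|"
        # zip_longest pads the shorter side when lines were added or removed.
        for left, right in zip_longest(before[i1:i2], after[j1:j2], fillvalue=""):
            rows.append(f"{left:<{width}} {marker} {right}".rstrip())
    return rows

# Hypothetical 'before' and 'after' versions of one function.
before = [
    "def total(items):",
    "    result = 0",
    "    for item in items:",
    "        result += item.price",
    "    return result",
]
after = [
    "def total(items):",
    "    result = 0",
    "    for item in items:",
    "        if item.taxable:",
    "            result += item.price * 1.1",
    "        else:",
    "            result += item.price",
    "    return result",
]

for row in side_by_side(before, after):
    print(row)
```

Both whole versions are on screen at once here, which is exactly the thing a unified diff makes you rebuild in your head.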

Realizing this makes me understand why I often struggle trying to make sense of a fair number of the diffs I read, because I'm often reading changes in codebases that I don't know (sometimes even in languages that I only partially understand). Probably I should see if I can easily use tools like kdiff3 when I'm picking my way through other people's code.

Comments on this page:

Another instance of this issue, and the one I run into most frequently, is that without having the entire codebase in your head, you cannot tell whether a patch is comprehensive, since the diff can only show the changes that were actually made. If it is incomplete because there are other parts of the code that needed to be changed in tandem with the changes made in the patch, you can only tell that by knowing the codebase and knowing that these other parts exist.

That can be a real headache with large-scale refactorings in long-standing periodically-rebased branches…
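A small sketch of this in Python, under made-up assumptions: the two-file codebase, the parse_cfg name, and both helper functions are hypothetical. The patch renames a function in one file, the diff a reviewer sees mentions only that file, and the missed caller only turns up when the rest of the tree is searched.

```python
import difflib

# Hypothetical two-file codebase, before the patch.
before = {
    "config.py": "def parse_cfg(path):\n    return open(path).read()\n",
    "main.py": "from config import parse_cfg\n\nsettings = parse_cfg('app.ini')\n",
}
# The patch renames the function but only touches config.py.
after = dict(before)
after["config.py"] = before["config.py"].replace("parse_cfg", "parse_config")

def patch_text(before, after):
    """Render the unified diff a reviewer would see: changed files only."""
    chunks = []
    for name in sorted(before):
        if before[name] != after[name]:
            chunks.extend(difflib.unified_diff(
                before[name].splitlines(keepends=True),
                after[name].splitlines(keepends=True),
                fromfile="a/" + name, tofile="b/" + name))
    return "".join(chunks)

def stale_references(files, old_name):
    """List files in the patched tree that still mention the old name."""
    return [name for name, text in sorted(files.items()) if old_name in text]

print(patch_text(before, after))
print("still using parse_cfg:", stale_references(after, "parse_cfg"))
```

Nothing in the printed diff hints that main.py exists; the stale import is visible only to someone who searches the whole tree or already knows it is there.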

That's a very good observation.

In the past when I wanted to have someone look over a change but the default unified diff looked too complicated/messy/unclear/etc., I would increase the amount of context displayed:

$ hg diff -U20
$ git diff -U20

It makes the diff much larger, but the large amount of context helps quite a bit. It still doesn't help when the reader is unfamiliar with the codebase, but no amount of context is going to help with that (that is, it is unreasonable to expect the reader to read the whole codebase to make sure all nuance in the diff is understood).
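The effect of a larger -U value can be imitated with Python's difflib, whose unified_diff function takes the number of context lines as its n parameter. This is only a sketch: the apply_discount function, the pricing.py file name, and the render helper are invented for the example.

```python
import difflib

# Hypothetical function with a one-line change near the bottom.
before = [
    "def apply_discount(order, customer):",
    "    rate = 0.0",
    "    if customer.is_member:",
    "        rate = 0.05",
    "    if order.total > 100:",
    "        rate += 0.02",
    "    if order.has_coupon:",
    "        rate += 0.03",
    "    discount = order.total * rate",
    "    return order.total - discount",
]
after = list(before)
after[8] = "    discount = order.total * min(rate, 0.08)"

def render(context_lines):
    """Unified diff of before/after with the given amount of context."""
    return list(difflib.unified_diff(
        before, after, "a/pricing.py", "b/pricing.py",
        n=context_lines, lineterm=""))

print("\n".join(render(3)))    # default-sized context
print()
print("\n".join(render(20)))   # comparable in spirit to -U20
```

With three lines of context the hunk starts partway through the function, so the reader never sees how rate was initialized; with twenty, the whole function is in view.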


Last modified: Sun Mar 3 03:58:09 2019