2019-03-03
Understanding a change often requires understanding how the code behaves
Yesterday I wrote about how really understanding diffs requires knowing their context too. An additional part of the problem is that you can't understand most changes in isolation, purely as changes. To actually understand a change, you usually need to be able to compare the before and after behavior of the code, and in order to do that you must understand (or work out) what that behavior is.
This is a big part of why you have to reconstruct those before and after versions of the code in your head as you're reading a diff. You need the ability to think 'okay, the old version does X and the new version does Y', and to do that you usually have to put the change into its context and then understand that context.
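As a small illustration of 'the old version does X and the new version does Y', here is a minimal Python sketch. Everything in it is hypothetical and invented for this example (the ITEMS list, the two first_items functions, and the behavior_changes helper). The one-line change reads as 'handle a missing limit', but you only find out what it really does by comparing the behavior of both versions.

```python
# Hypothetical data and functions, invented purely for illustration.
ITEMS = [1, 2, 3]


def first_items_old(limit):
    """Before the change: slice by the limit (None means 'no limit')."""
    return ITEMS[:limit]


def first_items_new(limit):
    """After the change: looks like it only makes 'no limit' explicit."""
    return ITEMS[:limit] if limit else ITEMS


def behavior_changes(old, new, inputs):
    """Return (input, before, after) for every input where the versions differ."""
    changes = []
    for value in inputs:
        before, after = old(value), new(value)
        if before != after:
            changes.append((value, before, after))
    return changes


print(behavior_changes(first_items_old, first_items_new, [None, 0, 2]))
# prints [(0, [], [1, 2, 3])]
```

The two versions agree for None and for 2, but a limit of 0 used to mean 'nothing' and now means 'everything'. The diff alone doesn't say that; you have to know what the old slice did with 0.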
When you're familiar with the codebase, you generally already know how either the old version or the new version behaves in addition to having a mental image of the code itself. I suspect that this makes it much easier to both construct a mental image of the other side of the change and then understand what it does, and obviously you're already starting from a state where you know one side so you only need to do half the work.
(I might have thought to put this into yesterday's entry if I had written it at a different time. There's a story there, but that's for another entry.)
Sidebar: One trap in understanding changes is non-local effects
One bane of understanding changes even when you understand the code is that some changes can have non-local effects that aren't obvious. This spooky action at a distance often comes about because of either programming language behavior or subtle API changes (including especially implicit APIs in how one part of the code knows about how another part behaves). One famous example of this is an innocent-looking change to the Linux kernel that caused a security issue, where adding a variable initialization that dereferenced a pointer allowed the compiler to silently remove a later NULL check of that pointer.
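A much smaller, hypothetical version of the implicit API problem can be sketched in Python. All of the names here (active_users_old, active_users_new, report, USERS) are made up for the example and don't come from any real codebase. The change swaps a list for a generator, which looks like a harmless memory saving, but a caller elsewhere quietly depends on being able to go over the result twice.

```python
# Hypothetical data and functions, invented purely for illustration.
USERS = [
    {"name": "ann", "active": True},
    {"name": "bob", "active": False},
    {"name": "cy", "active": True},
]


def active_users_old(users):
    """Before the change: build and return a list."""
    return [u for u in users if u["active"]]


def active_users_new(users):
    """After the change: return a generator instead (brackets became parens)."""
    return (u for u in users if u["active"])


def report(get_active, users):
    """A distant caller that implicitly relies on iterating the result twice."""
    active = get_active(users)
    names = [u["name"] for u in active]
    count = sum(1 for _ in active)  # second pass; a generator is already used up
    return names, count


print(report(active_users_old, USERS))  # prints (['ann', 'cy'], 2)
print(report(active_users_new, USERS))  # prints (['ann', 'cy'], 0)
```

Nothing in report() changed and nothing raises an error, but the count is now silently wrong. A diff that only shows the changed line in the helper gives no hint that report() exists, let alone that it cares.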
Of course this is nothing too novel. Removing spooky action at a distance in general is a big thrust of modern programming languages, because we've learned that it's dangerous basically all of the time.
Really understanding diffs requires knowing their context too
A lot of things like to present changes in the form of diffs (generally unified diffs these days, fortunately). Some of the time when I'm reading these diffs in various contexts, I've felt that I was struggling to fully understand what I was seeing. Today, I had a realization about this that feels completely obvious in retrospect, namely that understanding diffs relies on implicit contextual knowledge of their surrounding code. Most of the time, to understand the change that a diff is making you need to mentally reconstruct an image of the code being modified, in much the same way that indirect manipulation UIs require you to construct mental context.
When you already know the code well, this is easy; you can readily 'see' both the before and the after version of the code. But if you don't really know the code and are reading it cold, this is a much harder thing to do. You don't have a clear mental image of what's going on and it's much harder to see and reason about the effects of the change, unless they're obviously and strictly local ones.
Unified diffs provide some obvious context, but generally not all of the function or other entity that the change is embedded in. Even when they do show everything, a diff is a compact encoding of a change and properly understanding the change requires imagining and to some extent understanding the before and after versions. I'm not sure there's any clear way to present this inline in text; I think the most comprehensible way of showing diffs will always be side by side (ideally in a graphical environment).
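To make the limited context concrete, here is a minimal sketch using Python's standard difflib module. The mean() function and the change to it are hypothetical, invented for the example. With the usual three lines of context, the unified diff never shows the line that says which function is being changed, while difflib.ndiff() and difflib.restore() can rebuild the complete before and after versions that you would otherwise have to imagine.

```python
import difflib

# A hypothetical 'before' version of a small function, as a list of lines.
BEFORE = """\
def mean(values):
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    if count == 0:
        return 0
    return total / count
""".splitlines(keepends=True)

# The hypothetical change: only the last line differs.
AFTER = BEFORE[:-1] + ["    return total // count\n"]

# A unified diff with the usual three lines of context.
hunk = list(
    difflib.unified_diff(BEFORE, AFTER, fromfile="before.py", tofile="after.py", n=3)
)
print("".join(hunk), end="")

# ndiff keeps every line, so both complete versions can be recovered from it.
delta = list(difflib.ndiff(BEFORE, AFTER))
old_version = list(difflib.restore(delta, 1))
new_version = list(difflib.restore(delta, 2))
```

The printed hunk starts at the 'count += 1' line, so the 'def mean(values):' line is nowhere in it. The restored old_version and new_version are the two full texts that a side by side tool would put next to each other.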
Realizing this makes me understand why I often struggle trying to make sense of a fair number of the diffs I read, because I'm often reading changes in codebases that I don't know (sometimes even in languages that I only partially understand). Probably I should see if I can easily use tools like kdiff3 when I'm picking my way through other people's code.