Debuggers and two sorts of bugs
December 5, 2011
In the middle of reading Go Isn't C I ran across some remarks about how it was bad that programmers ignore debuggers in favour of print statements. This immediately sparked my standard reaction, which is that debuggers are focused on telling you what happens next, but generally I want to know how on earth my code got into its current state. Then I thought about it some more and realized that this position was too strong and not really accurate. In fact I use debuggers periodically, but only on certain sorts of bugs.
Let us say that there are two sorts of bugs. For lack of better names, I will call them direct bugs and indirect bugs. A direct bug's cause can be determined immediately by looking at the call stack, the local variables, the code, and so on at the point where it happens. You can say 'oh, the caller forgot that this function couldn't be called with a NULL', or see that you forgot to handle a case and something fell through to code that should never have been reached in this situation. A decent debugger works very well on direct bugs, and so do even simpler features like the automatic call stack backtraces on uncaught exceptions that you get in languages like Python.
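As an illustration of a direct bug, here is a minimal Python sketch. The functions `describe`, `lookup`, and `run` are hypothetical names made up for this example, not code from any real program. The point is only that the backtrace by itself names both the function that blew up and the caller that handed it a None.

```python
import traceback


def describe(user):
    # Assumes `user` is a dict; it cannot cope with None.
    return user["name"].upper()


def lookup(users, key):
    # The direct bug: dict.get() returns None for a missing key and
    # the caller passes that straight on without checking.
    return describe(users.get(key))


def run():
    """Trigger the bug and return the formatted backtrace as text."""
    try:
        lookup({"alice": {"name": "Alice"}}, "bob")
    except TypeError:
        return traceback.format_exc()
    return ""


if __name__ == "__main__":
    print(run())
```

Reading the backtrace from the bottom up, the failing frame is in `describe` and the frame above it is `lookup`, which is where the unchecked None came from. No further digging is needed to see the cause.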
Indirect bugs are data structure corruption bugs (or sometimes flow of
control bugs), where you are now in a 'this can't happen' situation
(whether caught by an
(For practical examples, my recent Liferea issue was a direct bug; if I had read the code
carefully, the first stack backtrace would have shown me the problem.
My unconscious bias until now has been that direct bugs are uninteresting because they are easy to solve from basic inspection, so I only really thought about what I wanted for dealing with indirect bugs. But the main reason that direct bugs are easy to deal with is that I already have tools like stack backtraces and inspection of local variables, so it's easy to see what's wrong with the program's current state.
Sidebar: a hazard of dealing with indirect bugs
It's popular to 'fix' an indirect bug that crashes the program or generates obviously bad results by making the code accept the impossible state; for example, by adding a NULL check to the low-level routine that's crashing with a NULL pointer exception. This is generally the wrong idea (you're treating the symptoms instead of the disease), but it's tempting as a quick fix and it's an easy trap to fall into if you don't understand the code well enough to know that you're dealing with an indirect bug, not a direct bug.
This is one of the issues that always makes me wary about fixing 'obvious' crash bugs, especially if I want to send the fix upstream. Before I add a NULL pointer check or the like I need to be sure that the missing check is the real bug, and I need to understand what the code should do instead of crashing (which is not always obvious).
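To make the hazard concrete, here is a small Python sketch of an indirect bug and the tempting symptom fix. Everything in it (the `Registry` class, its invariant, and the `total_*` functions) is hypothetical and invented for illustration. The corruption happens quietly in `remove_buggy`; the crash only shows up later, in code that is itself fine.

```python
class Registry:
    """Hypothetical store. Invariant: every name in `order` is a key in `records`."""

    def __init__(self):
        self.records = {}
        self.order = []

    def add(self, name, value):
        self.records[name] = value
        self.order.append(name)

    def remove_buggy(self, name):
        # The real bug: the record goes away but the name stays in
        # `order`, breaking the invariant. Nothing fails here.
        del self.records[name]

    def remove_fixed(self, name):
        # Treating the disease: keep both structures in sync.
        del self.records[name]
        self.order.remove(name)

    def check(self):
        # An explicit invariant check fails close to the corruption
        # instead of far away from it.
        if set(self.order) != set(self.records):
            raise RuntimeError("order and records are out of sync")


def total_crashing(reg):
    # Raises TypeError on a stale name, long after the damage was done.
    return sum(reg.records.get(name) for name in reg.order)


def total_patched(reg):
    # The tempting 'fix': quietly tolerate the impossible state.
    return sum(reg.records.get(name) or 0 for name in reg.order)
```

With `total_patched` the crash goes away, but `order` still carries the stale name, so anything else that trusts it (a count via `len(reg.order)`, say) is still wrong. Fixing `remove_buggy` and calling something like `check()` after mutations is closer to fixing the actual bug.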