One limitation of simple bisection searches in version control systems
May 19, 2011
The usual description of VCS bisection is that it performs a binary search to determine the changeset that introduced a bug. Unfortunately, there is a fundamental limitation with this model of bisection; it only works well when the bug was only introduced once. Otherwise you will get results that are technically correct but not really useful.
Suppose what actually happened was that the bug was unknowingly introduced, accidentally fixed, and then later reintroduced. Doing a bisection will 'work' in that it will give you a changeset where the bug appeared, but what you really want is not just any old changeset where the bug appeared but the most recent changeset where it appeared. There is no guarantee that bisection will give you this changeset, because there is no guarantee that binary search will.
Binary search implicitly requires that the indexed set being searched is ordered; when the indexed set is not ordered, you get strange results. In cases like bisection where there are effectively only two numbers ('good' and 'bad'), having the set be ordered means that all of the 'bad' comes after the 'good' (or vice versa). If this is not the case, binary search will find one of the division points between good and bad but there are no guarantees of which one it is.
This situation is difficult to detect (although there are heuristics in extreme cases) and impossible to completely avoid or fix in anything that you actually want to use.
(One obvious heuristic is to see if the changeset has been completely overwritten by later changes. If it's not contributing any code to the current tree, it's not contributing any bugs either.)
PS: this is what I was thinking of yesterday.
* * *
Atom feeds are available; see the bottom of most pages.