I know I'm late to the party on this one, but today I used git
bisect for the first time, to hunt down where a kernel bug started
showing up. Since there are more kernel bugs in my future, here are
some things that I want to remember about the process, plus other
random comments.
(I'll probably have more later when I do it again.)
First, I have to say that the whole process really is both cool and
addictive, and it works. It doesn't help the addictiveness that each
step usually gets faster and faster as you get closer and closer to
identifying the bad commit (because you are generally rebuilding less
and less).
Notes:
- I can't see an easy way to go back a step (in case you make a mistake
in 'git bisect [good|bad]'), so it might be a good idea to keep a
log of the start points and the steps. When I do this again, I'll
keep lab notes, and include the commit IDs of each step.
Update: I'm wrong and there is a way to back up steps. In
comments, Sergey Vlasov pointed out 'git bisect log' and associated
things such as 'git bisect replay'.
(I was especially nervous about this because I was building the kernel
on one machine and testing on another, all while doing other things.)
- having the tree be unbuildable is irritating but it happens every
so often. If it does, the first thing to do is to stop and analyze
why it's broken and what's going on, not to blindly start doing
'git bisect skip'. As it happens I got away with my first few
uses of skip (done hastily before I dug deeper), but I was lucky.
(Once I figured out the bad change, I was fortunately able to fix
it up by hand so that the tree could be built. Which was important,
because the first bad commit turned out to be in that otherwise
unbuildable section.)
- if I'm restricting the bisection to an area of the tree (and
possibly if not), the interwoven 'branch and merge' kernel
development means that a bunch of changes can show up basically
out of nowhere due to an out-of-area merge. Things like
'git bisect visualize' are not great at helping sort this
out, because they restrict your view to just the area of the
tree that you're fixed on at the moment.
(In hindsight it might have been faster to clone the repository
and start an automated 'git bisect run' pass to find the change
that broke the build. Instead I did it by various flailing around
with 'git blame', 'gitk', and so on.)
'gitk' in a repository that's being bisected does somewhat odd
things. If I want 'gitk' instead of 'git bisect visualize',
it's simpler to do it in a separate master repository.
- restricting the bisection to a narrow area of the tree is good
because it can speed things up a lot, but it's also potentially
dangerous since you're implicitly assuming that things broke
because of changes in that area of the tree, and this might not
be correct. This is probably especially a concern if, like me,
you're not all that familiar with kernel internals and are just
going on guesses like 'let's restrict things to drivers/net/wireless
since it's a wireless card that broke'.
(I had a nervous moment when the tree stopped building,
because the breakage wasn't in the area that I was bisecting on.
That's what rubbed my nose in out-of-area code changes and how
merges make them show up. For bonus nervousness, the breakage
was in net/wireless, a directory I had not previously been aware
of.)
- I would really like to be able to easily build 32-bit kernels on
64-bit machines.
Update: I'm wrong (apparently I'm too scarred by memories of
trying to build 32-bit RPMs on 64-bit Fedora). Sergey Vlasov
also pointed out 'make ARCH=i386 ...', which works fine.
- there must be a better solution to pushing kernels from a build
server to the target machine than rsync'ing the entire kernel
build tree just so I can run 'make modules_install install'
on the target. I just have to find it.
(I built on a separate machine because the target machine was a
slow laptop. As it happens, all of our good servers to build on
are 64-bit, so I had to use a less than ideal 32-bit server.)
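
The 'back up a step' approach from the update to the first note can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the helper names (drop_last_step, undo_last_bisect_step) and the file names are made up for illustration, while 'git bisect log', 'git bisect reset' and 'git bisect replay' are the real git commands. It assumes a bisection is already in progress and that replay ignores the '#' comment lines that the log contains.

```shell
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.

# Print a bisect log (read from stdin) without its final
# good/bad/skip command. Leftover '#' comment lines are left alone.
drop_last_step() {
    awk '
        { line[NR] = $0 }
        /^git bisect (good|bad|skip)/ { last = NR }
        END { for (i = 1; i <= NR; i++) if (i != last) print line[i] }
    '
}

# Undo the most recent verdict by replaying everything before it.
# Run this inside the repository that is being bisected.
undo_last_bisect_step() {
    git bisect log > bisect.log.full &&
    drop_last_step < bisect.log.full > bisect.log.fixed &&
    git bisect reset &&
    git bisect replay bisect.log.fixed
}
```

Saving 'git bisect log' output after every step also doubles as the lab notes, since each entry records the commit ID that was marked. Separately, a bisection can be limited to more than one area by listing the paths after '--', as in 'git bisect start -- drivers/net/wireless net/wireless'.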
In this case it would have been possible to totally automate the kernel
testing (the laptop has Ethernet, and the wireless failure is easy to
observe from a script), but I'd have had to build an entire set of
scripts and sudo operations and so on and it just wasn't worth it for
this. However, if I do this regularly I should look into it, since a
by-hand git bisect can clearly totally eat all of my day.
(The actual work didn't take much time, but git bisect is by and large
fast enough that I was constantly being interrupted to do the next
by-hand thing.)
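
If the testing were automated, the core would be a script for 'git bisect run', which picks good, bad, or skip from the script's exit status: 0 means good, 125 means 'this commit cannot be tested, skip it', and any other status from 1 to 127 means bad. Here is a minimal sketch of that mapping. The names build_kernel, test_wireless and bisect_verdict are hypothetical placeholders, and the real work of installing the kernel on the laptop, rebooting it, and checking the wireless is deliberately left out.

```shell
#!/bin/sh
# Sketch of a 'git bisect run' driver. build_kernel and test_wireless
# are hypothetical placeholders; replace them with the real steps.

build_kernel() {
    make > build.log 2>&1
}

test_wireless() {
    # Placeholder: push the kernel to the target, reboot it, and check
    # the wireless card. Must return 0 only if wireless works.
    return 1
}

# Map the outcome for one commit onto the exit statuses that
# 'git bisect run' understands.
bisect_verdict() {
    build_cmd=$1
    test_cmd=$2
    "$build_cmd" || return 125   # does not build: ask bisect to skip
    "$test_cmd"  || return 1     # builds but the test fails: bad
    return 0                     # builds and the test passes: good
}

# To use it as a script, end the file with:
#   bisect_verdict build_kernel test_wireless; exit $?
# and start the run with: git bisect run ./bisect-step.sh
```

Returning 125 on every build failure is exactly the blind 'git bisect skip' that the notes above warn about, so it would still be worth looking at which commits were skipped and why before trusting the result.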