Some notes to myself on 'git log -G' (and sort of on -S)
Today I found myself nerd-sniped by a bit in 'Golang is evil on shitty networks' (via), and wanted to know where a particular behavior was added in Go's network code. The article conveniently identified the code involved, so once I found the source file all I theoretically needed to do was trace it back in history. Until recently, my normal tool for this was Git's 'blame' view and mode, often on Github because Github has a convenient 'view git-blame just before this commit' option, which makes it easy to step back in history. Unfortunately in this case, the source code had been reorganized and moved around repeatedly, so this wasn't easy.
Instead, I turned to 'git log -G', which I'd recently used to answer a similar question in the ZFS code base. 'git log -G <thing>' and the somewhat similar 'git log -S <thing>' search for '<thing>' in commits (in different ways). This time around, I used plain 'git log -G <thing>' and then got the full details of likely commits with 'git show <id>'. A generally better option is 'git log -G <thing> -p', which includes the diff for a commit but by default shows you only the file (or files) where the thing shows up in the changes (per gitdiffcore's pickaxe section).
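As a rough sketch of what 'git log -G <thing> -p' does, here is a throwaway repository where one commit touches two files but only one of them mentions the thing. The file names and the identifier 'tuneSocket' are made up for illustration and have nothing to do with the real Go source.

```shell
set -eu
# Throwaway repository; file names and 'tuneSocket' are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

# One commit that touches two files, only one of which mentions tuneSocket.
printf 'func dial() {\n\ttuneSocket(conn)\n}\n' > net.go
printf 'some notes\n' > NOTES.txt
git add net.go NOTES.txt
git commit -q -m "add dial and notes"

# A later commit that never mentions the identifier at all.
printf 'more notes\n' >> NOTES.txt
git commit -q -a -m "expand notes"

# -G selects commits whose diff has an added or removed line matching the
# regex; -p shows the patch, limited to the file(s) where the match is.
patch=$(git log -p -G 'tuneSocket')
printf '%s\n' "$patch"
```

Only the first commit should be listed, and its patch should show net.go but not NOTES.txt. If I want the whole changeset for matching commits instead, git-log has '--pickaxe-all' for that.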
(In this case I had to iterate git-log a few times, because the implementation changed. The answer turns out to be that it's been there since the beginning, which I could have found out by reading the other comments.)
'Git log -G' is not exactly the fastest thing in the world, which isn't surprising since it has to generate diffs for all changesets in order to look at things. It's single-threaded, unsurprisingly, and generally CPU bound in my testing. This implies that if I'm probing for multiple things at once on an SMP machine (which is the usual case), it's to my benefit to run multiple git-logs at once in different windows. On sufficiently large repositories it's probably also disk IO bound, although that will depend a lot on the storage involved. Because of this, it seems that it would be useful to trim down what file paths Git considers, if you have good confidence of where relevant files both are now and were in the past.
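A minimal sketch of both ideas, using background jobs instead of separate windows and a pathspec after '--' to limit which paths Git looks at. The repository layout and the identifiers ('tuneSocket', 'setDeadline') are invented for the example. If the code has moved over time, both the old and the new directories would need to be listed after the '--'.

```shell
set -eu
# Throwaway repository; the layout and identifiers are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

mkdir -p src/net docs
printf 'tuneSocket(conn)\n' > src/net/dial.go
git add .
git commit -q -m "add dial code"

printf 'setDeadline(conn)\n' >> src/net/dial.go
git commit -q -a -m "add deadline handling"

printf 'tuneSocket is described here\n' > docs/notes.txt
git add .
git commit -q -m "docs only"

# Two independent searches as background jobs, each limited to src/net,
# so the docs-only commit is never considered.
out=$(mktemp -d)
git log --format=%s -G 'tuneSocket' -- src/net > "$out/tune.log" &
git log --format=%s -G 'setDeadline' -- src/net > "$out/deadline.log" &
wait
cat "$out/tune.log" "$out/deadline.log"
```

Each git-log is still single-threaded, so this only helps when there is more than one thing to look for.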
'Git log -S' is subtly different from 'git log -G' in a way that may make it less useful than you expect, depending on the repository. As covered in the git-log manual page, the -S option specifically includes binary files as well, while -G implicitly excludes them because they don't normally create patch text (binary files can be included with --text if you really want). In this case the identifier name I was looking for also appeared in some binary files of test data, so some of the commits reported by 'git log -S' puzzled me until I realized that they were being included because they added new versions of the test data, which meant that the number of instances of the identifier name had gone up.
(There may be some way to make -S not search binary files, but if so I couldn't find it when looking through the git-log manual page.)
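The binary file difference can be seen in a small throwaway repository. This is a sketch that assumes a reasonably recent Git, since older versions may not skip binary files for -G. The file names and the identifier 'tuneSocket' are made up, and the 'binary' file is simply one that contains a NUL byte, which is how Git normally guesses that a file is binary.

```shell
set -eu
# Throwaway repository; file names and 'tuneSocket' are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

printf 'tuneSocket(conn)\n' > dial.go
git add dial.go
git commit -q -m "add code"

# A NUL byte makes Git treat this file as binary.
printf 'tuneSocket\n\000\001\002\n' > testdata.bin
git add testdata.bin
git commit -q -m "add binary test data"

# -S counts occurrences in file contents, binary or not.
with_S=$(git log --format=%s -S 'tuneSocket')
# -G looks at patch text, which binary files don't normally produce.
with_G=$(git log --format=%s -G 'tuneSocket')
# --text forces binary files to be treated as text for -G.
with_G_text=$(git log --format=%s --text -G 'tuneSocket')

printf -- '-S:\n%s\n-G:\n%s\n-G --text:\n%s\n' "$with_S" "$with_G" "$with_G_text"
```

Here -S and '-G --text' should both report the test data commit, while plain -G should report only the commit that added the code.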
If I'm hunting for when something was introduced or removed and I'm sure that the repository has no binary files to confuse me, using 'git log -p -S' is probably safe. If there are binary files around to annoy me, I'm probably pragmatically better off using 'git log -p -G'. Using -G also means that I'll spot changes in how something is used, for example making a function call conditional or not conditional (which I believe won't normally show up in -S). Probably my life is better if I standardize on first using 'git log -G' and then switching to -S if I'm getting too many code motion commits.
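To illustrate the 'change in how something is used' case, here is a sketch where a call is made conditional without changing how many times the identifier appears in the file. As before, the file and the identifier 'tuneSocket' are hypothetical.

```shell
set -eu
# Throwaway repository; net.go and 'tuneSocket' are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.invalid"

printf 'func dial() {\n\ttuneSocket(conn)\n}\n' > net.go
git add net.go
git commit -q -m "add dial"

# Same single occurrence of tuneSocket, but now inside a condition.
printf 'func dial() {\n\tif fast {\n\t\ttuneSocket(conn)\n\t}\n}\n' > net.go
git commit -q -a -m "make tuneSocket conditional"

# -S only reports commits that change the number of occurrences.
with_S=$(git log --format=%s -S 'tuneSocket')
# -G reports any commit with a matching added or removed line.
with_G=$(git log --format=%s -G 'tuneSocket')

printf -- '-S:\n%s\n-G:\n%s\n' "$with_S" "$with_G"
```

With -S only 'add dial' should show up, while -G should report both commits. The same property is why -S stays quiet when code merely moves around inside a file.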