git blame annotates each line of a file with the commit that last changed it, along with the author and timestamp of that commit.
$ git blame trollbox.py | head -n 3
496f919d (Jonathan Foote 2014-11-28 11:12:01 -0500 1) #!/usr/bin/env python
496f919d (Jonathan Foote 2014-11-28 11:12:01 -0500 2)
496f919d (Jonathan Foote 2014-11-28 11:12:01 -0500 3) import sys
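To make the layout of that output concrete, here is a minimal Python sketch that splits one line of default git blame output into its fields. The names BLAME_LINE, BlameLine, and parse_blame_line are made up for illustration, and the pattern assumes only the default layout shown above (no filename column, default date format).

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (default `git blame` output as shown above):
#   <abbrev-hash> (<author> <date> <time> <tz> <lineno>) <content>
# A leading "^" on the hash marks a boundary commit.
BLAME_LINE = re.compile(
    r"^(?P<commit>\^?[0-9a-f]+) "
    r"\((?P<author>.+?)\s+"
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\s+"
    r"(?P<lineno>\d+)\) ?"
    r"(?P<text>.*)$"
)


class BlameLine(NamedTuple):
    """One annotated line (hypothetical helper type)."""
    commit: str
    author: str
    when: str
    lineno: int
    text: str


def parse_blame_line(line: str) -> Optional[BlameLine]:
    """Return the parsed fields, or None if the line does not fit the layout."""
    m = BLAME_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return BlameLine(
        commit=m.group("commit"),
        author=m.group("author"),
        when=m.group("when"),
        lineno=int(m.group("lineno")),
        text=m.group("text"),
    )


if __name__ == "__main__":
    sample = "496f919d (Jonathan Foote 2014-11-28 11:12:01 -0500 3) import sys"
    print(parse_blame_line(sample))
```

For anything beyond a quick look, git blame --porcelain is probably a better target for scripts than scraping the human-readable format.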
git log -L, which traces the history of a range of lines rather than a whole file, was initially developed as a 2010 Google Summer of Code project (ref):
Generally, the goal of this project is to:
1. 'git log -L' to trace multiple ranges from multiple files;
2. move/copy detect when we reach the end of some lines(where lines are added from scratch).

And now, we have supports in detail:
1. 'git log -L' can trace multiple ranges from multiple files;
2. we support the same syntax with 'git blame' '-L' options;
3. we integrate the 'git log -L' with '--graph' options with parent-rewriting to make the history looks better and clear;
4. move/copy detect is in its half way. We get a nearly workable version of it, and now it is in a phrase of refactor, so in the scope of GSoC, move/copy detect only partly complete.
Since then the logic has reached the mainline and is now available in recent versions of git. From git log --help:
-L <start>,<end>:<file>, -L :<regex>:<file>
    Trace the evolution of the line range given by "<start>,<end>" (or the funcname regex <regex>) within the <file>. You may not give any pathspec limiters. This is currently limited to a walk starting from a single revision, i.e., you may only give zero or one positive revision arguments. You can specify this option more than once.

    <start> and <end> can take one of these forms:

    o number
        If <start> or <end> is a number, it specifies an absolute line number (lines count from 1).

    o /regex/
        This form will use the first line matching the given POSIX regex. If <start> is a regex, it will search from the end of the previous -L range, if any, otherwise from the start of file. If <start> is "^/regex/", it will search from the start of file. If <end> is a regex, it will search starting at the line given by <start>.

    o +offset or -offset
        This is only valid for <end> and will specify a number of lines before or after the line given by <start>.

    If ":<regex>" is given in place of <start> and <end>, it denotes the range from the first funcname line that matches <regex>, up to the next funcname line. ":<regex>" searches from the end of the previous -L range, if any, otherwise from the start of file. "^:<regex>" searches from the start of file.
The function-level tracking feature seems pretty cool. It could be lack of experience with regular expressions outside of PCRE and Python, but it seems like this feature uses some one-off logic before handing the input string to the regex parser. I googled to find the test files for -L and goofed around a bit. It looks like only lines that start with non-whitespace in a target code file are considered as function names. This works:
$ git log -L :class:trollbox.py
commit f339b3ce300280042d03d74be47097c470219179
[...snip...]
@@ -13,151 +13,154 @@
 class MainWindow(QMainWindow):
[...snip...]
And this doesn’t:
$ git log -L ': def download:trollbox.py'
fatal: -L parameter ' def download' starting at line 1: no match
Examples are from trollbox.
John Firebaugh wrote a good summary of these and other techniques in 2012. Some of the features no longer exist, but the article remains a good roundup. There is a useful tip on using :Gblame for manual analysis in the comments section.
The logic that determines whether or not a line is a function appears to reside here: line-range.c line 128
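Here is a rough Python sketch of how I read that check, not a port of the C code. It assumes that, when no language-specific diff driver is configured for the file, a line only counts as a funcname line if its first character is an ASCII letter, an underscore, or a dollar sign. The names looks_like_funcname, find_funcname_line, and SAMPLE are made up for illustration, and Python's re module stands in for the POSIX regex engine git uses.

```python
import re
from typing import List, Optional


def looks_like_funcname(line: str) -> bool:
    """Assumed default heuristic: first char is an ASCII letter, '_' or '$'."""
    if not line:
        return False
    first = line[0]
    return (first.isascii() and first.isalpha()) or first in "_$"


def find_funcname_line(lines: List[str], pattern: str, start: int = 0) -> Optional[int]:
    """Return the 0-based index of the first line at or after `start` that
    both matches `pattern` and passes the funcname heuristic, else None."""
    regex = re.compile(pattern)  # Python regex here; git uses POSIX regexes
    for i in range(start, len(lines)):
        if regex.search(lines[i]) and looks_like_funcname(lines[i]):
            return i
    return None


# Made-up stand-in for the top of trollbox.py.
SAMPLE = [
    "#!/usr/bin/env python",
    "",
    "import sys",
    "",
    "class MainWindow(QMainWindow):",
    "    def download(self):",
    "        pass",
]

if __name__ == "__main__":
    print(find_funcname_line(SAMPLE, "class"))          # unindented line: found
    print(find_funcname_line(SAMPLE, " def download"))  # indented line: rejected
```

Under that assumption the two experiments above make sense: the class line starts in column one and is accepted, while the indented def line matches the regex but fails the first-character check, so git reports no match.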