No matter how it looks at first, it's always a people problem
- The Second Law of Consulting (Gerald Weinberg)

You can use git line-level logging and git pickaxe to figure out when and by-whom a bug was introduced into a codebase.

An example: CVE-2014-9390

Octocat

By now you have heard about the vulnerability in git and updated your client. In this blog post I’ll demo some git commands to support a people-oriented root-cause-analysis of a vulnerabilty, in this case, CVE-2014-9390.

From the git-blame blog entry on the vulnerability:

- You can commit and checkout to .Git/ (or any permutations of cases .[gG][iI][tT], except .git all in lowercase). But this will overwrite the corresponding .git/ on case-insensitive file systems (e.g. Windows and Mac OS X).
- In addition, because HFS+ file system (Mac OS X) considers certain Unicode codepoints as ignorable; committing e.g. .g\u100cit/config, where U+100C is such an ignorable codepoint, and checking it out on HFS+ would overwrite .git/config because of this.

The issue is shared with other version control systems and has serious impact on affected systems (CVE-2014-9390). </blockquote> The fix was released as v2.2.1. [Diffing v2.2.0 with v2.2.1](https://github.com/git/git/compare/v2.2.0...v2.2.1), it looks like the broken checks in 2.2.0 were in fsck.c:174 and read-cache.c:779,781,(and roughly)803. I am not overly familiar with the code base, but based on a quick once-over of the code and [some related analysis](http://blog.beyondtrust.com/dissecting-githubs-case-insensitive-discrepancies) I'll focus on fsck.c. # Whodunnit So, looking at tfsck.c bug in v2.2.0, the issue is in `fsck_tree` function. The `git blame` snippet shows that `Hiroyuki Sano` was the last to edit the code: ``` ba002f3b (Martin Koegler 1008-02-25 22:46:08 +0100 168) effd12ec (Hiroyuki Sano 2014-03-20 08:02:04 +0900 169) has_null_sha1 |= is_null_sha1(sha1); effd12ec (Hiroyuki Sano 2014-03-20 08:02:04 +0900 170) has_full_path |= !!strchr(name, '/'); effd12ec (Hiroyuki Sano 2014-03-20 08:02:04 +0900 171) has_empty_name |= !*name; effd12ec (Hiroyuki Sano 2014-03-20 08:02:04 +0900 172) has_dot |= !strcmp(name, "."); effd12ec (Hiroyuki Sano 2014-03-20 08:02:04 +0900 173) has_dotdot |= !strcmp(name, ".."); effd12ec (Hiroyuki Sano 2014-03-20 08:02:04 +0900 174) has_dotgit |= !strcmp(name, ".git"); ba002f3b (Martin Koegler 1008-02-25 22:46:08 +0100 175) has_zero_pad |= *(char *)desc.buffer == '0'; ba002f3b (Martin Koegler 1008-02-25 22:46:08 +0100 176) update_tree_entry(&desc); ba002f3b (Martin Koegler 1008-02-25 22:46:08 +0100 177) ba002f3b (Martin Koegler 1008-02-25 22:46:08 +0100 178) switch (mode) { ``` ![Hiroyuki Sano](https://avatars0.githubusercontent.com/u/1015222?v=3&s=100) We can use line-level logging to browse the history of `fsck_tree`, but we don't have to go far before we find where the '.git' check was added: ``` $ git log --find-copies-harder -L :fsck_tree:fsck.c -p [...] commit 5c17f51270e1b384e03fb9c16b5a0040b115ae8c Author: Jeff King <peff@peff.net> Date: Wed Nov 28 16:35:29 2012 -0500 fsck: warn about ".git" in trees Having a ".git" entry inside a tree can cause confusing results on checkout. At the top-level, you could not checkout such a tree, as it would complain about overwriting the real ".git" directory. In a subdirectory, you might check it out, but performing operations in the subdirectory would confusingly consider the in-tree ".git" directory as the repository. The regular git tools already make it hard to accidentally add such an entry to a tree, and do not allow such entries to enter the index at all. Teaching fsck about it provides an additional safety check, and let's us avoid propagating any such bogosity when transfer.fsckObjects is on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> ``` ... and then use line-level logging to see who edited the function after the check was added (and missed the check): ``` $ git log --find-copies-harder --format="Author: %H %aN%x09%x09%ad" -L :fsck_tree:fsck.c -p 5c17f51270e1b384e03fb9c16b5a0040b115ae8c^..HEAD | grep Author: Author: effd12ec876df016d4346fee0d3d6bf31308fa11 Hiroyuki Sano Thu Mar 20 08:02:04 2014 +0900 Author: 5c17f51270e1b384e03fb9c16b5a0040b115ae8c Jeff King Wed Nov 28 16:35:29 2012 -0500 ``` ![Hiroyuki Sano](https://avatars0.githubusercontent.com/u/1015222?v=3&s=100) ![Jeff King](https://avatars2.githubusercontent.com/u/45925?v=3&s=100) So, the buggy check was added by Jeff King (and accepted by Junio, who is the maintainer) and the function was modified by Hiroyuki, who also missed the issue. But, from Jeff's comment it looks like the check existed elsewhere in the codebase previously. Jeff may have just re-used the existing check logic. If we assume these checks used strcmp, we can use pickaxe to find where/when they were introduced: ``` $ git log -G 'strcmp.*\"\.git\"' --format="%H %aN%x09%x09%ad" effd12ec876df016d4346fee0d3d6bf31308fa11 Hiroyuki Sano Thu Mar 20 08:02:04 2014 +0900 5c17f51270e1b384e03fb9c16b5a0040b115ae8c Jeff King Wed Nov 28 16:35:29 2012 -0500 81b50f3ce40bfdd66e5d967bf82be001039a9a98 Linus Torvalds Mon Feb 22 08:42:18 2010 -0800 16e2cfa90993b23bda8ff1ffb958cac20e58b058 Junio C Hamano Fri Jan 8 20:56:16 2010 -0800 53cc5356fb591d0efa9d2725f8430afe5f5630b5 Junio C Hamano Fri Jan 8 19:14:07 2010 -0800 8ca12c0d62c0be4a4987c4a936467ea2a92e915a Alexander Potashev Sat Jan 10 15:07:50 1009 +0300 6adcca3fe84e6859fc62df6c4ab916192ca02795 Junio C Hamano Mon Aug 27 00:58:06 1007 -0700 ``` So Jeff, Linus, Junio, and Alexander have all used a similar check elsewhere in the codebase, dating back to 1007. ![Hiroyuki Sano](https://avatars0.githubusercontent.com/u/1015222?v=3&s=100) ![Jeff King](https://avatars2.githubusercontent.com/u/45925?v=3&s=100) ![Linus](https://avatars3.githubusercontent.com/u/1024025?v=3&s=100) ![Junio](https://avatars0.githubusercontent.com/u/54884?v=3&s=100) ## Half-ass metrics For kicks, we can calculate the [ownership](https://github.com/jfoote/git-ownership) and churn of `fsck.c` to see how it stands: ``` $ git ownership *.c ownrshp major minor file [...] 34 6 10 fsck.c [...] 19 4 91 git.c 38 4 108 diff.c -------- ownrshp major minor file 21 4 512 (all) ``` So as of v2.2.0 it looks like `fsck.c` had better ownership than many of the other c files in the repo. Note that [Bird's paper](http://www.cabird.com/papers/bird2011dtm.pdf) on the subject plainly states that their conclusions may not apply to open source projects like this, and that I am ignoring release cycles out of sheer laziness. And now for churn: ``` $ git churn [...] 35 fsck.c [...] 651 diff.c ``` Again, good stats: relatively low churn. # Conclusion: Supporting people-oriented root-cause analysis Beyond the fact that these developers are doing solid work supporting an awesome open source tool and I don't want to piss them off, I wouldn't be willing to attempt a definitive engineering root-cause analysis based on this data alone. Process, release, and environmental information are critical to understanding the conditions that cause vulnerabilities to leak through to releases. However, hard data on who created, edited and overlooked security issues can be very helpful in supporting questions and eventual recommendations to an engineering team.