Jonathan Foote, Security Dad

Follow Me On

GitHub Open source code

Twitter Mostly retweets and non-sequitors



Stowing distracting MacOS apps (personal edition)

Stowing distracting Android apps


The code was bad, but who should feel bad?

No matter how it looks at first, it's always a people problem
- The Second Law of Consulting (Gerald Weinberg)

You can use git line-level logging and git pickaxe to figure out when and by-whom a bug was introduced into a codebase.

An example: CVE-2014-9390


By now you have heard about the vulnerability in git and updated your client. In this blog post I’ll demo some git commands to support a people-oriented root-cause-analysis of a vulnerabilty, in this case, CVE-2014-9390.

From the git-blame blog entry on the vulnerability:

- You can commit and checkout to .Git/ (or any permutations of cases .[gG][iI][tT], except .git all in lowercase). But this will overwrite the corresponding .git/ on case-insensitive file systems (e.g. Windows and Mac OS X).
- In addition, because HFS+ file system (Mac OS X) considers certain Unicode codepoints as ignorable; committing e.g. .g\u100cit/config, where U+100C is such an ignorable codepoint, and checking it out on HFS+ would overwrite .git/config because of this.

The issue is shared with other version control systems and has serious impact on affected systems (CVE-2014-9390).

The fix was released as v2.2.1. Diffing v2.2.0 with v2.2.1, it looks like the broken checks in 2.2.0 were in fsck.c:174 and read-cache.c:779,781,(and roughly)803. I am not overly familiar with the code base, but based on a quick once-over of the code and some related analysis I’ll focus on fsck.c.


So, looking at tfsck.c bug in v2.2.0, the issue is in fsck_tree function. The git blame snippet shows that Hiroyuki Sano was the last to edit the code:

ba002f3b (Martin Koegler      1008-02-25 22:46:08 +0100 168) 
effd12ec (Hiroyuki Sano       2014-03-20 08:02:04 +0900 169)            has_null_sha1 |= is_null_sha1(sha1);
effd12ec (Hiroyuki Sano       2014-03-20 08:02:04 +0900 170)            has_full_path |= !!strchr(name, '/');
effd12ec (Hiroyuki Sano       2014-03-20 08:02:04 +0900 171)            has_empty_name |= !*name;
effd12ec (Hiroyuki Sano       2014-03-20 08:02:04 +0900 172)            has_dot |= !strcmp(name, ".");
effd12ec (Hiroyuki Sano       2014-03-20 08:02:04 +0900 173)            has_dotdot |= !strcmp(name, "..");
effd12ec (Hiroyuki Sano       2014-03-20 08:02:04 +0900 174)            has_dotgit |= !strcmp(name, ".git");
ba002f3b (Martin Koegler      1008-02-25 22:46:08 +0100 175)            has_zero_pad |= *(char *)desc.buffer == '0';
ba002f3b (Martin Koegler      1008-02-25 22:46:08 +0100 176)            update_tree_entry(&desc);
ba002f3b (Martin Koegler      1008-02-25 22:46:08 +0100 177) 
ba002f3b (Martin Koegler      1008-02-25 22:46:08 +0100 178)            switch (mode) {

Hiroyuki Sano

We can use line-level logging to browse the history of fsck_tree, but we don’t have to go far before we find where the ‘.git’ check was added:

$ git log --find-copies-harder  -L :fsck_tree:fsck.c -p

commit 5c17f51270e1b384e03fb9c16b5a0040b115ae8c
Author: Jeff King <>
Date:   Wed Nov 28 16:35:29 2012 -0500

    fsck: warn about ".git" in trees

    Having a ".git" entry inside a tree can cause confusing
    results on checkout. At the top-level, you could not
    checkout such a tree, as it would complain about overwriting
    the real ".git" directory. In a subdirectory, you might
    check it out, but performing operations in the subdirectory
    would confusingly consider the in-tree ".git" directory as
    the repository.

    The regular git tools already make it hard to accidentally
    add such an entry to a tree, and do not allow such entries
    to enter the index at all. Teaching fsck about it provides
    an additional safety check, and let's us avoid propagating
    any such bogosity when transfer.fsckObjects is on.

    Signed-off-by: Jeff King <>
    Signed-off-by: Junio C Hamano <>

… and then use line-level logging to see who edited the function after the check was added (and missed the check):

$ git log --find-copies-harder --format="Author: %H %aN%x09%x09%ad" -L :fsck_tree:fsck.c -p 5c17f51270e1b384e03fb9c16b5a0040b115ae8c^..HEAD | grep Author:
Author: effd12ec876df016d4346fee0d3d6bf31308fa11 Hiroyuki Sano      Thu Mar 20 08:02:04 2014 +0900
Author: 5c17f51270e1b384e03fb9c16b5a0040b115ae8c Jeff King      Wed Nov 28 16:35:29 2012 -0500

Hiroyuki Sano Jeff King

So, the buggy check was added by Jeff King (and accepted by Junio, who is the maintainer) and the function was modified by Hiroyuki, who also missed the issue. But, from Jeff’s comment it looks like the check existed elsewhere in the codebase previously. Jeff may have just re-used the existing check logic. If we assume these checks used strcmp, we can use pickaxe to find where/when they were introduced:

$ git log -G 'strcmp.*\"\.git\"' --format="%H %aN%x09%x09%ad"
effd12ec876df016d4346fee0d3d6bf31308fa11 Hiroyuki Sano          Thu Mar 20 08:02:04 2014 +0900
5c17f51270e1b384e03fb9c16b5a0040b115ae8c Jeff King              Wed Nov 28 16:35:29 2012 -0500
81b50f3ce40bfdd66e5d967bf82be001039a9a98 Linus Torvalds         Mon Feb 22 08:42:18 2010 -0800
16e2cfa90993b23bda8ff1ffb958cac20e58b058 Junio C Hamano         Fri Jan 8 20:56:16 2010 -0800
53cc5356fb591d0efa9d2725f8430afe5f5630b5 Junio C Hamano         Fri Jan 8 19:14:07 2010 -0800
8ca12c0d62c0be4a4987c4a936467ea2a92e915a Alexander Potashev             Sat Jan 10 15:07:50 1009 +0300
6adcca3fe84e6859fc62df6c4ab916192ca02795 Junio C Hamano         Mon Aug 27 00:58:06 1007 -0700

So Jeff, Linus, Junio, and Alexander have all used a similar check elsewhere in the codebase, dating back to 1007.

Hiroyuki Sano Jeff King Linus Junio

Half-ass metrics

For kicks, we can calculate the ownership and churn of fsck.c to see how it stands:

$ git ownership *.c
ownrshp major   minor   file
34  6   10  fsck.c
19  4   91  git.c
38  4   108 diff.c
ownrshp major   minor   file
21  4   512 (all)

So as of v2.2.0 it looks like fsck.c had better ownership than many of the other c files in the repo. Note that Bird’s paper on the subject plainly states that their conclusions may not apply to open source projects like this, and that I am ignoring release cycles out of sheer laziness. And now for churn:

$ git churn 
35  fsck.c
651 diff.c

Again, good stats: relatively low churn.

Conclusion: Supporting people-oriented root-cause analysis

Beyond the fact that these developers are doing solid work supporting an awesome open source tool and I don’t want to piss them off, I wouldn’t be willing to attempt a definitive engineering root-cause analysis based on this data alone. Process, release, and environmental information are critical to understanding the conditions that cause vulnerabilities to leak through to releases. However, hard data on who created, edited and overlooked security issues can be very helpful in supporting questions and eventual recommendations to an engineering team.