When I was tinkering with what goes on in my brain while coding, I needed a way to record my coding process for later analysis. My neuro tinkering is on hiatus, but the simple, standalone system I developed for logging the programming process in near-real time (git-shadow) is available on GitHub.

In this post I’ll explain the tool, borrowing heavily from the README.

What is code shadowing?

Code shadowing is the process of recording a programmer’s coding in near-real time. As the programmer inputs characters, code shadowing unobtrusively logs those characters with some sort of timestamp for later analysis, or perhaps real-time feedback.

These days I use vim and git a lot, so that is what git-shadow currently implements. However, code shadowing could be applied to just about any editor or VCS. In fact, collaborative and cloud-based editors that support operational transformation like firepad.io and nitrous.io are probably a more natural fit for code shadowing.

Note: I generally abhor useless buzzwords, so please believe me when I say I’m only using the term code shadowing because I wasn’t able to find a succinct, accurate term for the idea in prior art. If you have an idea for a better name for this process (or you have a reference that already labels it) please leave a comment or drop me a line.

Why do it?

Recording the coding process could potentially help answer a bunch of questions that I (and perhaps others) think are interesting about development:

How does the coding process of someone who is an expert in a language or framework differ from that of someone who is just learning it?
Can applying churn, ownership, and other fault prediction techniques to real-time coding data help programmers make fewer mistakes?
Can the data that major internet services collect on us be used to help us understand why we make mistakes in code?
Can emerging research in using quantified-self data to understand the coding process be applied in the real world?
Do I actually write code in a different way than the rockstars (Knuth, Linus, et al.)? What about programmers I really admire? What are the differences in our techniques, and how can I get better?

Transparently logging live-coding activity could help answer these and similar questions. git-shadow is a simple tool that aims to perform this type of logging and enable developers to start analyzing coding data using tools and methods that have already been built around git.

Fictional Example

Say a silly bug is found in my code during an internal code review, or worse…

goto fail

… and I decide to do some root-cause analysis with the help of git-shadow

Gratuitous Disclaimer: I had nothing to do with the real goto fail; bug, I’ve never worked for Apple, and I have no idea how the bug was actually introduced. IOW, it’s just an example.

1. Find the commit where the bug was injected using conventional methods

foote$ git log -S 'goto fail'
commit 7dba55fb8590f043afe935a9b366814fa5727804
Author: Jonathan Foote <jmfoote@loyola.edu>
Date:   Mon Jan 23 10:03:49 2014 -0500

    Fixed issue #PR59241

commit a4c55a248e8ad381d71466c0a8e3a477dfe5ac60
Author: Steve Jobs <steve@apple.com>
Date:   Fri June 11 14:00:55 2003 -0500

    Initial commit

I can see from the above pickaxe search that the only commits that added or removed a goto fail were the initial commit and commit 7dba55f... made by this shady Jonathan Foote character.

2. Find the exact minute/second you made the mistake using `git-shadow`

$ git checkout 7dba55fb8590f043afe935a9b366814fa5727804
Note: checking out '7dba55fb8590f043afe935a9b366814fa5727804'.
[...]

HEAD is now at 7dba55f... Fixed issue #PR59241
flan:demo user0$ git shadow log -S 'goto fail'
commit 69136d46fe975e9b239de44d330eaba3d4593665
Author: Jonathan Foote <jmfoote@loyola.edu>
Date:   Fri Jan 20 23:12:54 2014 -0500

    'file_modified'

commit 38013a4f169e3e8d4c8208d9cf65507559c95f29
Author: Jonathan Foote <jmfoote@loyola.edu>
Date:   Thu Jan 19 14:12:00 2014 -0500

    '7dba55fb8590f043afe935a9b366814fa5727804'

The oldest shadow commit discovered above, 38013a4... is the verbatim shadow copy of the code created when I first started working on PR59241. According to pickaxe, the only other shadow commit to modify goto fail was 69136d4... made at Fri Jan 20 23:12:54 2014 -0500. Looks like I was coding late at night when I made the mistake…
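The reasoning in this step (the oldest shadow commit is the baseline snapshot, anything newer is an edit) can be automated once the log output is parsed. Below is a minimal sketch, assuming the default git log text format shown above. The names parse_git_log and edits_after_baseline are hypothetical helpers for illustration and are not part of git-shadow.

```python
import re

# Matches the header line that starts each entry in default `git log` output.
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{7,40})$")


def parse_git_log(text):
    """Split default-format `git log` output into a list of dicts (newest first)."""
    entries = []
    current = None
    for line in text.splitlines():
        match = COMMIT_RE.match(line)
        if match:
            current = {"sha": match.group(1), "author": "", "date": "", "message": []}
            entries.append(current)
        elif current is None:
            continue  # ignore anything before the first commit header
        elif line.startswith("Author:"):
            current["author"] = line[len("Author:"):].strip()
        elif line.startswith("Date:"):
            current["date"] = line[len("Date:"):].strip()
        elif line.startswith("    "):
            # git indents commit message lines by four spaces
            current["message"].append(line.strip())
    for entry in entries:
        entry["message"] = "\n".join(entry["message"])
    return entries


def edits_after_baseline(entries):
    """Assumption: the oldest entry is the verbatim baseline shadow copy,
    so every newer entry is an edit that touched the searched string."""
    return entries[:-1]
```

Feeding it the output of git shadow log -S 'goto fail' from above would leave a single suspect entry whose date field is the moment the edit was recorded.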

3. Query your big data using the fault injection time to do a root cause analysis

Oh. Yeah. Probably shouldn’t have done any programming that Friday night.

Note: Drunk tweets notwithstanding, querying something like fluxtream could provide some novel insight.

4. I change my habits to avoid making the same mistake again.

I use git-shadow to continuously improve my programming skills, becoming the envy of all my friends. After a few years of flawless programming, I retire as a rich philanthropist.

How it works

When git shadow activate is invoked, a mirror of all files that are tracked in the current repo is created in <repo path>/.shadow. Hooks are added to the current repo to keep the shadow consistent with HEAD.
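To make the mirroring step concrete, here is a minimal sketch of how tracked files could be copied into a shadow directory. It is not the actual git-shadow code: the function names are hypothetical, and the choice of .shadow/current as the destination in the usage note is an assumption based on the directory layout described later in this post. It relies only on git ls-files, which lists the files git tracks.

```python
import shutil
import subprocess
from pathlib import Path


def list_tracked_files(repo_root):
    """Return repo-relative paths of the files git tracks in repo_root."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],  # -z: NUL-separated, safe for odd filenames
        cwd=repo_root, check=True, capture_output=True, text=True,
    ).stdout
    return [name for name in out.split("\0") if name]


def mirror_files(repo_root, rel_paths, shadow_root):
    """Copy each listed file into shadow_root, preserving its relative path."""
    repo_root, shadow_root = Path(repo_root), Path(shadow_root)
    copied = []
    for rel in rel_paths:
        src = repo_root / rel
        if not src.is_file():
            continue  # tracked in the index but missing from the working tree
        dest = shadow_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel)
    return copied
```

A call such as mirror_files(repo, list_tracked_files(repo), Path(repo) / ".shadow" / "current") would build the mirror under these assumptions. Installing the hooks is a separate step that this sketch leaves out.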

All of the shadow logic is implemented in a single Python script (git-shadow). The example editor plugin is implemented in two simple Vimscript files (vim-shadow/autoload/shadow.vim and vim-shadow/plugin/shadow.vim).

Coding

As you code in vim, the vim-shadow plugin periodically passes the contents of the active buffer to the git-shadow shadow-file command, which adds them to a shadow git repository inside the .shadow directory.
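The step described above can be sketched in two parts: write the buffer contents into the shadow repository, then commit them. This is an illustration under stated assumptions rather than the real shadow-file implementation. The function names are hypothetical, the shadow repository is assumed to be an already initialized git repo, and the 'file_modified' commit message is borrowed from the example log earlier in this post.

```python
import subprocess
from pathlib import Path


def write_shadow_copy(shadow_repo, rel_path, contents):
    """Write buffer contents to rel_path inside the shadow repo."""
    shadow_repo = Path(shadow_repo).resolve()
    dest = (shadow_repo / rel_path).resolve()
    if shadow_repo not in dest.parents:
        # refuse paths such as ../x that would land outside the shadow repo
        raise ValueError(f"{rel_path!r} escapes the shadow repository")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(contents, encoding="utf-8")
    return dest


def commit_shadow_copy(shadow_repo, rel_path, message="file_modified"):
    """Stage rel_path and commit it; return False if nothing changed."""
    subprocess.run(["git", "add", "--", rel_path], cwd=shadow_repo, check=True)
    # `git diff --cached --quiet` exits non-zero when something is staged
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=shadow_repo)
    if staged.returncode == 0:
        return False
    subprocess.run(
        ["git", "commit", "--quiet", "-m", message], cwd=shadow_repo, check=True
    )
    return True
```

Skipping the commit when nothing is staged keeps the shadow history free of empty commits when the plugin fires while the buffer is unchanged. Each commit's timestamp then serves as the log entry for that edit.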

[Figure: flow1]

As commits are made to your codebase, git-shadow catalogues git repositories containing your coding activity in the .shadow directory by commit id. The .shadow directory contains a directory for each commit id that git-shadow has been active for, including current.

HEAD changes

When the user runs a checkout command, a hook fires that git shadow activate previously placed in .git/hooks. The hook deletes the existing .shadow/current and, if a directory corresponding to the new HEAD exists, replaces it with that directory.
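The swap performed by the hook can be sketched as a small filesystem operation. The sketch below makes several assumptions that the post does not confirm: the per-commit directory is copied rather than moved, nothing is recreated when no directory exists for the new HEAD, and the function name swap_current is hypothetical.

```python
import shutil
from pathlib import Path


def swap_current(shadow_root, new_head):
    """Replace <shadow_root>/current with the directory saved for new_head.

    Returns True if a saved directory existed and was restored, else False.
    """
    shadow_root = Path(shadow_root)
    current = shadow_root / "current"
    saved = shadow_root / new_head  # directories are catalogued by commit id

    if current.exists():
        shutil.rmtree(current)  # drop the shadow of the previous HEAD

    if saved.is_dir():
        # Assumption: copy, so the catalogued directory stays intact
        shutil.copytree(saved, current)
        return True
    return False
```

If the hook in question is git's post-checkout hook (an assumption here), git passes the new HEAD to it as the second argument, which could be handed to swap_current as new_head.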

[Figure: flow3]

The hook simply calls an incantation of git-shadow – all of the hook logic is contained in the git-shadow script. Note: This logic probably needs some work.

Analysis

Running git shadow <git cmd> simply runs the corresponding git command as if it were invoked from .shadow/current.
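A pass-through like this mostly comes down to choosing the working directory for a subprocess. Here is a minimal sketch, assuming the shadow repository lives at <repo path>/.shadow/current as described above. The function names are hypothetical, and the real script may resolve the repo path or handle errors differently.

```python
import subprocess
import sys
from pathlib import Path


def shadow_git_command(repo_root, git_args):
    """Build the git command line and the directory it should run in."""
    cwd = Path(repo_root) / ".shadow" / "current"
    return ["git", *git_args], cwd


def run_shadow_git(repo_root, git_args):
    """Run git with git_args from the shadow directory; return its exit code."""
    cmd, cwd = shadow_git_command(repo_root, git_args)
    if not cwd.is_dir():
        print(f"no shadow repository at {cwd}", file=sys.stderr)
        return 1
    return subprocess.run(cmd, cwd=cwd).returncode
```

Under these assumptions, run_shadow_git(".", ["log", "-S", "goto fail"]) would behave like the git shadow log -S 'goto fail' call in the example, since git sees only the repository in the shadow directory.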

[Figure: flow2]

Now to smoke test it…

I am using git-shadow on another small research project to work out the kinks. All is going well so far: the system is completely transparent and unobtrusive, and my spot-checks of the logs look good. Barring any latent bugs, expect some additional analysis in a future post.

If you decide to give the plugin a shot or have ideas on the subject please leave a comment or drop me a line. Thanks for reading.