Code Shadowing - Recording programming in near-real-time
When I was tinkering with what goes on in my brain while coding, I needed a way to record my coding process for later analysis. My neuro tinkering is on hiatus, but the simple, standalone system I developed for logging the programming process in near-real-time (git-shadow) is available on GitHub.
In this post I’ll explain the tool, borrowing heavily from the README.
What is code shadowing?
Code shadowing is the process of recording a programmer’s coding in near-real-time. As the programmer inputs characters, code shadowing unobtrusively logs those characters with some sort of timestamp for later analysis, or perhaps real-time feedback.
These days I use vim and git a lot, so that is what git-shadow currently implements. However, code shadowing could be applied to just about any editor or VCS. In fact, collaborative and cloud-based editors that support operational transformation like firepad.io and nitrous.io are probably a more natural fit for code shadowing.
Note: I generally abhor useless buzzwords, so please believe me when I say I’m only using the term code shadowing because I wasn’t able to find a succinct, accurate term for the idea in prior art. If you have an idea for a better name for this process (or you have a reference that already labels it) please leave a comment or drop me a line.
Why do it?
Recording the coding process could potentially help answer a bunch of questions that I (and perhaps others) think are interesting about development:
- How does the coding process of someone who is an expert in a language or framework differ from that of someone who is just learning it?
- Can applying churn, ownership, and other fault prediction techniques to real-time coding data help programmers make fewer mistakes?
- Can the data that major internet services collect on us be used to help us understand why we make mistakes in code?
- Can emerging research in using quantified-self data to understand the coding process be applied in the real world?
- Do I actually write code in a different way than the rockstars (Knuth, Linus, et al.)? What about programmers I really admire? What are the differences in our techniques, and how can I get better?
Transparently logging live-coding activity could help answer these and similar questions. git-shadow is a simple tool that aims to perform this type of logging and enable developers to start analyzing coding data using tools and methods that have already been built around git.
Fictional Example
Say a silly bug is found in my code during an internal code review, or worse…
… and I decide to do some root-cause analysis with the help of git-shadow.
Gratuitous Disclaimer: I had nothing to do with the real goto fail; bug, I’ve never worked for Apple, and I have no idea how the bug was actually introduced. IOW, it’s just an example.
1. Find the commit where the bug was injected using conventional methods
foote$ git log -S 'goto fail'
commit 7dba55fb8590f043afe935a9b366814fa5727804
Author: Jonathan Foote <jmfoote@loyola.edu>
Date: Mon Jan 23 10:03:49 2014 -0500
Fixed issue #PR59241
commit a4c55a248e8ad381d71466c0a8e3a477dfe5ac60
Author: Steve Jobs <steve@apple.com>
Date: Fri June 11 14:00:55 2003 -0500
Initial commit
I can see from the above pickaxe search that the only commits that added or removed a goto fail were the initial commit and commit 7dba55f..., made by this shady Jonathan Foote character.
2. Find the exact minute/second you made the mistake using git-shadow
$ git checkout 7dba55fb8590f043afe935a9b366814fa5727804
Note: checking out '7dba55fb8590f043afe935a9b366814fa5727804'.
[...]
HEAD is now at 7dba55f... Fixed issue #PR59241
flan:demo user0$ git shadow log -S 'goto fail'
commit 69136d46fe975e9b239de44d330eaba3d4593665
Author: Jonathan Foote <jmfoote@loyola.edu>
Date: Fri Jan 20 23:12:54 2014 -0500
'file_modified'
commit 38013a4f169e3e8d4c8208d9cf65507559c95f29
Author: Jonathan Foote <jmfoote@loyola.edu>
Date: Thu Jan 19 14:12:00 2014 -0500
'7dba55fb8590f043afe935a9b366814fa5727804'
The oldest shadow commit discovered above, 38013a4..., is the verbatim shadow copy of the code created when I first started working on PR59241. According to pickaxe, the only other shadow commit to modify goto fail was 69136d4..., made at Fri Jan 20 23:12:54 2014 -0500. Looks like I was coding late at night when I made the mistake…
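If you wanted to automate the "was I coding late at night?" check, a minimal Python sketch might look like the following. It is not part of git-shadow: the function names are hypothetical, the input is assumed to be a Date: value in git's default log format (as in the output above), and the 22:00 to 05:00 window is an arbitrary assumption about what counts as "late".

```python
from datetime import datetime


def parse_git_date(date_text):
    """Parse a default-format git log date such as
    'Fri Jan 20 23:12:54 2014 -0500' into a timezone-aware datetime.

    The leading weekday token is dropped before parsing; it is redundant
    once the full date is known.
    """
    without_weekday = " ".join(date_text.split()[1:])
    return datetime.strptime(without_weekday, "%b %d %H:%M:%S %Y %z")


def is_late_night(when, start_hour=22, end_hour=5):
    """Return True if the commit's local hour falls in the late-night window.

    The default 22:00-05:00 window is an assumption, not a git-shadow setting.
    """
    return when.hour >= start_hour or when.hour < end_hour
```

Feeding it the two shadow commit dates from the log would flag the 23:12 commit and not the 14:12 one.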
3. Query your big data using the fault injection time to do a root cause analysis
Oh. Yeah. Probably shouldn’t have done any programming that Friday night.
Note: Drunk tweets notwithstanding, querying something like fluxtream could provide some novel insight.
4. I change my habits to avoid making the same mistake again.
I use git-shadow to continuously improve my programming skills, becoming the envy of all my friends. After a few years of flawless programming, I retire as a rich philanthropist.
How it works
When git shadow activate is invoked, a mirror of all files that are tracked in the current repo is created in <repo path>/.shadow. Hooks are added to the current repo to keep the shadow consistent with HEAD.
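To make the mirroring step concrete, here is a rough Python sketch. It is not the actual git-shadow code: the name mirror_tracked_files is hypothetical, the list of tracked paths is supplied by the caller (in a real repo it could come from something like git ls-files), and placing the mirror under .shadow/current is my assumption based on the directory layout described in this post.

```python
import shutil
from pathlib import Path


def mirror_tracked_files(repo, tracked_paths):
    """Copy each tracked file into <repo>/.shadow/current, preserving layout.

    Hypothetical helper, not part of git-shadow. `tracked_paths` are paths
    relative to `repo`; the `.shadow/current` target is an assumption.
    Returns the list of files written to the mirror.
    """
    repo = Path(repo)
    mirror = repo / ".shadow" / "current"
    copied = []
    for rel in tracked_paths:
        src = repo / rel
        if not src.is_file():
            continue  # skip entries that are not present on disk
        dst = mirror / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy contents and file metadata
        copied.append(dst)
    return copied
```

The sketch only covers the copy; installing the hooks is a separate step.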
All of the shadow logic is implemented in a single Python script (git-shadow). The example editor plugin is implemented in two simple Vimscript files (vim-shadow/autoload/shadow.vim and vim-shadow/plugin/shadow.vim).
Coding
As you code in vim, the vim-shadow plugin periodically passes the contents of the active buffer to the git-shadow shadow-file command, which adds them to a shadow git repository inside the .shadow directory.
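The snapshot step could be sketched roughly as follows. This is an illustration under assumptions, not the real shadow-file implementation: the function name and signature are hypothetical, the shadow directory is assumed to already be an initialized git repository, and the run parameter exists only so the git calls can be stubbed out. The 'file_modified' commit message is taken from the shadow log shown earlier.

```python
import subprocess
from pathlib import Path


def shadow_file(shadow_repo, rel_path, contents, run=subprocess.run):
    """Write a buffer snapshot into the shadow repo and commit it.

    Hypothetical sketch, not the real `git-shadow shadow-file` code.
    Assumes `shadow_repo` is already an initialized git repository.
    Returns the git command lines that were issued.
    """
    shadow_repo = Path(shadow_repo)
    target = shadow_repo / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(contents)

    git = ["git", "-C", str(shadow_repo)]
    commands = [
        git + ["add", "--", str(rel_path)],
        # 'file_modified' matches the shadow commit message in the log output
        git + ["commit", "--quiet", "-m", "file_modified"],
    ]
    for cmd in commands:
        # check=False: an unchanged buffer makes `git commit` exit non-zero,
        # which is expected here and should not raise
        run(cmd, check=False)
    return commands
```

An editor plugin would call something like this on a timer or on buffer-change events, passing the current buffer text as contents.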
As commits are made to your codebase, git-shadow catalogues git repositories containing your coding activity in the .shadow directory by commit id. The .shadow directory contains a directory for each commit id that git-shadow has been active for, including current.
HEAD changes
When the user runs a checkout command, a hook (placed in .git/hooks when the user ran git shadow activate) deletes the existing .shadow/current and replaces it with the directory corresponding to the new HEAD, if it exists.
The hook simply calls an incantation of git-shadow – all of the hook logic is contained in the git-shadow script. Note: This logic probably needs some work.
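The swap performed on checkout can be sketched in a few lines of Python. This is a simplified illustration, not the hook logic from the git-shadow script: the function name is hypothetical, and copying (rather than moving or linking) the per-commit directory is an assumption on my part.

```python
import shutil
from pathlib import Path


def swap_current_shadow(repo, new_head):
    """Replace .shadow/current with the shadow data stored for `new_head`.

    Hypothetical sketch of the checkout step. Copying the per-commit
    directory is an assumption. Returns True if shadow data for
    `new_head` existed, False otherwise.
    """
    shadow_root = Path(repo) / ".shadow"
    current = shadow_root / "current"
    if current.exists():
        shutil.rmtree(current)  # drop the shadow of the previous HEAD

    saved = shadow_root / new_head
    if saved.is_dir():
        shutil.copytree(saved, current)
        return True
    return False  # no shadow recorded for this commit; current stays absent
```

Note that this sketch does nothing to preserve the old current before deleting it; how and when that data is saved under its commit id is outside what is shown here.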
Analysis
Running git shadow <git cmd> simply runs the corresponding git command as if it were invoked from .shadow/current.
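One way to get this pass-through behaviour is git's own -C option, which makes git act as if it had been started in the given directory. The sketch below assumes that approach; the function names are hypothetical and the real script may do this differently (for example, by changing the working directory).

```python
import subprocess
from pathlib import Path


def shadow_git_argv(repo, git_args):
    """Build the argv for running a git command inside .shadow/current."""
    shadow_current = Path(repo) / ".shadow" / "current"
    # `git -C <dir>` runs git as if it had been started in <dir>
    return ["git", "-C", str(shadow_current), *git_args]


def run_shadow_git(repo, git_args):
    """Run the command and return its exit code (requires git on PATH)."""
    return subprocess.run(shadow_git_argv(repo, git_args)).returncode
```

For example, shadow_git_argv(".", ["log", "-S", "goto fail"]) builds the equivalent of the git shadow log -S 'goto fail' call from the example.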
Now to smoke test it…
I am using git-shadow on another small research project to work out the kinks. All is going well so far – the system is completely transparent and unobtrusive, and my spot-checks of the logs look good. Barring any latent bugs, expect some additional analysis in a future post.
If you decide to give the plugin a shot or have ideas on the subject, please leave a comment or drop me a line. Thanks for reading.