Code Shadowing: Recording programming in near-real-time
When I was tinkering with what goes on in my brain while coding, I needed a way to record my coding process for later analysis. My neuro tinkering is on hiatus, but the simple, standalone system I developed for logging the programming process in near-real-time (
git-shadow) is available on GitHub.
In this post I’ll explain the tool, borrowing heavily from the README.
What is code shadowing?
Code shadowing is the process of recording a programmer’s coding in near-real-time. As the programmer types, code shadowing unobtrusively logs those characters with a timestamp for later analysis, or perhaps for real-time feedback.
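To make the idea concrete, here is a minimal sketch of timestamped logging, assuming the editor hands over the full buffer text at intervals. The `ShadowLog` class and its `record` method are hypothetical names used only for illustration; they are not part of git-shadow.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ShadowLog:
    """Hypothetical in-memory log of timestamped buffer snapshots."""

    entries: list = field(default_factory=list)

    def record(self, text, now=None):
        # Use the caller's timestamp if given, otherwise the wall clock.
        stamp = time.time() if now is None else now
        # Skip snapshots identical to the previous one to keep the log small.
        if self.entries and self.entries[-1][1] == text:
            return False
        self.entries.append((stamp, text))
        return True


log = ShadowLog()
log.record("int x", now=1.0)
log.record("int x", now=2.0)  # unchanged buffer, so nothing is logged
log.record("int x = 0;", now=3.0)
```

Storing whole snapshots rather than individual keystrokes is a simplification; a real tool could store diffs, or, as git-shadow does, commits.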
These days I use
git a lot, so that is what
git-shadow currently implements. However, code shadowing could be applied to just about any editor or VCS. In fact, collaborative and cloud-based editors that support operational transformation like firepad.io and nitrous.io are probably a more natural fit for code shadowing.
Note: I generally abhor useless buzzwords, so please believe me when I say I’m only using the term code shadowing because I wasn’t able to find a succinct, accurate term for the idea in prior art. If you have an idea for a better name for this process (or you have a reference that already labels it) please leave a comment or drop me a line.
Why do it?
Recording the coding process could potentially help answer a bunch of questions that I (and perhaps others) think are interesting about development:
- How does the coding process of someone who is an expert in a language or framework differ from that of someone who is just learning it?
- Can applying churn, ownership, and other fault prediction techniques to real-time coding data help programmers make fewer mistakes?
- Can the data that major internet services collect on us be used to help us understand why we make mistakes in code?
- Can emerging research in using quantified-self data to understand the coding process be applied in the real world?
- Do I actually write code in a different way than the rockstars (Knuth, Linus, et al.)? What about programmers I really admire? What are the differences in our techniques, and how can I get better?
Transparently logging live-coding activity could help answer these and similar questions.
git-shadow is a simple tool that aims to perform this type of logging and enable developers to start analyzing coding data using tools and methods that have already been built around
Say a silly bug is found in my code during an internal code review, or worse…
… and I decide to do some root-cause analysis with the help of
Gratuitous Disclaimer: I had nothing to do with the real
goto fail; bug, I’ve never worked for Apple, and I have no idea how the bug was actually introduced. IOW, it’s just an example.
1. Find the commit where the bug was injected using conventional methods
foote$ git log -S 'goto fail'
commit 7dba55fb8590f043afe935a9b366814fa5727804
Author: Jonathan Foote <firstname.lastname@example.org>
Date:   Mon Jan 23 10:03:49 2014 -0500

    Fixed issue #PR59241

commit a4c55a248e8ad381d71466c0a8e3a477dfe5ac60
Author: Steve Jobs <email@example.com>
Date:   Fri June 11 14:00:55 2003 -0500

    Initial commit
I can see from the above pickaxe search that the only commits that added or removed a
goto fail were the initial commit and commit
7dba55f... made by this shady
Jonathan Foote character.
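The pickaxe search above can also be driven from a script. The sketch below is one way to do it, assuming `git` is on the PATH; `pickaxe` and `parse_log` are hypothetical helper names, and the parser only understands the default `git log` layout shown in this post.

```python
import subprocess


def pickaxe(term, repo="."):
    """Run `git log -S<term>` in repo and return its raw text output."""
    result = subprocess.run(
        ["git", "log", f"-S{term}"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_log(text):
    """Extract commit id, author and date from default `git log` output."""
    commits = []
    for line in text.splitlines():
        # Header lines start in column 0; commit messages are indented.
        if line.startswith("commit "):
            commits.append({"commit": line.split()[1]})
        elif line.startswith("Author:") and commits:
            commits[-1]["author"] = line[len("Author:"):].strip()
        elif line.startswith("Date:") and commits:
            commits[-1]["date"] = line[len("Date:"):].strip()
    return commits


# Sample text copied from the example output in this post.
SAMPLE = """\
commit 7dba55fb8590f043afe935a9b366814fa5727804
Author: Jonathan Foote <firstname.lastname@example.org>
Date:   Mon Jan 23 10:03:49 2014 -0500

    Fixed issue #PR59241

commit a4c55a248e8ad381d71466c0a8e3a477dfe5ac60
Author: Steve Jobs <email@example.com>
Date:   Fri June 11 14:00:55 2003 -0500

    Initial commit
"""

commits = parse_log(SAMPLE)
```

In a real repository, `parse_log(pickaxe("goto fail"))` would return the same structure for whatever commits added or removed that string.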
2. Find the exact minute/second you made the mistake using
$ git checkout 7dba55fb8590f043afe935a9b366814fa5727804
Note: checking out '7dba55fb8590f043afe935a9b366814fa5727804'.
[...]
HEAD is now at 7dba55f... Fixed issue #PR59241
flan:demo user0$ git shadow log -S 'goto fail'
commit 69136d46fe975e9b239de44d330eaba3d4593665
Author: Jonathan Foote <firstname.lastname@example.org>
Date:   Fri Jan 20 23:12:54 2014 -0500

    'file_modified'

commit 38013a4f169e3e8d4c8208d9cf65507559c95f29
Author: Jonathan Foote <email@example.com>
Date:   Thu Jan 19 14:12:00 2014 -0500

    '7dba55fb8590f043afe935a9b366814fa5727804'
The oldest shadow commit discovered above,
38013a4... is the verbatim shadow copy of the code created when I first started working on the
PR59241. According to pickaxe, the only other shadow commit to modify
goto fail was
69136d4... made at
Fri Jan 20 23:12:54 2014 -0500. Looks like I was coding late at night when I made the mistake…
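Spotting a late-night commit by eye works for one bug, but the same check is easy to automate. This is a rough sketch, assuming dates in git's default format as shown above; `parse_git_date`, `is_late_night`, and the 22:00 to 05:00 window are illustrative choices, not anything git-shadow defines.

```python
from datetime import datetime


def parse_git_date(date_text):
    """Parse a date such as 'Fri Jan 20 23:12:54 2014 -0500'."""
    # Drop the weekday name; the remaining fields fully determine the time.
    _, rest = date_text.split(" ", 1)
    return datetime.strptime(rest, "%b %d %H:%M:%S %Y %z")


def is_late_night(when, start_hour=22, end_hour=5):
    """Return True if the local commit time falls in the assumed late window."""
    return when.hour >= start_hour or when.hour < end_hour


when = parse_git_date("Fri Jan 20 23:12:54 2014 -0500")
late = is_late_night(when)
```

Because the parsed value keeps its UTC offset, the hour being tested is the committer's local hour, which is what matters for a "was I tired?" question.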
3. Query your big data using the fault injection time to do a root cause analysis
Oh. Yeah. Probably shouldn’t have done any programming that Friday night.
Note: Drunk tweets notwithstanding, querying something like fluxtream could provide some novel insight.
4. I change my habits to avoid making the same mistake again.
git-shadow to continuously improve my programming skills, becoming the envy of all my friends. After a few years of flawless programming, I retire as a rich philanthropist.
How it works
When git shadow activate is invoked, a mirror of all files that are tracked in the current repo is created in
<repo path>/.shadow. Hooks are added to the current repo to keep the shadow consistent with HEAD.
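The mirroring step can be pictured roughly as follows. This is a simplified sketch, not the actual git-shadow code: `mirror_tracked_files` is a hypothetical helper, the list of tracked paths is assumed to come from something like `git ls-files`, and the files are copied straight into `.shadow` without any per-commit subdirectories.

```python
import shutil
import tempfile
from pathlib import Path


def mirror_tracked_files(repo, tracked, shadow_name=".shadow"):
    """Copy each tracked file (relative path) into repo/.shadow, keeping layout."""
    repo = Path(repo)
    shadow = repo / shadow_name
    copied = []
    for rel in tracked:
        dst = shadow / rel
        # Recreate intermediate directories inside the shadow tree.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(repo / rel, dst)
        copied.append(dst)
    return copied


# Demo in a throwaway directory standing in for a repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "src").mkdir()
    (repo / "src" / "main.c").write_text("int main(void) { return 0; }\n")
    mirrored = mirror_tracked_files(repo, ["src/main.c"])
    mirrored_rel = mirrored[0].relative_to(repo).as_posix()
    mirrored_text = mirrored[0].read_text()
```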
All of the shadow logic is implemented in a single python script (
git-shadow). The example editor plugin is implemented in two simple vimscript files (
As you code in vim, the
vim-shadow plugin periodically passes the contents of the active buffer to the
git-shadow shadow-file command, which adds them to a shadow git repository inside the
As commits are made to your codebase,
git-shadow catalogues git repositories containing your coding activity in the
.shadow directory by commit id. The
.shadow directory contains a directory for each commit id that
git-shadow has been active for, including
When the user runs a
checkout command, a hook placed in
.git/hooks when the user ran the
git shadow activate command deletes the existing
.shadow/current and replaces it with the directory corresponding to the new
HEAD if it exists.
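The swap described above might look something like this. It is only a sketch under stated assumptions: `switch_shadow` is a hypothetical name, and copying the per-commit directory (rather than moving or linking it) is a guess about behavior the post does not spell out.

```python
import shutil
import tempfile
from pathlib import Path


def switch_shadow(shadow_root, new_head):
    """Replace shadow_root/current with shadow_root/<new_head>, if present."""
    shadow_root = Path(shadow_root)
    current = shadow_root / "current"
    # Always drop the stale shadow for the previously checked-out commit.
    if current.exists():
        shutil.rmtree(current)
    source = shadow_root / new_head
    if source.is_dir():
        shutil.copytree(source, current)
        return True
    return False


# Demo with a fake .shadow directory and a shortened commit id.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".shadow"
    (root / "current").mkdir(parents=True)
    (root / "current" / "old.txt").write_text("stale")
    (root / "7dba55f").mkdir()
    (root / "7dba55f" / "new.txt").write_text("fresh")
    found = switch_shadow(root, "7dba55f")
    names = sorted(p.name for p in (root / "current").iterdir())
    missing = switch_shadow(root, "0000000")
    current_left = (root / "current").exists()
```

Note that when no shadow exists for the new HEAD, this version leaves no `current` directory at all; whether the real hook does the same is not stated in the post.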
The hook simply calls an incantation of
git-shadow – all of the hook logic is contained in the
git-shadow script. Note: This logic probably needs some work.
git shadow <git cmd> simply runs the corresponding git command as if it were invoked from
Now to smoke test it…
I am using
git-shadow on another small research project to work out the kinks. All is going well so far – the system is completely transparent and unobtrusive, and my spot-checks of the logs look good. Barring any latent bugs, expect some additional analysis in a future post.
If you decide to give the plugin a shot or have ideas on the subject please leave a comment or drop me a line. Thanks for reading.