foote.pub

Jonathan Foote, Security Dad


A definitive answer to "How long did it take you to code that?"


Last week I wrapped up a proof-of-concept unplugged-ness tracker that I developed using vim with git-shadow. In this post I’ll cover some interesting facts about the project’s genesis that I was able to glean from the resulting dataset.

Background on the example projects

Quick git-shadow overview

git-shadow is a combo IDE plugin + git command that “shadows” the user’s coding process into per-commit repositories. For example, the user can run git shadow log -p myfile.cpp to get a timestamped, line-by-line breakdown of all of the edits they have made to the code since the last commit (and this can be done for any commit git-shadow was active for).

Data flow

The PoC available in the GitHub project works with vim, though as I’ll cover in this post, it wasn’t too difficult to make it work with Android Studio as well.

Quick ‘unplug’ overview

The only thing you really need to know about the unplug app to make sense of this post is that it consists of three parts: an OSX daemon, an Android app, and some webby script.

DFD

Making git-shadow work with Android Studio

It didn’t seem wise to use vim to develop an Android app, so I decided to make some tweaks to allow me to use git-shadow with Android Studio. It’s not perfect, but I was able to make things work via two quick hacks:

  1. Make Android Studio dump the active buffer to disk a few seconds after I stop editing it.

Preferences

  2. Run a script in the background that will call git shadow’s internal shadow-file command for each file in my Android project. You can see the ~10 lines of Python that I used here.
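
For readers who want a feel for the second hack, here is a minimal sketch of such a background script. It is not the script linked above. The exact arguments that shadow-file expects are an assumption (shadow_command is a hypothetical helper), and the modification-time check is an addition to avoid re-shadowing untouched files on every pass.

```python
import os
import subprocess
import time


def project_files(root):
    """Yield every file under root, skipping the .git directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            yield os.path.join(dirpath, name)


def shadow_command(path):
    # ASSUMPTION: the real shadow-file arguments may differ; check git-shadow.
    return ["git", "shadow", "shadow-file", path]


def shadow_changed(root, seen, run=subprocess.call):
    """Shadow each file whose modification time changed since the last pass."""
    for path in project_files(root):
        mtime = os.path.getmtime(path)
        if seen.get(path) != mtime:
            seen[path] = mtime
            run(shadow_command(path), cwd=root)


def shadow_forever(root, interval=5.0):
    """Poll the project every `interval` seconds until interrupted."""
    seen = {}
    while True:
        shadow_changed(root, seen)
        time.sleep(interval)
```

Calling shadow_forever("path/to/android/project") in a terminal would keep it running alongside the IDE; the polling interval only needs to be shorter than the IDE's save delay.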

Answering “How long did it take you to code that?” (and more!)

It’s relatively simple to analyze git-shadow data; after all, it is just a bunch of git repos. I wrote a quick script that iterates over all the shadow repos for my project, grabs commit data via git-log, and passes any extra arguments I supply on the command line to git-log along the way. I used this technique to slice the data in a few interesting ways.
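
A rough sketch of how an analysis like this could work is below. It is not the actual shadow_analysis.py. In particular, the rule for turning commit timestamps into coding time is an assumption: here, the gap between two consecutive shadow commits counts as coding time only if it is at most IDLE_GAP (a hypothetical five-minute cutoff), and each gap is credited to the day of the later commit.

```python
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta, timezone

IDLE_GAP = timedelta(minutes=5)  # ASSUMPTION: longer gaps count as breaks


def log_timestamps(repo, extra_args=()):
    """Return sorted commit times (Unix seconds) from `git log` in one repo."""
    cmd = ["git", "-C", repo, "log", "--format=%at", *extra_args]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    # Keep only bare timestamp lines; options such as -L also print diffs.
    return sorted(int(line) for line in out.splitlines() if line.isdigit())


def coding_time_by_day(timestamps, idle_gap=IDLE_GAP, tz=None):
    """Sum gaps between consecutive edits, ignoring gaps over idle_gap."""
    per_day = defaultdict(timedelta)
    times = [datetime.fromtimestamp(t, tz) for t in sorted(timestamps)]
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap <= idle_gap:
            per_day[cur.date()] += gap
    return dict(per_day)


def report(per_day, commits):
    """Format per-day totals followed by the overall summary lines."""
    lines = [f"{day} {day:%A}\t: {spent}" for day, spent in sorted(per_day.items())]
    total = sum(per_day.values(), timedelta())
    lines.append("********")
    lines.append(f"Total time coding (days, H:M:S): {total}")
    lines.append(f"Total commits: {commits}")
    return "\n".join(lines)


def analyze(repos, extra_args=()):
    """Pool timestamps from every shadow repo and build the text report."""
    stamps = [t for repo in repos for t in log_timestamps(repo, extra_args)]
    return report(coding_time_by_day(stamps), len(stamps))
```

Because extra_args is appended to the git log command unchanged, a path limiter or a -L range narrows the analysis without any other change to the script.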

How long did it take to code the whole project?

Letting the script call git log without any arguments results in an analysis that covers the whole project. The script spits out text that breaks down the time spent coding on the project by day (that is, time spent writing code as measured by near-real-time shadowing), as well as some JSON that can be used with tinychart (note: no specific reason for picking tinychart, it just came up on HN while I was working on this :). Here is the text result:

2015-01-26 Monday   : 3:47:51
2015-01-27 Tuesday  : 2:36:41
2015-01-28 Wednesday    : 0:00:05
2015-01-29 Thursday     : 0:44:52
2015-01-30 Friday   : 1:39:23
2015-01-31 Saturday     : 0:41:41
2015-02-01 Sunday   : 1:35:24
2015-02-02 Monday   : 1:48:36
2015-02-03 Tuesday  : 1:17:32
2015-02-04 Wednesday    : 0:44:19
2015-02-05 Thursday     : 0:59:37
2015-02-06 Friday   : 1:49:25
2015-02-07 Saturday     : 0:00:05
2015-02-08 Sunday   : 0:00:01
2015-02-09 Monday   : 2:01:21
2015-02-10 Tuesday  : 3:03:38
2015-02-11 Wednesday    : 3:54:37
2015-02-12 Thursday     : 0:25:06
2015-02-13 Friday   : 0:25:05
2015-02-14 Saturday     : 0:38:34
2015-02-15 Sunday   : 0:03:44
2015-02-16 Monday   : 0:07:30
2015-02-20 Friday   : 0:23:44
********
Total time coding (days, H:M:S): 1 day, 4:48:51
Total commits: 2757

And the corresponding tinychart:

Project data

So what this means is I spent 1 day, 4 hours, 48 minutes and 51 seconds actually hacking code for this project. Damn, that seems like a lot for an experiment! Anyway, you can see from the by-day breakout that I burnt the midnight oil on Tuesday the 10th and Wednesday the 11th to get the initial PoC working. If this were more than a simple experiment I might be a bit worried about the code I wrote those days (after all, these hours were spent after a ~9 hr day of work and ~4 hours of family time : ).

An interesting side-effect of this analysis is that I can also measure the code churn for the project at the per-edit level. The text output above includes the total number of commits (2757) for the project – that means I made 2757 “edits” to the code.

Bonus: How long did it take to code any git-sliceable subset of the code?

I mentioned above that the analysis script passes extra arguments along to the git-log incantation – this allows me to do things like…

How long did it take to code the Android app?

$ ./shadow_analysis.py -- android
2015-02-04 Wednesday    : 0:15:03
2015-02-05 Thursday     : 0:34:37
2015-02-06 Friday   : 1:47:06
2015-02-08 Sunday   : 0:00:05
2015-02-09 Monday   : 1:19:09
2015-02-10 Tuesday  : 0:53:31
2015-02-11 Wednesday    : 0:03:48
2015-02-13 Friday   : 0:20:57
2015-02-14 Saturday     : 0:00:10
2015-02-15 Sunday   : 0:03:44
2015-02-16 Monday   : 0:01:37
********
Total time coding (days, H:M:S): 5:19:47
Total commits: 376

Android data

So that means I spent about 5 hours and 20 minutes coding up the Android app, with a total churn of 376 edits.

How long did it take you to code a specific function?!!

Yes, git-shadow can answer this question too. We can get creative with git line-level logging to run the same analysis on a per-function level. Neat! Below I run the command on a function called dump in a Python script named unplug:

$ ./shadow_analysis.py -L :'def dump':unplug
2015-01-30 Friday   : 0:00:20
2015-01-31 Saturday     : 0:00:10
2015-02-01 Sunday   : 0:05:36
2015-02-02 Monday   : 0:00:15
2015-02-03 Tuesday  : 0:02:30
2015-02-04 Wednesday    : 0:03:30
2015-02-05 Thursday     : 0:00:35
2015-02-06 Friday   : 0:00:59
2015-02-08 Sunday   : 0:00:05
2015-02-09 Monday   : 0:03:07
2015-02-10 Tuesday  : 0:00:45
2015-02-11 Wednesday    : 0:01:17
2015-02-13 Friday   : 0:00:05
2015-02-14 Saturday     : 0:00:10
********
Total time coding (days, H:M:S): 0:27:29
Total commits: 73

Line-level data

This is where the churn data might be interesting. It remains to be seen, but churn (or churn divided by time) for a given code artifact may be an indicator of code quality. For example, when I don’t know how a given framework or language works very well (or I am tired, etc.), I tend to write and re-write things between commits, as was the case with much of unplug : ).
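
To make "churn divided by time" concrete, the snippet below computes edits per hour from the three totals reported in this post. The churn_per_hour helper is hypothetical, and whether this ratio actually tracks code quality is the untested part.

```python
from datetime import timedelta


def churn_per_hour(edits, time_coding):
    """Return shadow commits ("edits") per hour of measured coding time."""
    return edits / (time_coding.total_seconds() / 3600)


# Totals copied from the three analysis runs in this post.
samples = {
    "whole project": (2757, timedelta(days=1, hours=4, minutes=48, seconds=51)),
    "android": (376, timedelta(hours=5, minutes=19, seconds=47)),
    "def dump": (73, timedelta(minutes=27, seconds=29)),
}

for name, (edits, spent) in samples.items():
    print(f"{name}: {churn_per_hour(edits, spent):.1f} edits/hour")
```

By this measure the dump function churned at roughly 159 edits per hour, against about 96 for the whole project and about 71 for the Android app.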

git-shadow, or a tool like it, could provide a lot of insight into how we create the software we rely on every day. I hope you enjoyed reading about it. Thanks!