Code review that isn’t boring

At Jane Street, we care a lot about code review. We think that high quality code, and in particular, readable code, helps us maintain the safety of our systems and keeps things simple and clean enough for us to stay nimble.

But code review is hard to do without some kind of automation. We have an existing tool called cr that we’ve used for a long time. I posted a description of the initial design (here and here) back in 2009, and posted an update last year on how we’d revised our approach to review since then.

But cr is showing its age. It has significant problems, both in terms of the workflow it implements, and in terms of performance. cr itself is, ironically, a fairly messy piece of code, so rather than fix it, we’ve been working for the last few months on a brand new replacement called Iron, or fe for short. We’ve just started rolling Iron out, and it’s a big improvement, and represents a significant step up in our understanding of how to organize review. It isn’t released publicly yet, but we will open source it after it stabilizes a bit more.

This is the first in what I expect to be a series of posts about Iron. In this one, I’d going to try to explain why there’s a problem to be solved in the first place, and I’ll do that by walking through some examples that should underline why this is a tricky problem.

Scenario 1: Simple patch review

One simple approach to code review is patch review. Here’s how it might work in a simple case.

Bob develops a new feature
He creates a patch, and shows the patch to Alice.
Alice reviews the patch, and if it looks good, she approves it.
The patch is accepted, and applied to the tree, making a new revision.

That’s simple enough, but what happens if Bob makes some changes in response to Alices’ review? How does Alice become confident in those changes? One common answer is that she should simply read the new patch from scratch.

But reading the new patch in its entirety violates one of our design principles for Iron: don’t be boring. Humans are bad at carefully thinking about boring things, and rereading a patch that hasn’t changed much since you saw it last is incredibly boring.

There’s a simple workaround here. Rather than reread the whole patch, Alice can just reread the bits that changed. Here’s a simple picture to help illustrate:

F2
|
F1
|
B

Here, B is the state of the code when Bob created his feature, or the base revision of his feature. F1 is the state of the feature when he handed the patch, B->F1, to Alice. And F2 is the state of the feature after he responded to Alice’s request.

With this in mind, it’s easy to see that Alice has the option of reading F1->F2, which just contains Bob’s fixes rather than the B->F2, which has both Bob’s original changes plus his fixes.

Scenario 2: Concurrent patch review

Patch review gets a bit more complicated when more people get involved.

Consider what happens if Carla has her own feature, and let’s say she managed to get it released before Bob did. That means that Bob will need to release his patch applied to a different version of the code than the one he based his feature on. Here’s a picture to illustrate the situation.

Fc     Fb
 \    /
  \  /
   B1

Here Fc is Carla’s feature, and Fb is Bob’s. Given that Fc has been released, Bob needs to make his feature a descendent of Fc in order to release it too.

If the diff B1->Fb applies cleanly to Fc, then that’s easy enough: we can just apply Bob’s patch to Carla’s released feature and call it a day. This is effectively a way for Bob to merge his patch with Carla’s.

  Fb'
  / \
 /   \
Fc    Fb
 \   /
  \ /
   B

This raises some natural concerns. For one thing, Alice didn’t read Bob’s patch in a vacuum: she presumably read it in the context of the state of the codebase as it was when the patch B1->Fb was created. Now that the patch is being applied to Fc, how does Alice know that it still makes sense? The fact that there are no textual conflicts is no guarantee.

The answer is that Alice doesn’t know for sure. The only principled way around this would be for Alice to take a lock on the repository as she reads Bob’s patch, and not let anything (including Carla’s patch) jump ahead.

But this way madness lies. Development at scale require parallelism, and so you need to allow concurrent review. In the end, all we can do is to try to cover for this in other ways, in particular by using tests and types to make accidental breaking of the code from such a sitution less likely.

Scenario 3: Conflicted patch review

What if Bob’s patch doesn’t apply cleanly to Carla’s release? In this case, Bob needs to create a new revision that merges together Carla’s changes with his own, say, by applying the patch B->Fb to Fc, and resolving the conflicts by hand. Similarly, Bob could use the merge algorithm from his version control system to do the merge, but again conflicts must be resolved by hand.

  Fb'
  / \
 /   \
Fc    Fb
 \   /
  \ /
   B

Given that Alice has already read B->Fb, how does she become comfortable with Fc->Fb'?

Should could read the diff Fc->Fb', but that would violates our boredom principle, since Fc->Fb' will generally look a lot like B->Fb. Reading the diff Fb->Fb' is no better, since this will largely be a recapitulation of the changes made by Carla in B->Fc.

One way out of this boredom trap is to read the diff-of-diffs, or ddiffs. In other words, Alice can read (B->Fb)->(Fc->Fb'). By reading this, she can get a view on how the feature changed.

Now, diffs-of-diffs can be hard to read, and you only want to read them when you absolutely have to. That’s why we wrote a tool called patdiff4 for use with Iron. patdiff4 does a hunk-by-hunk analysis of the code in the four revisions of the merge diamond, and tries to present each hunk in the simplest way possible, sometimes using ddiffs, but using other simpler visualizations where possible.

Rebasing review

Dealing with review of conflicted merges is at the heart of Iron. In Iron, we refer to this as a rebase, but not because it’s a rebase of the kind you see in git or mercurial. After all, from the point of view of the version control system, you might be doing a merge rather than a rebase. But from Iron’s perspective, the goal is to rebase the review. Thus, after a rebase, Iron thinks of Alice as having approved of Fc->Fb' even though she read B->Fb and then later read the ddiff (B->Fb)->(Fc->Fb').

Rebasing isn’t the end of the Iron story. In future posts, I’ll discuss the two other main ideas that animate Iron: Iron’s management of scrutiny, which is how it keeps track of the level of scrutiny that a user is supposed to apply to a given review; and hierarchical features, a tool for handling complex workflows, including dealing with features that depend on each other.