I’ve been thinking about code review a lot recently.

Code review is a key part of our dev process, and has been from the beginning. From our perspective, code is more than a way of getting things done; it’s a way of expressing your intent. Code that’s easy to read and understand is likely to be more robust, more flexible, and, critically, safer. And we care about safety a lot, for obvious reasons.

But the importance of code review doesn’t mean that we’ve always done a good job of organizing it. I’ll talk a bit more about how we used to do code review, how we do it now, and the impact that those changes have had.

The bad old world

Our old code review process was what you might call batch-oriented. We’d prepare a set of changes for a release and, after someone gave each one a quick look-over, combine them together in a branch. We’d then read over these changes very carefully, with multiple people reading each file, making comments, requesting changes, and fixing things up, until the code was in a releasable state.

This was a big and expensive process, involving many people, and quite a lot of work and coordination. Given the time it took, we focused our code review on our so-called critical-path systems, i.e., the ones that are involved in sending orders to the market.

The management task was complex enough that we wrote a tool called cr for managing and tracking the reading of these diffs, parceling out responsibility for different files to different people. We’ve actually blogged about this before, here and here.

Batch-oriented review worked well when we and our codebase were smaller, but it did not scale. By combining multiple changes into a single branch, you were stuck reading a collection of unrelated changes, and the breadth of the changes made fitting it all into your head harder. Even worse, when you throw a bunch of changes together, some are going to take longer than others, so the release is blocked until the slowest change gets beaten into shape.

The end result is that, while we found code review to be indispensable in creating high quality code that we felt comfortable connecting to the markets, the overhead of that review kept on getting higher and higher as we grew.

We needed a better solution.

Feature-based review

Another approach to code review, and a more common one, is patch-based review. In patch review, someone proposes a change to the current release in the form of a patch, and it is the patch itself that is reviewed. Once it passes review, it can be applied to the tree. Patch-based review is great in that it gives you independence: one patch taking a while doesn’t block other patches from getting out.

We avoided patch-based review initially because we were worried about the complexities of dealing with conflicting patches. Indeed, one issue with patch-based review is that the state of the tree when the patch is reviewed is likely not the state of the tree when the patch is applied. Even when this doesn’t lead to textual conflicts, it should leave you a little nervous, since a patch that is correct against one version of the tree is not necessarily correct against a changed tree.
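
To make that concrete, here’s a contrived illustration of two patches that apply together without any textual conflict and still produce the wrong result; the names and numbers are made up for the example:

    (* The tree originally defined a price in dollars:
         let price = 10.
       Patch B, reviewed on its own, switches the representation to cents: *)
    let price = 1000. (* now in cents *)

    (* Patch A was reviewed while [price] was still in dollars, and adds a
       caller that assumes dollars: *)
    let order_value ~quantity = quantity *. price

    (* The merged tree compiles, and neither patch conflicts textually with
       the other, but [order_value] is now off by a factor of 100. *)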

And then, what do you do when there’s a conflict and the patch no longer applies cleanly? You can rebase the patch against the new state of the tree, and then re-review the patch from scratch. But humans are really bad at investing mental energy in boring work, and carefully re-reviewing a patch you’ve already mostly read is deadly dull.

Moreover, when do you decide that there’s a conflict? When dealing with patches that involve file moves and renames, even deciding what it means for a patch written previously to still apply cleanly is a tricky question.

Also, squaring patch-based review with a git or hg-based workflow can be awkward. There’s something quite nice about the GitHub-style pull-request workflow, but the semantics of merging are pretty tricky, and you need to be careful that what you read corresponds to the actual changes that are made by the merge.

For all these problems, the virtues of patch-based review are clear, and so about six months ago we started a project to revamp our cr tool to make it suitable for doing patch-like review. The new version of cr is now organized around what we call features, which are essentially hg bookmarks (similar to git branches) augmented with some associated metadata. This metadata includes:

  • An English description of the change
  • A base-revision that the changes should be read against
  • An owner
  • A collection of full-feature reviewers (usually just one, besides the owner)
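
To give a feel for the shape of that metadata, here’s a rough sketch of how a feature might be represented as a record; the field names and types are illustrative guesses, not cr’s actual interface:

    (* A hypothetical representation of a feature; cr's real types surely differ. *)
    type feature =
      { name : string                         (* the hg bookmark *)
      ; description : string                  (* English description of the change *)
      ; base_revision : string                (* revision the diffs are read against *)
      ; owner : string
      ; full_feature_reviewers : string list  (* usually just one besides the owner *)
      }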

The workflow for a developer goes something like this:

  • Create a new feature by running cr feature create. You’ll select a name for the feature and write the initial description. The base-revision will automatically be chosen as the most recent release.
  • Write the code, using hg in the ordinary way, making commits as you go and pushing the result to a shared, multi-headed repo that has all of the features people are working on.
  • When you think the code is in a good state, get the feature enabled for review. At this point, you’ll need to get a full-feature reviewer selected. It’s this person’s job to read every change that goes into this feature.
  • The full-feature reviewer then reads the diff from the base-revision to the tip, adding comments, requesting fixes, and reading diffs forward until they’re happy, at which point the feature is seconded.
  • Once it’s seconded, the feature is enabled for review by anyone who is signed up as a reviewer for the specific files you touched. How many file reviewers there are depends on the nature of the project. In our most safety-critical systems, every file has three reviewers. In some other systems, there are no file reviewers at all.
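
One way to picture this workflow is as a small state machine that each feature moves through; the sketch below is just an illustration of the stages described above, not how cr actually models them:

    (* Illustrative review states; an assumption about structure, not cr's code. *)
    type review_state =
      | In_development                                (* owner is still writing code *)
      | Full_feature_review of { reviewer : string }  (* one person reads every change *)
      | Seconded                                      (* the full-feature reviewer is happy *)
      | File_review of { reviewers_per_file : int }   (* from zero up to three, depending on the system *)
      | Ready_for_release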

The remaining work falls to the release manager, who can create a new release based on a set of features that:

  • are fully reviewed and have no outstanding reviewer complaints
  • compile cleanly on their own and pass their automated tests
  • have the previous release as their base-revision
  • can be merged together cleanly
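
As a sketch, the per-feature part of that check might look something like the function below; the record fields are stand-ins for information cr tracks, not its real data model:

    (* A hypothetical summary of a feature's release status. *)
    type status =
      { fully_reviewed : bool
      ; outstanding_complaints : int
      ; compiles_and_tests_pass : bool
      ; base_revision : string
      }

    let releasable ~previous_release s =
      s.fully_reviewed
      && s.outstanding_complaints = 0
      && s.compiles_and_tests_pass
      && String.equal s.base_revision previous_release

    (* The last condition, that the chosen features merge together cleanly, is
       a property of the whole set and is checked by cr's own merge logic. *)
    let all_releasable ~previous_release statuses =
      List.for_all (releasable ~previous_release) statuses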

Checking that things “can be merged together cleanly” is actually tricky, since you can’t just trust hg’s notion of a merge. cr has its own merge logic that is more conservative than what hg and git do. The biggest worry with hg is that it tries to guess at a reasonable base-point for the 3-way merge (usually the greatest common ancestor of the two heads to be merged). Most of the time this works well, but it’s easy to construct crazy cases where changes you make on one branch are just silently dropped in the merge. There is also some rather surprising behavior that can come into play when files are moved, copied, or deleted as part of the mix.

cr, on the other hand, will always choose the base-point of the features to be merged as the base-point for the 3-way merge. This way, the diffs that are reviewed are also the diffs that are used for constructing the merged node. Also, cr has some extra sanity conditions on what merges it’s willing to try. This all greatly reduces the scope for surprising results popping out the other side of a merge.
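
Here’s a rough sketch of the difference in how the base-point gets chosen; both functions are illustrative stand-ins rather than hg’s or cr’s actual code:

    (* hg guesses a base, typically the greatest common ancestor of the two heads. *)
    let hg_merge_base ~greatest_common_ancestor head_a head_b =
      greatest_common_ancestor head_a head_b

    (* cr refuses to guess: it only merges features that share a base-revision,
       and it uses that shared revision as the base of the 3-way merge, so the
       diffs people reviewed are the diffs that build the merged node. *)
    let cr_merge_base ~base_revision_a ~base_revision_b =
      if String.equal base_revision_a base_revision_b
      then Some base_revision_a
      else None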

If the base-revision of a given feature is against an older release, then you need to rebase the review before it can be released, i.e., update the base-revision to the most recent release. Among other things, this requires you to merge your changes with tip. If there are conflicts, you then either need to review the resolution of the conflicts, or you simply need to reread the diffs from scratch. The last bit is pretty rare, but it’s an important escape hatch.
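
In code-shaped terms, the decision might be sketched like this; the variant names, the merge function, and the assumption that a clean merge needs no further reading are all illustrative, not cr’s API:

    (* What a reviewer has left to do after a review is rebased. *)
    type after_rebase =
      | Nothing_new_to_read        (* the merge with tip was clean *)
      | Review_conflict_resolution (* read just how the conflicts were resolved *)
      | Reread_from_scratch        (* rare, but an important escape hatch *)

    let rebase_review ~merge_with_tip feature =
      match merge_with_tip feature with
      | `Clean -> Nothing_new_to_read
      | `Conflicts_resolved -> Review_conflict_resolution
      | `Conflicts_too_messy -> Reread_from_scratch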

How’d it go?

The new code-review process has had a dramatic effect on our world. The review process for our main repository used to take anywhere from a couple of weeks to three months to complete a cycle. Today, those releases go out every week, like clockwork. Everyone knows that if they can get their feature cleaned up and ready, they can always get it out that week. Indeed, if you’re following our open-source releases on GitHub, you’ll see that new packages have shown up once a week for the last 16 weeks.

Feature-based review has led to a significant increase in the rate of change of our critical-path systems. Code review is now considerably less painful, and most importantly, it’s easier than ever to say no to a feature that isn’t ready to go. In old-style batch review, there was a lot of pressure not to hold up a release by polishing some small bit, which sometimes led you to release code that wasn’t really ready. Now, that problem has largely vanished.

The barrier to entry for people who want to contribute to critical-path systems has also been lowered, which has helped us get projects out the door faster.

But the most striking result I’ve seen is from our post-trade group, which operates outside of the review process used for the critical-path systems. The post-trade team is responsible for our infrastructure that handles everything after a transaction is done, like tracking the clearing and settlement of our trades or managing our trading books and records.

Post-trade has historically had a more relaxed approach to code review: they do it, but not on all parts of the system, and not in a particularly strict way. In the last few months, however, they switched over to using the new feature-based workflow, and even though they’re doing a lot more code review (which takes serious time), their overall process has become faster and more efficient. We think that’s largely due to having a well-managed workflow for managing and merging independent features, quite apart from the benefits of the review itself.

I’m now pushing to get feature-based review adopted throughout the firm. Obviously, not all code needs to be scrutinized to the same level – having three reviewers for every file is sometimes sensible, sometimes overkill – but ensuring that no change can get in unless one other person reads it and thinks it’s reasonable is a good baseline rule. Review has a lot of benefits: it improves the quality of the code, gives you a great opportunity for training, and helps spread knowledge. Those benefits make sense everywhere we have people programming.

Maybe the biggest lesson in this for me is the importance of thinking through your processes, focusing on the biggest bottlenecks, and doing what you can to fix them.