Scaling code review

Consider two tables. I'm not talking about database tables here; I'm talking about items of furniture. The first is a flat-pack effort from a popular high-street retailer. The second is an artisan mahogany table with legs turned in a beautiful fluted Sheraton style. Which would you choose? Both serve the same purpose as a raised surface on which to place things, but the latter is clearly better: it has superior build quality and materials, and you'd wager it will be passed down through generations to come. The first will probably become firewood, or end up in a student house.

You could argue, however, that this table perhaps had a little too much work put into it.

The same can be said about code. It can be cobbled together by novices and it will probably work, but you don't want to be there when it breaks and needs maintaining. You certainly don't want to be there when it needs extending to accommodate a new feature and the only solution is to create Frankenstein's monster by bolting it awkwardly onto the side. Ideally, we want high-quality code as the scaffolding behind all of our infrastructure, since thousands of users depend on it daily to do their jobs. We want to be craftsmen at the top of our trade. This post looks at how we've been trying to scale code review across our Java code bases.

We've been constantly trying to improve our code review process. This has been especially challenging in the past few years as the Engineering department has grown from around 20 to 100 staff. We want code bases that can grow and evolve along with the company. We want to avoid having to rewrite things from scratch. We wholeheartedly embrace the pull request model on GitHub, and we'll get into specifics after the fold. But first, let's have a little history lesson.

When I joined the company in 2011 we were using git for source control, so we were off to a good start. We still had a monolithic code base on the backend, and we roughly followed git flow. We used to do code review in person by sitting down with the developer and having them walk through the code. This was great for social interaction, but not so good for spotting bugs: gnarly bits of work didn't get the attention they deserved, and it was hard to see what the code looked like previously in order to get a bigger picture of the changes. We tried incorporating command-line diffs into this process, but again, it was hard to stay focussed and not get frustrated and tired. Bugs kept turning up in our integration environment and we needed to do something about it.

"Oh, that's fine - I've tested it."

In late 2012, we began migrating our code to GitHub; some repositories have private visibility on github.com, and others are hosted internally using GitHub Enterprise. From here, we embraced the pull request (PR) model for code changes. The UI was a joy: you could see the entire diff clearly, you could describe what was going on in the PR and you could have a discussion on specific lines. Yet, there were growing pains. What do you do when developers fundamentally disagree on the approach being taken? Who has the authority to decide when a PR is ready for merging?

Who decides on the relevance of PR concerns?

We countered this sometime in early 2013 by forming a group of the most experienced developers and anointing them as "mergers": those with the authority to decide when to click the big green "Merge" button on a PR. Mostly organically, this group also became a forum for regular conversations about best practice and about common patterns that needed wider encouragement or refactoring out. We prompted developers to assign PRs to the merger they thought most suitable to review them. It worked fairly well at first.

The department grew, and again we faced problems with this arrangement. Some mergers ended up with tens of PRs assigned to them, whereas others received very few. The reasons were varied: some mergers gave more thorough reviews than others, some were a softer touch, some were simply more experienced. It didn't scale. By early 2014, some PRs were taking an unacceptably long time to get feedback. One event stands out in memory: a senior developer went on holiday and managed to accumulate a queue of 20 PRs. There's no such thing as an out-of-office notification on GitHub, and when your developers are spread across more than one office, you can't easily peer over the monitors to see whether someone is there.
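With hindsight, part of the problem is easy to state: manual assignment took no account of each merger's existing queue. A purely hypothetical sketch of load-aware assignment (nothing like this existed in our tooling at the time; the names are made up) would route each new PR to the merger with the fewest open reviews:

```java
import java.util.Map;

// Hypothetical sketch: route a new PR to the merger with the fewest open
// reviews, instead of whoever the author happens to think of first.
public class LoadAwareAssignment {
    public static String leastLoaded(Map<String, Integer> openReviewCounts) {
        return openReviewCounts.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no mergers available"));
    }
}
```

Of course, queue length is a crude proxy that says nothing about how thorough or how quick a given reviewer is, which is part of why we moved away from assignment altogether.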

The next iteration of the model was to forbid assignment altogether. Instead, we encouraged all developers to spend 15-30 minutes a day reviewing other people's code. Once a PR was mergeable, reviewers would drop a thumbs-up emoji at the end of the comment thread, and a merger could then sweep through and click the button. This greatly improved PR turnaround and encouraged other developers to take part in reviews regularly: they felt like part of the process of moving more than just their own code from development to production. We also noticed a viral effect on best practice. Developers who saw a comment recommending a better library or code structure in one PR would go off and try the same in their own. It had a measurable effect on code quality across the board.
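The sweep rule itself is simple enough to write down. Here's a minimal sketch (the class and method names are hypothetical, not our actual tooling): a PR becomes eligible for a merger's sweep once at least one person other than the author has left a thumbs-up.

```java
import java.util.List;

// Hypothetical sketch of the sweep rule: a thumbs-up from anyone other
// than the PR's author marks the PR as eligible for a merger to merge.
public class SweepRule {
    public static boolean eligibleForSweep(String author, List<String> thumbsUpFrom) {
        return thumbsUpFrom.stream().anyMatch(reviewer -> !reviewer.equals(author));
    }
}
```

The point of excluding the author is that a self-approval shouldn't count; everything else is left to the merger's judgement when they sweep through.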

Yet this model was still not perfect. There was still a lingering feeling of preparing your PR and then submitting it into the great unknown, much like a bottle cast out to sea. We are now encouraging each team to take ownership of the PRs it raises, and especially advocating that early "work-in-progress" PRs be opened at the beginning of stories, with inter-team code review happening throughout. Teams with mergers can usually merge code with little help from other teams, and those without mergers are at the very least presenting far more considered and mature code to outside developers by the time it's ready for review.

I don't know what the solution will be when the department is ten times the size, but we'll always adapt and improve. Setting the bar high for code quality isn't just important for our line managers or the company we work for; it's important for ourselves too, as we take small steps towards mastery each day.