Black Box Thinking


by Matthew Syed

“At the end of the day, success is about getting in the breakaway [where a group of cyclists ride away from the main pack],” he said. “Let’s not f*** about. Either we are in it or we are not. I know it is difficult. I know how hard it is. But everyone needs to buy into this. All focus on that. That is our goal for today. The rest will look after itself. Don’t let anyone else make it happen; make it happen for yourselves . . . OK, hit it!”

A quiet buzz reverberated around the bus. Brailsford had struck the right note. All eight riders stood up and exchanged glances. They then made their way down the steps to the starting line of the sixteenth stage.

• • •

The previous evening Brailsford had given me a tour of the Team Sky operation. We looked at the trucks, the design of the team bus, and the detailed algorithms that are used to track the performance of each cyclist. It was an opportunity to glimpse behind the curtains of one of the most admired and tightly policed operations in all sport.

The success of Brailsford is legendary. When he joined British track cycling as an adviser in 1997, the team was behind the curve. In 2000 Great Britain won a single Olympic gold medal in the time trial. In 2004, one year after Brailsford was appointed performance director, Britain won two Olympic gold medals. In 2008 they won an astonishing eight gold medals and, at the London Olympics in 2012, repeated the feat.

Meanwhile, something even more remarkable was happening. Track cycling is competitive, but the most prestigious form of the sport is professional road cycling. Britain had never had a winner of the Tour de France since the race was established in 1903. British riders had won individual stages, but nobody had come close to winning the general classification.

But in 2009, even as the British track cycling team was preparing for the London Olympics, Brailsford embarked upon a new challenge. He created a road cycling team, Team Sky, while continuing to oversee the track team. On the day the new outfit was announced to the world, Brailsford also announced that they would win the Tour de France within five years.

Most people laughed at this aspiration. One commentator said: “Brailsford has set himself up for an almighty fall.” But in 2012, two years ahead of schedule, Bradley Wiggins became the first-ever British rider to win the event. The following year, Team Sky triumphed again when Chris Froome, another Brit, won the general classification. It was widely acclaimed as one of the most extraordinary feats in British sporting history.

How did it happen? How did Brailsford conquer not one cycling discipline, but two? These were the questions I asked him over dinner at the team’s small hotel after the tour of the facilities.

His answer was clear: “It is about marginal gains,” he said. “The approach comes from the idea that if you break down a big goal into small parts, and then improve on each of them, you will deliver a huge increase when you put them all together.”

It sounds simple, but as a philosophy, marginal gains has become one of the hottest concepts not just in sports, but beyond. It has formed the basis of business conferences and seminars, and has even been debated in the armed forces. Many British sports now employ a director of marginal gains.

But what does this philosophy actually mean in practice? How do you deliver a marginal gains approach, not just in sport, but in other organizations? Most significantly of all, why does breaking a big project into smaller parts help you to tackle really ambitious goals?

To glimpse an answer, let us leave cycling for a moment and look at a very different area of life. For it turns out that the best way to grasp the meaning of marginal gains is to examine one of the most pressing issues facing the world today: global poverty.

II

Take a look at the graph here.1 It is reproduced from the work of Esther Duflo, one of the world’s most respected economists, currently working out of MIT.

The vertical, light-gray bars show the amount of aid spending on Africa over the last thirty years. As you can see, the funding has gradually increased since the early 1960s, peaking at almost $800 million in 2006. The investment has a simple imperative: to improve the lives of the world’s poorest. It is an important objective given that 25,000 children die of preventable causes every day.2

The key question here is, Did the investment make a difference? Did it improve the lives of the people it was designed to help?

A sensible place to start when answering that question is with African GDP. In the diagram African GDP is shown by the solid black line. As you can see, this has stayed roughly constant over the period. This might lead one to the conclusion that all the aid spending hasn’t done much good. It hasn’t boosted economic activity. It hasn’t raised the living standards of those living in Africa. In fact it all seems like an expensive waste of time.

But the insights from the previous chapter should urge a little caution. Why? Because the data don’t give us an insight into the counterfactual. Perhaps the aid spending was incredibly successful. Perhaps, without it, GDP in Africa would have been far lower—the white line in the graph.

Of course, there is another possibility. Perhaps aid spending was even more detrimental than the solid black line might lead you to believe. Perhaps it was a disaster, destroying incentives, boosting corruption, and lowering growth below what it would otherwise have been. Perhaps without it Africa would have actually surged ahead: as per the dotted line in the graph. How can we know either way?

Each of these two alternatives has high-profile supporters. Jeffrey Sachs, director of the Earth Institute at Columbia University, for example, is a vocal advocate of development spending. He argues that aid has benefited the lives of Africans and claims that more money could eradicate poverty altogether. The End of Poverty, his best-selling book, is based in part upon this premise.3

Conversely, William Easterly, an economist at New York University, profoundly disagrees. He argues that aid spending has had all sorts of negative side effects, and that Africa would have been better off without it. His book The White Man’s Burden presents this case with as much intellectual force as that of Sachs.4

The best way to adjudicate between these stances would be to conduct a randomized control trial. This would enable us to isolate the effect of development spending from all the other influences on African GDP. But there is a rather obvious problem. There is only one Africa. You cannot find lots of different Africas, randomly divide them into groups, give aid to some and not to others, and then measure the outcomes.

This may sound like a trivial point, but it has wider implications. When it comes to really big issues, it is very difficult to conduct controlled experiments. To run an RCT you need a control group, which is not easy when the unit of analysis is very large. This applies to many things beyond development aid, such as climate change (there is only one world), issues of war and peace, and the like.

This brings us directly to the concept of marginal gains. If the answer to a big question is difficult to establish, why not break it down into lots of smaller questions? After all, aid spending has many subcomponents. There are programs on malaria, literacy, road-building, education, and infrastructure, each of them constructed in different ways, with different kinds of incentives, and delivered by different organizations.

At this level of magnification, by looking at one program at a time, it is perfectly possible to run controlled experiments. You try out the program with some people or communities, but not with others, and then compare the two groups to see if it is working or not. Instead of debating whether aid is working as a whole (a debate that is very difficult to settle on the basis of observational data), you can find definitive answers at the smaller level and build back up from there.

To examine a concrete example, suppose you were trying to improve educational outcomes in Africa. One way to see if aid spending is working would be to look at the correlation between the quantity of spending and the average grade score across the continent. The problem is that this wouldn’t give you any information about the counterfactual (what would have happened to scores without the funding).

But now suppose that instead of looking at the big picture, you examine an individual program. That is precisely what a group of pioneering economists did in the impoverished Busia and Teso regions in the west of Kenya. As the author Tim Harford points out in his book Adapt, these economists wanted to know whether handing out free textbooks to schools would boost grades. Intuitively, they were pretty sure it would. In the past the observational data had been good. Schools that received books tended to improve their test scores.

But the economists wanted to be sure, so they performed an RCT. Instead of giving the textbooks to the most deserving schools, which is the common approach, they randomly divided a number of eligible schools into two groups: one group received free textbooks and the other group did not. Now, the charity had a treatment group and a control group. They had a chance to examine whether the books were making a real difference.
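The logic of such a trial can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the economists' actual procedure: the school names, the baseline score of 50, the noise level, and the zero true effect are all invented, and the helper names `randomize` and `estimated_effect` are hypothetical.

```python
import random
from statistics import mean


def randomize(units, seed=0):
    """Shuffle the units and split them into two halves: treatment, control."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(scores, treatment, control):
    """Mean outcome in the treatment group minus mean outcome in the control group."""
    return mean(scores[u] for u in treatment) - mean(scores[u] for u in control)


# Hypothetical list of eligible schools (names are invented for illustration).
schools = [f"school_{i}" for i in range(100)]
treatment, control = randomize(schools, seed=42)

# Simulated test scores: an assumed baseline of 50 plus random noise, and an
# assumed true effect of zero for the intervention. These are not real data.
noise = random.Random(1)
treated = set(treatment)
assumed_true_effect = 0.0
scores = {
    s: 50.0 + noise.gauss(0, 5) + (assumed_true_effect if s in treated else 0.0)
    for s in schools
}

print(f"Estimated effect: {estimated_effect(scores, treatment, control):.2f} points")
```

Because chance alone decides which school lands in which group, any systematic gap between the two averages can be attributed to the intervention rather than to how the schools were chosen. A real analysis would also ask whether the gap is larger than random noise would produce.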

The results, when they came in, were both emphatic and surprising. The students in the schools that received free textbooks didn’t perform any better than those who did not. The test results in the two groups of schools were almost identical. This outcome contradicted intuition and the observational data. But then randomized trials often do.

The problem, it turned out, was not the books, but the language they were written in. English is the third language of most of the poor children living in remote Busia and Teso. They were struggling to grasp the material as it was presented. Researchers might not have realized this had they not run a trial. It pierced through to one of the untested assumptions in their approach.

Confronted with failure, the economists tried another approach. They conducted another randomized trial but instead of using textbooks they used visual aids. These were flipcharts with bold graphics that covered geography, math, etc. Again, the economists expected them to boost test scores. And again, when they compared the test scores in the treatment group with those of the control group, the flipcharts were a failure. They led to no significant improvement in learning.

Undeterred, the economists started to think about the problem in a fresh way. They tried something completely new: a de-worming medication. This may seem like a curious way to improve education, but researchers were aware that these parasites stunt growth, cause children to feel lethargic, and lead to absenteeism. They disproportionately affect children in remote communities, just like those in Busia and Teso.

This time the results were excellent. They vastly exceeded the expectations of the researchers. As Tim Harford put it: “The program was a huge success, boosting children’s height, reducing re-infection rates, and also reducing absenteeism from school by a quarter. And it was cheap.”5

This was a marginal gain. It was just one program in one small region. But by looking at education at this level of magnification, it was possible to see what really works, and what doesn’t. The economists had tested, failed, and learned. They could now roll it out in other areas, while continuing to test, and iterate, and create yet more marginal gains.

This may sound like a gradual way to improve, but look at the alternative. Consider what would have happened if the economists had relied on intuition and observational data. They might have continued with free textbooks forever, deluding themselves that they were making a difference, when they were doing virtually nothing at all.

This approach is now the focus of a crusading group of economists who have transformed international development over the last decade. They do not come up with grand designs; rather, they look for small advantages. As Esther Duflo, the French-born economist who is at the forefront of this approach, put it: “If we don’t know if we are doing any good, we are not any better than the medieval doctors and their leeches. Sometimes the patient gets better; sometimes the patient dies. Is it the leeches or something else? We don’t know.”6

Critics of randomized trials often worry about the morality of “experimenting on people.” Why should one group get X while another is getting Y? Shouldn’t everyone have access to the best possible treatment? Put like this, RCTs may seem unethical. But now think about it in a different way. If you are genuinely unsure which policy is the most effective, it is only by running a trial that you can find out. The alternative is not morally neutral; it simply means that you never learn. In the long run this helps nobody.

Duflo, who is petite and dynamic, doesn’t regard her work as lacking in ambition; rather, she regards these incremental improvements as pioneering. She told me:

It is very easy to sit back and come up with grand theories about how to change the world. But often our intuitions are wrong. The world is too complex to figure everything out from your armchair. The only way to be sure is to go out and test your ideas and programs, and to realize that you will often be wrong. But that is not a bad thing. It leads to progress.

This links back to the work of Toby Ord, whom we met in chapter 7. He uses the data discovered by the likes of Duflo to advise private individuals on where to donate their money. He realized that relying on hunch and narrative can mean that millions of pounds are squandered on ineffective programs. And this is why hundreds of controlled experiments are now being conducted across the developing world. Each test demonstrates whether a policy or program works, or if it doesn’t.

Each test provides a small gain of one kind or another (remember that failure is not inherently bad: it sets the stage for new ideas). By breaking a big problem into smaller parts, it is easier to cut through narrative fallacies. You fail more, but you learn more.

As Duflo puts it: “It is possible to make significant progress against the biggest problem in the world through the accumulation of a set of small steps, each well thought out, carefully tested, and judiciously implemented.”7

III

And this takes us back to David Brailsford and British cycling. Note the similarity of the final quote of Duflo with that of Brailsford earlier in this chapter. “The whole approach comes from the idea that if you break down a big goal into small parts, and then improve on each of them, you will gain a huge increase when you put them all together.”

Cycling is very different from international development, but the success of its most pioneering coach is based on the same conceptual insight. As Brailsford puts it: “I realized early on that having a grand strategy was futile on its own. You also have to look at a smaller level, figure out what is working and what isn’t. Each step may be small, but the aggregation can be huge.”
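A rough piece of arithmetic shows how small steps can add up. The sketch below assumes, purely for illustration, that each gain is a fractional improvement and that the gains multiply together independently. Neither assumption comes from Brailsford, the figures are invented, and `aggregate_gain` is a hypothetical helper.

```python
def aggregate_gain(gains):
    """Combine fractional gains multiplicatively and return the overall fractional gain."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


# Twenty hypothetical improvements of 1% each (invented figures).
small_gains = [0.01] * 20
print(f"Overall gain: {aggregate_gain(small_gains):.1%}")
```

Under these assumptions, twenty gains of one percent compound to slightly more than the twenty percent a simple sum would suggest. Whether real improvements combine this neatly is an empirical question, which is why each one has to be tested on its own.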

Running controlled trials in cycling is significantly easier than in development aid, not least because the aim of the sport is relatively simple: getting from A to B as quickly as possible. To obtain the most efficient bicycle design, for example, British cycling created a wind tunnel. This enabled them to isolate the aerodynamic effect, by varying the design of the bike and testing it in identical conditions. To discover the most efficient training methods, Brailsford created new data sets that enabled him to track every subcomponent of physiological performance.

“Each gain on its own was small,” Brailsford said. “But that doesn’t really matter. We were getting a deeper understanding of each aspect of performance. It was the difference between trailing behind the rest of the world and coming first.”

In Corporate Creativity, the authors Alan Robinson and Sam Stern write of how Bob Crandall, the former chairman of American Airlines, removed a single olive from every salad, and in doing so saved $500,000 annually.8 Many seized on this as a marginal gain. But was it? After all, if removing an olive is a good idea, why not the lettuce too? At what point does an exercise in incremental cost-cutting start to impact on the bottom line?

Now we can see a clear answer. Marginal gains is not about making small changes and hoping they fly. Rather, it is about breaking down a big problem into small parts in order to rigorously establish what works and what doesn’t. Ultimately the approach emerges from a basic property of empirical evidence: to find out if something is working, you must isolate its effect. Controlled experimentation is inherently “marginal” in character.

Brailsford puts it this way: “If you break a performance into its component parts, you can build back up with confidence. Clear feedback is the cornerstone of improvement. Marginal gains, as an approach, is about having the intellectual honesty to see where you are going wrong, and delivering improvements as a result.”

The marginal gains mentality has pervaded the entire Team Sky mindset. They make sure that the cyclists sleep on the same mattress each night to deliver a marginal gain in sleep quality; that the rooms are vacuumed before they arrive at each new hotel, to deliver a marginal gain in reduced infection; that the clothes are washed with skin-friendly detergent, a marginal gain in comfort.

“People think it is exhausting to think about success at such a high level of detail,” Brailsford says. “But it would be far more exhausting, for me anyway, to neglect doing the analysis. I would much rather have clear answers than to delude myself that I have the ‘right’ answers.”

• • •

Perhaps the most astonishing application of marginal gains is to be found not in cycling but in Formula One. In the closing weeks of the 2014 season I visited the Mercedes headquarters in Brackley, a few miles north of Oxford. It is a series of gray buildings on an industrial estate, with a stream running through it. It is populated with bright people, passionate about their sport—and whose attention to detail is staggering.

“When I first started in F1, we recorded eight channels of data. Now we have 16,000 from every single parameter on the car. And we derive another 50,000 channels from that data,” said Paddy Lowe, a Cambridge-educated engineer, who is currently the technical leader of Mercedes F1. “Each channel provides information on a small aspect of performance. It takes us into the detail, but it also enables us to isolate key metrics that help us to improve.”
