As the science of fluid dynamics evolved over the first decades of the twentieth century, it became possible to imagine actually modeling the internal behavior of weather systems, and not just looking at surface resemblances between different configurations. Lewis Fry Richardson proposed a “weather prediction by numerical process” in a short, equation-heavy book of that title published in 1922, right around the time The Lancet was pondering the merits of a statistical approach to medicine. The problem with Richardson’s proposal—which he himself was all too aware of—lay in the complexity of the calculations: you couldn’t crunch the numbers within the window of the prediction itself. You might be able to build a model that could predict the weather twenty-four hours from now, but it would take you thirty-six hours to actually run the calculations. Richardson sensed that mechanical devices might be invented that would accelerate the process, but his tone was not altogether hopeful: “Perhaps some day in the dim future it will be possible to advance the computations faster than the weather advances and at a cost less than the saving to mankind due to the information gained. But that is a dream.”
Those mechanical devices did arrive, of course, and by the 1970s, national weather bureaus were modeling atmospheric systems with computers, generating forecasts in hours, not days. Today they are still occasionally prone to embarrassing (and sometimes deadly) blind spots. The behavior of hyperlocal weather systems—like tornadoes—is still difficult to map in advance, but it is very rare indeed that a tornado touches down on a day without a regional warning twenty-four hours in advance. Daily forecasts are now remarkably accurate on an hour-by-hour basis. But the real improvement has been with the long-term forecast. Ten-day forecasts were virtually useless just a generation ago. Looking beyond the window of the next forty-eight hours put you back in Farmers’ Almanac territory. Today, ten-day forecasts far outperform chance, particularly in winter months when the weather systems are larger and thus easier to model. This improvement is not simply a matter of doing more calculations per second. The new weather models are so much more accurate than their predecessors because they rely on an entirely new technique, what is conventionally called ensemble forecasting. Instead of measuring the initial conditions of the current weather and predicting a future sequence of weather events based on a “numerical process” like the one Lewis Fry Richardson proposed, ensemble forecasts create hundreds or thousands of different forecasts, and in each separate simulation the computer alters the initial conditions slightly—lowering the pressure by a few notches here, raising the temperature a few degrees there. If ninety out of a hundred simulations show the hurricane picking up speed and shifting to the northeast, then the meteorologists issue a high-probability forecast that the hurricane is going to pick up speed and shift to the northeast. If only 50 percent of the simulations suggest that pattern, then they issue a forecast with less certainty.
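The logic of an ensemble forecast can be sketched in miniature. The toy program below is emphatically not a real atmospheric model: the one-variable “storm intensity” dynamics, the noise scales, and the threshold are all invented for illustration. What it does faithfully reproduce is the meta-technique itself: perturb the initial conditions slightly, run many simulations, and report the fraction that show a given outcome as the forecast’s confidence.

```python
import random

def simulate_storm(pressure_mb, temp_c, steps=24):
    """Toy, invented dynamics: lower pressure and warmer air
    feed the storm a little more each hour. Not a real model."""
    intensity = 0.0
    for _ in range(steps):
        intensity += (1000.0 - pressure_mb) * 0.01 + (temp_c - 25.0) * 0.05
        intensity = max(intensity, 0.0)
    return intensity

def ensemble_forecast(pressure_mb, temp_c, runs=1000, threshold=5.0):
    """Run many simulations, each with slightly perturbed initial
    conditions, and return the fraction that cross the threshold."""
    hits = 0
    for _ in range(runs):
        p = pressure_mb + random.gauss(0, 2.0)  # a few millibars of noise
        t = temp_c + random.gauss(0, 0.5)       # half a degree of noise
        if simulate_storm(p, t) > threshold:
            hits += 1
    return hits / runs  # e.g. 0.9 -> "90 percent of simulations agree"

confidence = ensemble_forecast(pressure_mb=995.0, temp_c=27.0)
print(f"{confidence:.0%} of simulations show the storm intensifying")
```

A single run of `simulate_storm` is Richardson’s “numerical process”; the ensemble wrapper is what turns one brittle prediction into a calibrated probability.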
People still make casual jokes about how incompetent meteorologists are, but in fact, thanks to the meta-technique of ensemble forecasting, they have been getting steadily more accurate over the past few decades. A chaotic system like weather may never be fully predictable beyond a few weeks, given the number of variables and the baroque chains of influence that connect them. But our forecasting skills have nonetheless expanded at a remarkable rate. Weather forecasts are so ubiquitous that we rarely stop to think about them, but the fact is, in this one realm, we can now predict the future with an accuracy that would have astounded our grandparents. In a literal sense, those ensemble simulations, like the randomized controlled experiments of medical research, gave us a new power of clairvoyance. Our ability to predict is no longer just dependent on the what-if scenarios of the default network. We have strategies and technologies that extend our view into the future. The question is, can we apply those tools to other kinds of decisions?
SIMULATIONS
When we make hard choices, we are implicitly making predictions about the course of future events. When we decide to build a park on the outskirts of a growing city, we are predicting that the park will attract regular visitors; that the city itself will expand to envelop the park over the coming years; that replacing the open space with commercial development would, in the long run, prove to be a net negative for the city as greenery becomes increasingly scarce. None of those outcomes are predetermined. They are all predictions, with a meaningful margin of error. So when we see other ways of predicting—medical, say, or meteorological—achieve a positive step-change in accuracy, we should pay attention. Think about Tetlock’s foxes and hedgehogs. The fact that the social forecasters were better served by having a diverse set of interests—and an openness to experience—suggests a lesson that could be directly applied to the domain of personal choices. In Tetlock’s study, the narrowband approach didn’t just make you indistinguishable from the dart-throwing chimp; it made you worse. That alone should give us one valuable lesson: In the hard choice, single-minded focus is overrated.
Think of these three forecasts—the medical predictions of randomized controlled trials, the meteorological predictions of the weather forecasts, and the social predictions of the futurists and the pundits—as three patients, suffering from a chronic form of myopia that keeps them from accurately envisioning future events. The first two patients suffered along with the third for all of human history, until a handful of new ideas converged in the late nineteenth and early twentieth centuries that enabled them to improve their clairvoyant powers in empirically verifiable ways. The time scales were different: the RCTs let us see years, even decades, into the future; the weather forecasts let us see a week. They both hit some kind of threshold point that turned the noise of false prophecy into meaningful signal, but the social forecasters experienced no equivalent step-change. Why?
For all their differences, the RCTs and the weather forecasts share one defining characteristic. They both find wisdom about the question they are wrestling with—Is this medicine going to help treat my illness? Will the hurricane cross over land on Tuesday?—through multiple simulations. In an RCT, those simulations take the form of the hundreds or thousands of other patients with a similar medical condition administered a drug or a placebo. In a weather forecast, the simulations are the hundreds or thousands of atmospheric models generated with an ensemble forecast, each featuring slight variations in the initial conditions. Those patients in the drug trial are not exact replicas of you and your own individual condition, in all its complexity, but they are close enough, and because there are so many of them, the aggregate patterns in the data can tell you something useful about the long-term effects of the drug you are considering taking.
Societal predictions, on the other hand, do not usually have the luxury of consulting alternate realities where the forecast in question—Will the Soviet Union survive the 1990s?—is simulated hundreds of times. This is key to understanding why our medical and meteorological forecasts have gotten so much more accurate, while our societal forecasts have remained so murky. It’s not that social or technological change is more complicated as a system—the Earth’s atmosphere is the very definition of a complex system, after all—it’s that we don’t usually have access to simulations when we talk about future changes to geopolitics or technological inventions.
Ensemble simulations are so powerful, in fact, that you don’t necessarily need to have a complete understanding of how the system works to make useful predictions about its future behavior. When Austin Bradford Hill and his team were experimenting with streptomycin in the late 1940s, they didn’t understand, as modern medicine now does, the cell biology that explained why the antibiotic combated tuberculosis. But the RCT enabled them to develop the treatment regimen anyway, because the data they generated by giving the drug (and a placebo) to hundreds of patients allowed them to see a pattern that was not necessarily visible from giving the drug to just a single patient.
Simulations make us better decision-makers, because simulations make us better at predicting future events, even when the system we are trying to model contains thousands or millions of variables. But, of course, it’s much harder to explore small-group decisions through randomized controlled trials or ensemble forecasting. We would get better at predicting the impact pathways of our professional choices if we could run alternate versions of our experience in parallel, and experiment with different choices and outcomes. Rewind the tape and try your career again—only this time you and your partners decide to open a restaurant in a different neighborhood or switch from a restaurant to a boutique. How does that one choice change the future course of your life? Darwin predicted that getting married would reduce his supply of “conversation of clever men at clubs,” but if he’d been able to run multiple simulations of his life—some where he marries Emma, and some where he remains a bachelor—he would have had a better sense of whether that sacrifice actually turned out to be a real one. Simulations make us better at predicting, and successful predictions make us better decision-makers. How, then, can we simulate the personal or collective choices that matter most in our lives?
THE GAME
On the night of April 7, 2011, two stealth Black Hawk helicopters approached a three-story compound, ringed by concrete walls and barbed wire. Under the cover of darkness, one Black Hawk hovered over the roof while a SEAL Team 6 unit descended by rope to the structure. Another chopper deposited a second unit in the courtyard. Minutes later, the units ascended back into the choppers and disappeared into the night.
No guns were fired over the course of this operation, and no terrorist masterminds were captured, because the compound in question did not lie on the outskirts of Abbottabad but rather on the grounds of Fort Bragg, North Carolina. As President Obama contemplated his four options for attacking the Pakistan compound, the Special Ops team, led by Admiral William McRaven, had begun simulating the proposed helicopter raid. The tabletop scale model of the compound had been replaced by a real-life structure built to the exact dimensions of the Abbottabad building and grounds. If there was something in the structure of the compound that made a Special Ops raid unmanageable, McRaven wanted to discover it before Obama decided on the ultimate plan of attack.
And yet, even with the architectural details of the re-created compound, the simulation at Fort Bragg couldn’t re-create one critical element of the actual raid: the hot, high-altitude climate of northeastern Pakistan. And so several weeks later, the same team gathered at a base in Nevada, four thousand feet above sea level—almost the exact elevation of the compound. For this exercise, McRaven did not bother to build an entire simulated structure to represent the compound. They just stacked some Conex shipping containers and surrounded them with chain-link fences roughly corresponding to the location of the concrete walls. This simulation was more focused on the helicopters and their performance at that altitude. “On the real mission the helicopters would have to fly ninety minutes before arriving over Abbottabad,” Mark Bowden writes. “They would be flying very low and very fast to avoid Pakistani radar. Mission planners had to test precisely what the choppers could do at that altitude and in the anticipated air temperatures. How much of a load could the choppers carry and still perform? Originally they had thought they might be able to make it there and back without refueling, but the margins were too close. The choppers would have been coming back on fumes. So the refueling area was necessary.”
We expect our military forces to rehearse a dangerous mission before embarking on it. But the simulated raids in North Carolina and Nevada were staged before Obama actually made the decision to use the Black Hawks to attack the compound. The Special Ops forces weren’t simply practicing for an attack; they were simulating the attack in order to better understand what might go wrong once the Black Hawks entered Pakistani airspace. The simulations were a crucial part of the decision process itself. What they were looking for, ultimately, was some unanticipated consequence of staging the raid in that particular situation. The notorious attempt to rescue the Iranian hostages in 1980 had failed in part because the helicopters encountered a severe dust storm called a haboob—common in the Middle East—that crippled one of the helicopters and ultimately forced the mission to be aborted. If McRaven was going to remain an advocate for the SEAL Team 6 option, he wanted to explore all the potential ways the mission could go wrong.
“One thing a person cannot do, no matter how rigorous his analysis or heroic his imagination,” the Nobel laureate Thomas Schelling once observed, “is to draw up a list of things that would never occur to him.” And yet hard choices usually require us to make those kinds of imaginative leaps: to discover new possibilities that had not been visible to us when we first started wrestling with the decision; to find our way, somehow, to the unknown unknowns lurking outside our field of vision. A brilliant economist and foreign policy analyst, Schelling had a capacity for “rigorous analysis” rivaled by few. But in his years working with the RAND Corporation in the late 1950s and 1960s, he became an advocate for a less rigorous way of thinking around our blind spots: playing games.
The war games designed by Schelling and his RAND colleague Herman Kahn have been well documented by historians and other chroniclers of the period. They led to everything from the controversial theory of Mutually Assured Destruction that governed so much of Cold War military strategy to the creation of the “red phone” hotline between Washington and Moscow to the character of Dr. Strangelove in Stanley Kubrick’s classic film. But the tradition of war-gaming has much older roots. In the first decades of the nineteenth century, a father-and-son team of Prussian military officers created a dice-based game called Kriegsspiel (literally “war game” in German) that simulated military combat. The game resembled a much more complex version of modern games like Risk. Players placed pawns that represented different military units on a map, and the game could accommodate up to ten players working on different teams with a hierarchical command system inside each team. Kriegsspiel even had game mechanisms to account for communications breakdowns between commanders and troops in the field, simulating the “fog of war.” Like the modern-day game Battleship, Kriegsspiel was played on two separate boards, so each side had incomplete knowledge of the other’s actions. A “gamemaster”—an early precursor of the Dungeon Masters who emerged with fantasy role-playing games in the 1970s—shuffled back and forth between the two boards, overseeing the play.
Kriegsspiel became an essential part of officer training in the Prussian military. Translated versions of the game made their way into other nations’ armed forces after the string of military victories under Bismarck’s command suggested that the game was giving the Prussians a tactical advantage in battle. It may have played a role in the ultimately disastrous military actions of World War I. The Germans had used Kriegsspiel to simulate invading Holland and Belgium before taking aim at the French. “The game determined that Germany would triumph against France,” the conceptual artist and philosopher Jonathon Keats writes, “as long as ammunition could be rapidly replenished. For this purpose, Germany built the world’s first motorized supply battalions, deployed in 1914. And the plan might have worked brilliantly, if the only players had been the German and French armies.” Instead, the game failed to anticipate the extent to which Belgian saboteurs would undermine their own railway system (and thus the German supply chain), and it had no mechanism for simulating the diplomacy that would eventually bring the United States into the conflict.
The Naval War College in the United States had conducted paper-based war games since its founding in 1884, but in the decade after World War I, the navy took war-gaming to new heights by staging a series of mock conflicts using actual planes and warships (though without bombs and bullets). The exercises—formally designated as “Fleet Problems” followed by a Roman numeral—explored everything from a defense of the Panama Canal to the growing threat of submarine attacks. Fleet Problem XIII, conducted in 1932 over a vast stretch of ocean—from Hawaii to San Diego all the way north to the Puget Sound—simulated an aerial attack on US military bases from the Pacific. The exercise made it clear that US forces were vulnerable to a “determined aggressor” to the country’s west, and suggested that six to eight carrier battle groups would be required to mount a proper defense. The advice was ignored, in large part because of the budgetary constraints of the Great Depression. But the prediction would turn out to be tragically accurate on December 7, 1941. If the US military had successfully applied the lesson of Fleet Problem XIII, it is entirely possible that the Japanese attack on Pearl Harbor would have failed—or would have never been attempted in the first place.
Not all war games were perfect crystal balls. But as a mental exercise, they functioned in much the same way as randomized controlled trials or ensemble weather forecasts. They created a platform where decisions could be rehearsed multiple times, using different strategies with each round. The collaborative nature of game play—even if you were playing a zero-sum, competitive game—meant that new possibilities and configurations could become visible to you thanks to unexpected moves that your opponent put on the table. War games began with a map—one of the key innovations that made Kriegsspiel different from metaphoric military games like chess was that it used an actual topographic map of the field of battle—but the real revelation of the game arose from how it forced you to explore that map, to simulate all the different ways the opposing armies might do battle in that space. In Schelling’s language, you can’t draw up a list of things that will never occur to you. But you can play your way into that kind of list. If Kriegsspiel had been invented—and popularized—a century earlier, it’s not hard to imagine Washington successfully anticipating the British attack through Jamaica Pass. The simulations of the war game might well have made up for the local intelligence he lost when Nathanael Greene fell ill.
Farsighted Page 10