by Nate Silver
THE PENGUIN PRESS
Published by the Penguin Group
Penguin Group (USA) Inc., 375 Hudson Street,
New York, New York 10014, U.S.A.
Penguin Group (Canada), 90 Eglinton Avenue East, Suite 700, Toronto, Ontario, Canada M4P 2Y3 (a division of Pearson Penguin Canada Inc.) • Penguin Books Ltd, 80 Strand, London WC2R 0RL, England • Penguin Ireland, 25 St. Stephen’s Green, Dublin 2, Ireland (a division of Penguin Books Ltd) • Penguin Books Australia Ltd, 250 Camberwell Road, Camberwell, Victoria 3124, Australia (a division of Pearson Australia Group Pty Ltd) • Penguin Books India Pvt Ltd, 11 Community Centre, Panchsheel Park, New Delhi – 110 017, India • Penguin Group (NZ), 67 Apollo Drive, Rosedale, Auckland 0632, New Zealand (a division of Pearson New Zealand Ltd) • Penguin Books (South Africa) (Pty) Ltd, 24 Sturdee Avenue, Rosebank, Johannesburg 2196, South Africa
Penguin Books Ltd, Registered Offices:
80 Strand, London WC2R 0RL, England
First published in 2012 by The Penguin Press,
a member of Penguin Group (USA) Inc.
Copyright © Nate Silver, 2012
All rights reserved
Illustration credits
Figure 4-2: Courtesy of Dr. Tim Parker, University of Oxford
Figure 7-1: From “1918 Influenza: The Mother of All Pandemics” by Jeffery Taubenberger and David Morens, Emerging Infectious Disease Journal, vol. 12, no. 1, January 2006, Centers for Disease Control and Prevention
Figures 9-2, 9-3A, 9-3C, 9-4, 9-5, 9-6 and 9-7: By Cburnett, Wikimedia Commons
Figure 12-2: Courtesy of Dr. J. Scott Armstrong, The Wharton School, University of Pennsylvania
LIBRARY OF CONGRESS CATALOGING IN PUBLICATION DATA
Silver, Nate.
The signal and the noise : why most predictions fail but some don’t / Nate Silver.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-101-59595-4
1. Forecasting. 2. Forecasting—Methodology. 3. Forecasting—History. 4. Bayesian statistical decision theory. 5. Knowledge, Theory of. I. Title.
CB158.S54 2012
519.5'42—dc23 2012027308
While the author has made every effort to provide accurate telephone numbers, Internet addresses, and other contact information at the time of publication, neither the publisher nor the author assumes any responsibility for errors, or for changes that occur after publication. Further, publisher does not have any control over and does not assume any responsibility for author or third-party Web sites or their content.
No part of this book may be reproduced, scanned, or distributed in any printed or electronic form without permission. Please do not participate in or encourage piracy of copyrighted materials in violation of the author’s rights. Purchase only authorized editions.
To Mom and Dad
CONTENTS
Title Page
Copyright
Dedication
Introduction
1. A CATASTROPHIC FAILURE OF PREDICTION
2. ARE YOU SMARTER THAN A TELEVISION PUNDIT?
3. ALL I CARE ABOUT IS W’S AND L’S
4. FOR YEARS YOU’VE BEEN TELLING US THAT RAIN IS GREEN
5. DESPERATELY SEEKING SIGNAL
6. HOW TO DROWN IN THREE FEET OF WATER
7. ROLE MODELS
8. LESS AND LESS AND LESS WRONG
9. RAGE AGAINST THE MACHINES
10. THE POKER BUBBLE
11. IF YOU CAN’T BEAT ’EM . . .
12. A CLIMATE OF HEALTHY SKEPTICISM
13. WHAT YOU DON’T KNOW CAN HURT YOU
Conclusion
Acknowledgments
Notes
Index
INTRODUCTION
This is a book about information, technology, and scientific progress. This is a book about competition, free markets, and the evolution of ideas. This is a book about the things that make us smarter than any computer, and a book about human error. This is a book about how we learn, one step at a time, to come to knowledge of the objective world, and why we sometimes take a step back.
This is a book about prediction, which sits at the intersection of all these things. It is a study of why some predictions succeed and why some fail. My hope is that we might gain a little more insight into planning our futures and become a little less likely to repeat our mistakes.
More Information, More Problems
The original revolution in information technology came not with the microchip, but with the printing press. Johannes Gutenberg’s invention in 1440 made information available to the masses, and the explosion of ideas it produced had unintended consequences and unpredictable effects. It was a spark for the Industrial Revolution in 1775,1 a tipping point in which civilization suddenly went from having made almost no scientific or economic progress for most of its existence to the exponential rates of growth and change that are familiar to us today. It set in motion the events that would produce the European Enlightenment and the founding of the American Republic.
But the printing press would first produce something else: hundreds of years of holy war. As mankind came to believe it could predict its fate and choose its destiny, the bloodiest epoch in human history followed.2
Books had existed prior to Gutenberg, but they were not widely written and they were not widely read. Instead, they were luxury items for the nobility, produced one copy at a time by scribes.3 The going rate for reproducing a single manuscript was about one florin (a gold coin worth about $200 in today’s dollars) per five pages,4 so a book like the one you’re reading now would cost around $20,000. It would probably also come with a litany of transcription errors, since it would be a copy of a copy of a copy, the mistakes having multiplied and mutated through each generation.
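The arithmetic behind that $20,000 figure can be sketched in a few lines. This is only an illustration: the florin rate and dollar value come from the paragraph above, while the 500-page length is an assumption, chosen because it is the page count that the quoted total implies.

```python
# Rough cost of hand-copying a manuscript, using the figures quoted above.
FLORIN_IN_USD = 200      # from the text: one florin is about $200 today
PAGES_PER_FLORIN = 5     # from the text: one florin per five pages


def manuscript_cost_usd(pages):
    """Return the approximate cost in today's dollars of copying `pages` pages."""
    florins = pages / PAGES_PER_FLORIN
    return florins * FLORIN_IN_USD


# ASSUMPTION: a 500-page book, which is what the $20,000 total implies.
assumed_pages = 500
print(manuscript_cost_usd(assumed_pages))
```

A shorter or longer book scales the cost linearly, so a 250-page manuscript would come to about half as much under the same assumptions.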
This made the accumulation of knowledge extremely difficult. It required heroic effort to prevent the volume of recorded knowledge from actually decreasing, since the books might decay faster than they could be reproduced. Various editions of the Bible survived, along with a small number of canonical texts, such as those of Plato and Aristotle. But an untold amount of wisdom was lost to the ages,5 and there was little incentive to record more of it to the page.
The pursuit of knowledge seemed inherently futile, if not altogether vain. If today we feel a sense of impermanence because things are changing so rapidly, impermanence was a far more literal concern for the generations before us. There was “nothing new under the sun,” as the beautiful Bible verses in Ecclesiastes put it—not so much because everything had been discovered but because everything would be forgotten.6
The printing press changed that, and did so permanently and profoundly. Almost overnight, the cost of producing a book decreased by about three hundred times,7 so a book that might have cost $20,000 in today’s dollars instead cost $70. Printing presses spread very rapidly throughout Europe; from Gutenberg’s Germany to Rome, Seville, Paris, and Basel by 1470, and then to almost all other major European cities within another ten years.8 The number of books being produced grew exponentially, increasing by about thirty times in the first century after the printing press was invented.9 The store of human knowledge had begun to accumulate, and rapidly.
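To get a feel for what a thirtyfold increase over a century means year to year, one can back out the implied annual growth rate. The sketch below assumes smooth, constant compounding, which is a simplification and not a claim made in the text; the function name is hypothetical.

```python
# Implied constant annual growth rate behind a given total increase.
# ASSUMPTION: growth compounds at the same rate every year.


def implied_annual_growth(total_factor, years):
    """Return the yearly rate r such that (1 + r) ** years == total_factor."""
    return total_factor ** (1.0 / years) - 1.0


# From the text: book production rose about thirtyfold in the first century.
rate = implied_annual_growth(30, 100)
print(f"{rate:.1%} per year")
```

Under that assumption the rate works out to roughly 3.5 percent per year, a modest-sounding figure that nonetheless compounds into a thirtyfold change over a hundred years.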
FIGURE I-1: EUROPEAN BOOK PRODUCTION
As was the case during the early days of the World Wide Web, however, the quality of the information was highly varied. While the printing press paid almost immediate dividends in the production of higher quality maps,10 the bestseller list soon came to be dominated by heretical religious texts and pseudoscientific ones.11 Errors could now be mass-produced, like in the so-called Wicked Bible, which committed the most unfortunate typo in history to the page: thou shalt commit adultery.12 Meanwhile, exposure to so many new ideas was producing mass confusion. The amount of information was increasing much more rapidly than our understanding of what to do with it, or our ability to differentiate the useful information from the mistruths.13 Paradoxically, the result of having so much more shared knowledge was increasing isolation along national and religious lines. The instinctual shortcut that we take when we have “too much information” is to engage with it selectively, picking out the parts we like and ignoring the remainder, making allies with those who have made the same choices and enemies of the rest.
The most enthusiastic early customers of the printing press were those who used it to evangelize. Martin Luther’s Ninety-five Theses were not that radical; similar sentiments had been debated many times over. What was revolutionary, as Elizabeth Eisenstein writes, is that Luther’s theses “did not stay tacked to the church door.”14 Instead, they were reproduced at least three hundred thousand times by Gutenberg’s printing press15—a runaway hit even by modern standards.
The schism that Luther’s Protestant Reformation produced soon plunged Europe into war. From 1524 to 1648, there was the German Peasants’ War, the Schmalkaldic War, the Eighty Years’ War, the Thirty Years’ War, the French Wars of Religion, the Irish Confederate Wars, the Scottish Civil War, and the English Civil War—many of them raging simultaneously. This is not to neglect the Spanish Inquisition, which began in 1480, or the War of the Holy League from 1508 to 1516, although those had less to do with the spread of Protestantism. The Thirty Years’ War alone killed one-third of Germany’s population,16 and the seventeenth century was possibly the bloodiest ever, with the early twentieth staking the main rival claim.17
But somehow in the midst of this, the printing press was starting to produce scientific and literary progress. Galileo was sharing his (censored) ideas, and Shakespeare was producing his plays.
Shakespeare’s plays often turn on the idea of fate, as much drama does. What makes them so tragic is the gap between what his characters might like to accomplish and what fate provides to them. The idea of controlling one’s fate seemed to have become part of the human consciousness by Shakespeare’s time—but not yet the competencies to achieve that end. Instead, those who tested fate usually wound up dead.18
These themes are explored most vividly in The Tragedy of Julius Caesar. Throughout the first half of the play Caesar receives all sorts of apparent warning signs—what he calls predictions19 (“beware the ides of March”)—that his coronation could turn into a slaughter. Caesar of course ignores these signs, quite proudly insisting that they point to someone else’s death—or otherwise reading the evidence selectively. Then Caesar is assassinated.
“[But] men may construe things after their fashion / Clean from the purpose of the things themselves,” Shakespeare warns us through the voice of Cicero—good advice for anyone seeking to pluck through their newfound wealth of information. It was hard to tell the signal from the noise. The story the data tells us is often the one we’d like to hear, and we usually make sure that it has a happy ending.
And yet if The Tragedy of Julius Caesar turned on an ancient idea of prediction—associating it with fatalism, fortune-telling, and superstition—it also introduced a more modern and altogether more radical idea: that we might interpret these signs so as to gain an advantage from them. “Men at some time are masters of their fates,” says Cassius, hoping to persuade Brutus to partake in the conspiracy against Caesar.
The idea of man as master of his fate was gaining currency. The words predict and forecast are largely used interchangeably today, but in Shakespeare’s time, they meant different things. A prediction was what the soothsayer told you; a forecast was something more like Cassius’s idea.
The term forecast came from English’s Germanic roots,20 unlike predict, which is from Latin.21 Forecasting reflected the new Protestant worldliness rather than the otherworldliness of the Holy Roman Empire. Making a forecast typically implied planning under conditions of uncertainty. It suggested having prudence, wisdom, and industriousness, more like the way we now use the word foresight.22
The theological implications of this idea are complicated.23 But they were less so for those hoping to make a gainful existence in the terrestrial world. These qualities were strongly associated with the Protestant work ethic, which Max Weber saw as bringing about capitalism and the Industrial Revolution.24 This notion of forecasting was very much tied in to the notion of progress. All that information in all those books ought to have helped us to plan our lives and profitably predict the world’s course.
• • •
The Protestants who ushered in centuries of holy war were learning how to use their accumulated knowledge to change society. The Industrial Revolution largely began in Protestant countries and largely in those with a free press, where both religious and scientific ideas could flow without fear of censorship.25
The importance of the Industrial Revolution is hard to overstate. Throughout essentially all of human history, economic growth had proceeded at a rate of perhaps 0.1 percent per year, enough to allow for a very gradual increase in population, but not any growth in per capita living standards.26 And then, suddenly, there was progress when there had been none. Economic growth began to zoom upward much faster than the growth rate of the population, as it has continued to do through to the present day, the occasional global financial meltdown notwithstanding.27
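One way to see how slow 0.1 percent annual growth is: compute how long an economy takes to double at that pace. The sketch below uses the standard compound-growth doubling formula; the 2 percent comparison rate is a hypothetical value chosen only for contrast and does not come from the text.

```python
import math


def doubling_time_years(annual_rate):
    """Return the years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + annual_rate)


# From the text: roughly 0.1 percent per year for most of human history.
preindustrial = doubling_time_years(0.001)

# ASSUMPTION: 2 percent per year, an illustrative modern-style rate.
illustrative_modern = doubling_time_years(0.02)

print(f"0.1% per year: about {preindustrial:.0f} years to double")
print(f"2.0% per year: about {illustrative_modern:.0f} years to double")
```

At 0.1 percent a year, doubling takes close to seven centuries; at the illustrative 2 percent, it takes about one generation.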
FIGURE I-2: GLOBAL PER CAPITA GDP, 1000–2010
The explosion of information produced by the printing press had done us a world of good, it turned out. It had just taken 330 years—and millions dead in battlefields around Europe—for those advantages to take hold.
The Productivity Paradox
We face danger whenever information growth outpaces our understanding of how to process it. The last forty years of human history imply that it can still take a long time to translate information into useful knowledge, and that if we are not careful, we may take a step back in the meantime.
The term “information age” is not particularly new. It started to come into more widespread use in the late 1970s. The related term “computer age” was used earlier still, starting in about 1970.28 It was at around this time that computers began to be used more commonly in laboratories and academic settings, even if they had not yet become common as home appliances. This time it did not take three hundred years before the growth in information technology began to produce tangible benefits to human society. But it did take fifteen to twenty.
The 1970s were the high point for “vast amounts of theory applied to extremely small amounts of data,” as Paul Krugman put it to me. We had begun to use computers to produce models of the world, but it took us some time to recognize how crude and assumption laden they were, and that the precision that computers were capable of was no substitute for predictive accuracy. In fields ranging from economics to epidemiology, this was an era in which bold predictions were made, and equally often failed. In 1971, for instance, it was claimed that we would be able to predict earthquakes within a decade,29 a problem that we are no closer to solving forty years later.
Instead, the computer boom of the 1970s and 1980s produced a temporary decline in economic and scientific productivity. Economists termed this the productivity paradox. “You can see the computer age everywhere but in the productivity statistics,” wrote the economist Robert Solow in 1987.30 The United States experienced four distinct recessions between 1969 and 1982.31 The late 1980s were a stronger period for our economy, but less so for countries elsewhere in the world.
Scientific progress is harder to measure than economic progress.32 But one mark of it is the number of patents produced, especially relative to the investment in research and development. If it has become cheaper to produce a new invention, this suggests that we are using our information wisely and are forging it into knowledge. If it is becoming more expensive, this suggests that we are seeing signals in the noise and wasting our time on false leads.
In the 1960s the United States spent about $1.5 million (adjusted for inflation33) per patent application34 by an American inventor. That figure rose rather than fell at the dawn of the information age, however, doubling to a peak of about $3 million in 1986.35
FIGURE I-3: RESEARCH AND DEVELOPMENT EXPENDITURES PER PATENT APPLICATION
As we came to more realistic views of what that new technology could accomplish for us, our research productivity began to improve again in the 1990s. We wandered up fewer blind alleys; computers began to improve our everyday lives and help our economy. Stories of prediction are often those of long-term progress but short-term regress. Many things that seem predictable over the long run foil our best-laid plans in the meanwhile.
The Promise and Pitfalls of “Big Data”
The fashionable term now is “Big Data.” IBM estimates that we are generating 2.5 quintillion bytes of data each day, more than 90 percent of which was created in the last two years.36
This exponential growth in information is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson, the editor of Wired magazine, wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method.37
This is an emphatically pro-science and pro-technology book, and I think of it as a very optimistic one. But it argues that these views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning. Like Caesar, we may construe them in self-serving ways that are detached from their objective reality.
Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.
This attitude might seem surprising if you know my background. I have a reputation for working with data and statistics and using them to make successful predictions. In 2003, bored at a consulting job, I designed a system called PECOTA, which sought to predict the statistics of Major League Baseball players. It contained a number of innovations—its forecasts were probabilistic, for instance, outlining a range of possible outcomes for each player—and we found that it outperformed competing systems when we compared their results. In 2008, I founded the Web site FiveThirtyEight, which sought to forecast the upcoming election. The FiveThirtyEight forecasts correctly predicted the winner of the presidential contest in forty-nine of fifty states as well as the winner of all thirty-five U.S. Senate races.