1.5% chance that BLACK is correct
86.5% chance that FAIR is correct
12% chance that RED is correct.
The extent to which you believe in RED has more than doubled, while your belief in BLACK has been almost totally wiped out. As is appropriate! You see five reds in a row, why shouldn’t you start to suspect a little more seriously than before that the game is rigged?
That “dividing everything by 0.0325” step might seem a bit of an ad hoc trick. But it’s really the correct thing to do. In case your intuition doesn’t swallow it right away, here’s another picture some people like better. Imagine there are ten thousand roulette wheels. And there are ten thousand rooms, each with a different roulette wheel, each roulette wheel with a person playing it. One of those people, following one of those wheels, is you. But you don’t know which wheel you’ve got! So your state of ignorance about the wheel’s true nature can be modeled by supposing that, of the original ten thousand, five hundred were biased toward the black, five hundred were biased toward the red, and nine thousand were fair.
The computation we just did above tells you to expect about 281 of the FAIR wheels, about 39 of the RED wheels, and only 5 of the BLACK wheels to come up RRRRR. So if you do get RRRRR, you still don’t know which of the ten thousand rooms you’re in, but you’ve narrowed it down a hell of a lot; you’re in one of the 325 rooms where the ball landed on red five times in a row. And of those rooms, 281 of them (about 86.5%) have FAIR wheels, 39 (12%) have RED wheels, and only 5 (1.5%) have BLACK wheels.
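If you want to check the arithmetic, here is a small sketch in Python. It takes the priors above (90% FAIR, 5% each for RED and BLACK) and assumes the FAIR wheel comes up red half the time while the RED and BLACK wheels come up red 3/5 and 2/5 of the time, the weighting needed to reproduce the figures in the text.

```python
# Sanity check of the box computation: prior times likelihood, then normalize.
priors = {"BLACK": 0.05, "FAIR": 0.90, "RED": 0.05}  # 500, 9,000, and 500 of the 10,000 wheels
p_red  = {"BLACK": 0.4,  "FAIR": 0.5,  "RED": 0.6}   # assumed chance of red on a single spin

# How likely is RRRRR under each theory, weighted by how much we believed that theory?
weighted = {t: priors[t] * p_red[t] ** 5 for t in priors}

total = sum(weighted.values())                 # this is the 0.0325 we divided by
posterior = {t: w / total for t, w in weighted.items()}

print(round(total, 4))                         # 0.0325
print({t: round(p, 3) for t, p in posterior.items()})
# {'BLACK': 0.016, 'FAIR': 0.865, 'RED': 0.12}
```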
The more balls that fall red, the more favorably you’re going to look on that RED theory (and the less credence you’ll give to BLACK). If you saw ten reds in a row instead of five, the same computation would raise your estimation of the chance of RED to 25%.
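Running the same sketch with ten reds instead of five reproduces that 25% figure:

```python
# The same computation with ten reds in a row instead of five.
priors = {"BLACK": 0.05, "FAIR": 0.90, "RED": 0.05}
p_red  = {"BLACK": 0.4,  "FAIR": 0.5,  "RED": 0.6}

weighted = {t: priors[t] * p_red[t] ** 10 for t in priors}
print(round(weighted["RED"] / sum(weighted.values()), 2))   # 0.25
```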
What we’ve done is to compute how our degrees of belief in the various theories ought to change once we see five reds in a row—what are known as the posterior probabilities. Just as the prior describes your beliefs before you see the evidence, the posterior describes your beliefs afterward. What we’re doing here is called Bayesian inference, because the passage from prior to posterior rests on an old formula in probability called Bayes’s Theorem. That theorem is a short algebraic expression and I could write it down for you right here and now. But I’m going to try not doing that. Because sometimes a formula, if you train yourself to apply it mechanically without thinking about the situation in front of you, can obscure what’s really going on. And everything you need to know about what’s going on here can already be seen in the box.*
—
The posterior is affected by the evidence you encounter, but also by your prior. The cynic, who started out with a prior that assigned probability 1/3 to each of BLACK, FAIR, and RED, would respond to five reds in a row with a posterior judgment that RED had a 65% chance of being correct. The trusting soul who starts out assigning only 1% probability to RED will still only give it a 2.5% chance of being right, even after seeing five reds in a row.
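The same computation, with the same assumed wheels but these two priors, gives back both numbers. (The text fixes only the trusting soul's 1% on RED; how the remaining 99% is split between FAIR and BLACK below is an assumption, chosen to match the 2.5%.)

```python
# Same assumed wheels, two different priors.
def posterior_red(priors, p_red, n_reds=5):
    """Posterior probability of the RED theory after seeing n_reds reds in a row."""
    weighted = {t: priors[t] * p_red[t] ** n_reds for t in priors}
    return weighted["RED"] / sum(weighted.values())

p_red = {"BLACK": 0.4, "FAIR": 0.5, "RED": 0.6}

cynic    = {"BLACK": 1/3,  "FAIR": 1/3,  "RED": 1/3}
trusting = {"BLACK": 0.01, "FAIR": 0.98, "RED": 0.01}  # assumed split of the other 99%

print(round(posterior_red(cynic, p_red), 2))     # 0.65
print(round(posterior_red(trusting, p_red), 3))  # 0.025
```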
In the Bayesian framework, how much you believe something after you see the evidence depends not just on what the evidence shows, but on how much you believed it to begin with.
That may seem troubling. Isn’t science supposed to be objective? You’d like to say that your beliefs are based on evidence alone, not on some prior preconceptions you walked in the door with. But let’s face it—no one actually forms their beliefs this way. If an experiment provided statistically significant evidence that a new tweak of an existing drug slowed the growth of certain kinds of cancer, you’d probably be pretty confident the new drug was actually effective. But if you got the exact same results by putting patients inside a plastic replica of Stonehenge, would you grudgingly accept that the ancient formations were actually focusing vibrational earth energy on the body and stunning the tumors? You would not, because that’s nutty. You’d think Stonehenge probably got lucky. You have different priors about those two theories, and as a result you interpret the evidence differently, despite it being numerically the same.
It’s just the same with Facebook’s terrorist-finding algorithm and the next-door neighbor. The neighbor’s presence on the list really does offer some evidence that he’s a potential terrorist. But your prior for that hypothesis ought to be very small, because most people aren’t terrorists. So, despite the evidence, your posterior probability remains small as well, and you don’t—or at least shouldn’t—worry.
Relying purely on null hypothesis significance testing is a deeply non-Bayesian thing to do—strictly speaking, it asks us to treat the cancer drug and the plastic Stonehenge with exactly the same respect. Is that a blow to Fisher’s view of statistics? On the contrary. When Fisher says that “no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas,” he is saying exactly that scientific inference can’t, or at least shouldn’t, be carried out purely mechanically; our preexisting ideas and beliefs must always be allowed to play a part.
Not that Fisher was a Bayesian statistician. That phrase, nowadays, refers to a cluster of practices and ideologies in statistics, once unfashionable but now rather mainstream, which includes a general sympathy toward arguments based on Bayes’s Theorem, but is not simply a matter of taking both previous beliefs and new evidence into account. Bayesianism tends to be most popular in genres of inference, like teaching machines to learn from large-scale human input, that are poorly suited to the yes-or-no questions Fisher’s approach was set up to adjudicate. In fact, Bayesian statisticians often don’t think about the null hypothesis at all; rather than asking “Does this new drug have any effect?” they might be more interested in a best guess for a predictive model governing the drug’s effects in various doses on various populations. And when they do talk about hypotheses, they’re relatively at ease with talking about the probability that a hypothesis—say, that the new drug works better than the existing one—is true. Fisher was not. In his view, the language of probability was appropriately used only in a context where some actual chance process is taking place.
At this point, we’ve arrived at the shore of a great sea of philosophical difficulty, into which we’ll dip one or two toes, max.
First of all: when we call Bayes’s Theorem a theorem it suggests we are discussing incontrovertible truths, certified by mathematical proof. That’s both true and not. It comes down to the difficult question of what we mean when we say “probability.” When we say that there’s a 5% chance that RED is true, we might mean that there actually is some vast global population of roulette wheels, of which exactly one in twenty is biased to fall red 3/5 of the time, and that any given roulette wheel we encounter is randomly picked from the roulette wheel multitude. If that’s what we mean, then Bayes’s Theorem is a plain fact, akin to the Law of Large Numbers we saw in the last chapter; it says that, in the long run, under the conditions we set up in the example, 12% of the roulette wheels that come up RRRRR are going to be of the red-favoring kind.
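That long-run reading can be checked with a quick simulation, again under the same assumed mix of wheels: hand out wheels at random, keep only the ones that come up RRRRR, and tally what kind they were.

```python
import random

# Simulate the "vast population of wheels" reading with the same assumed mix:
# 90% fair (red 1/2), 5% red-favoring (red 3/5), 5% black-favoring (red 2/5).
random.seed(0)
wheels = [("FAIR", 0.5)] * 90 + [("RED", 0.6)] * 5 + [("BLACK", 0.4)] * 5

hits = {"FAIR": 0, "RED": 0, "BLACK": 0}
for _ in range(1_000_000):
    kind, p = random.choice(wheels)                  # hand a random wheel to a player
    if all(random.random() < p for _ in range(5)):   # did the wheel come up RRRRR?
        hits[kind] += 1

total = sum(hits.values())
print({k: round(v / total, 3) for k, v in hits.items()})
# roughly {'FAIR': 0.865, 'RED': 0.12, 'BLACK': 0.016}
```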
But this isn’t actually what we’re talking about. When we say that there’s a 5% chance that RED is true, we are making a statement not about the global distribution of biased roulette wheels (how could we know?) but rather about our own mental state. Five percent is the degree to which we believe that a roulette wheel we encounter is weighted toward the red.
This is the point at which Fisher totally got off the bus, by the way. He wrote an unsparing pan of John Maynard Keynes’s Treatise on Probability, in which probability “measures the ‘degree of rational belief’ to which a proposition is entitled in the light of given evidence.” Fisher’s opinion of this viewpoint is well summarized by his closing lines: “If the views of the last section of Mr. Keynes’s book were accepted as authoritative by mathematical students in this country, they would be turned away, some in disgust, and most in ignorance, from one of the most promising branches of applied mathematics.”
For those who are willing to adopt the view of probability as degree of belief, Bayes’s Theorem can be seen not as a mere mathematical equation but as a form of numerically flavored advice. It gives us a rule, which we may choose to follow or not, for how we should update our beliefs about things in the light of new observations. In this new, more general form, it is naturally the subject of much fiercer disputation. There are hard-core Bayesians who think that all our beliefs should be formed by strict Bayesian computations, or at least as strict as our limited cognition can make them; others think of Bayes’s rule as more of a loose qualitative guideline.
The Bayesian outlook is already enough to explain why RBRRB looks random while RRRRR doesn’t, even though both are equally improbable. When we see RRRRR, it strengthens a theory—the theory that the wheel is rigged to land red—to which we’ve already assigned some prior probability. But what about RBRRB? You could imagine someone walking around with an unusually open-minded stance concerning roulette wheels, which assigns some modest probability to the theory that the roulette wheel was fitted with a hidden Rube Goldberg apparatus designed to produce the outcome red, black, red, red, black. Why not? And such a person, observing RBRRB, would find this theory very much bolstered.
But this is not how real people react to the spins of a roulette wheel coming up red, black, red, red, black. We don’t allow ourselves to consider every cockamamie theory we can logically devise. Our priors are not flat, but spiky. We assign a lot of mental weight to a few theories, while others, like the RBRRB theory, get assigned a probability almost indistinguishable from zero. How do we choose our favored theories? We tend to like simpler theories better than more complicated ones, theories that rest on analogies to things we already know about better than theories that posit totally novel phenomena. That may seem like an unfair prejudice, but without some prejudices we would run the risk of walking around in a constant state of astoundedness. Richard Feynman famously captured this state of mind:
You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!
If you’ve ever used America’s most popular sort-of-illegal psychotropic substance, you know what it feels like to have too-flat priors. Every single stimulus that greets you, no matter how ordinary, seems intensely meaningful. Each experience grabs hold of your attention and demands that you take notice. It’s a very interesting mental state to be in. But it’s not conducive to making good inferences.
The Bayesian point of view explains why Feynman wasn’t actually amazed; it’s because he assigns a very low prior probability to the hypothesis that a cosmic force intended him to see the license plate ARW 357 that night. It explains why five reds in a row feels “less random” than RBRRB to us; it’s because the former activates a theory, RED, to which we assign some non-negligible prior probability, and the latter doesn’t. And a number ending in 0 feels less random than a number ending in 7, because the former supports the theory that the number we’re seeing is not a precise count, but an estimate.
This framework also helps unwind some of the conundrums we’ve already encountered. Why are we surprised and a little suspicious when the lottery comes up 4, 21, 23, 34, 39 twice in a row, but not when it comes up 4, 21, 23, 34, 39 one day and 16, 17, 18, 22, 39 the next day, even though both events are equally improbable? Implicitly, you have some kind of theory in the back of your mind, a theory that lottery games are for some reason unusually likely to spit out the same numbers twice in close succession; maybe because you think lottery games are rigged by the proprietors, maybe because you think a cosmic synchronicity-loving force has a thumb on the scale, doesn’t matter. You might not believe in this theory very strongly; maybe in your heart you think there’s a one-in-a-hundred-thousand chance that there really is such a bias in favor of repeated numbers. But that’s much more than the prior you assign the theory that there’s a weird conspiracy in favor of the 4, 21, 23, 34, 39−16, 17, 18, 22, 39 combo. That theory is crazy, and you are not stoned, so you pay it no mind.
If you do happen to find yourself partially believing a crazy theory, don’t worry—probably the evidence you encounter will be inconsistent with it, driving down your degree of belief in the craziness until your beliefs come into line with everyone else’s. Unless, that is, the crazy theory is designed to survive this winnowing process. That’s how conspiracy theories work.
Suppose you learn from a trusted friend that the Boston Marathon bombing was an inside job carried out by the federal government in order to, I don’t know, garner support for NSA wiretapping. Call that theory T. At first, because you trust your friend, maybe you assign that theory a reasonably high probability, say 0.1. But then you encounter other information: police located the suspected perpetrators, the surviving suspect confessed, etc. Each of these pieces of information is pretty unlikely, given T, and each one knocks down your degree of belief in T until you hardly credit it at all.
That’s why your friend isn’t going to give you theory T; he’s going to add to it theory U, which is that the government and the news media are in on the conspiracy together, with the newspapers and cable networks feeding false information to support the story that the attack was carried out by Islamic radicals. The combined theory, T + U, should start out with a smaller prior probability; it is by definition harder to believe than T, because it asks you to swallow both T and another theory at the same time. But as the evidence flows in, which would tend to kill T alone,* the combined theory T + U remains untouched. Dzhokhar Tsarnaev convicted? Well, sure, that’s exactly what you’d expect from a federal court—the Justice Department is totally in on it! The theory U acts as a kind of Bayesian coating to T, keeping new evidence from getting to it and dissolving it. This is a property most successful crackpot theories have in common; they’re encased in just enough protective stuff that they’re equally consistent with many possible observations, making them hard to dislodge. They’re like the multi-drug-resistant E. coli of the information ecosystem. In a weird way you have to admire them.
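For the numerically inclined, here is a toy sketch of that coating at work, with entirely invented likelihoods: each piece of evidence is very unlikely if T alone is true, but is just about what you'd see whether or not T + U is true.

```python
# Toy illustration of the "Bayesian coating"; all the likelihood numbers are invented.
# Each piece of evidence multiplies belief in a theory according to how well the
# theory predicted that evidence, relative to the ordinary (non-conspiracy) account.

def update(belief, p_evidence_if_theory, p_evidence_otherwise):
    """One Bayesian update of belief in a theory against its alternative."""
    numerator = belief * p_evidence_if_theory
    return numerator / (numerator + (1 - belief) * p_evidence_otherwise)

belief_T = belief_TU = 0.1    # both theories start at the friend's 0.1
for _ in range(5):            # suspects located, a confession, a conviction, ...
    belief_T  = update(belief_T,  0.05, 0.9)  # T alone makes each piece of evidence very unlikely
    belief_TU = update(belief_TU, 0.9,  0.9)  # T + U predicts it as well as the ordinary account

print(round(belief_T, 10))    # about 6e-08: T is effectively dead
print(round(belief_TU, 3))    # 0.1: untouched, because its likelihoods match the alternative's
```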
THE CAT IN THE HAT, THE CLEANEST MAN IN SCHOOL, AND THE CREATION OF THE UNIVERSE
When I was in college, I had a friend with entrepreneurial habits who had the idea of making a little extra money at the beginning of the school year by selling T-shirts to first-year students. At that time you could buy a large lot of T-shirts from the screen-printing shop for about four dollars each, while the going rate on campus was ten bucks. It was the early 1990s, and it was fashionable to go to parties wearing a hat modeled after the one worn by the Cat in the Hat.* So my friend got together eight hundred dollars and printed up two hundred shirts with a picture of the Cat in the Hat drinking a mug of beer. These shirts sold fast.
My friend was entrepreneurial, but not that entrepreneurial. In fact, he was kind of lazy. And once he’d sold eighty shirts, making back his initial investment, he started to lose his desire to hang out on the quad all day making sales. So the box of shirts went under his bed.
A week later, laundry day came around. My friend, as I mentioned, was lazy. He really didn’t feel like washing his clothes. And then he remembered that he had a box of clean, brand-new beer-swigging-Cat-in-the-Hat T-shirts under his bed. So that solved the problem of laundry day.
As it turned out, it also solved the problem of the day after laundry day.
And so on.
So here was the irony. Everyone around thought my friend was the dirtiest man in school, because he wore the same T-shirt every single day. But in fact, he was the cleanest man in school, dressed every day in a new-from-the-store, never-worn shirt!
The lesson about inference: you have to be careful about the universe of theories you consider. Just as there may be more than one solution to a quadratic equation, there may be multiple theories that give rise to the same observation, and if we don’t consider them all, our inferences may lead us badly astray.
This brings us back to the Creator of the Universe.
The most famous argument in favor of a God-made world is the so-called argument by design, which, in its simplest form, simply says, holy cow, just look around you—everything is so complex and amazing, and you think it just glommed together that way by dumb luck and physical law?
Or, phrased more formally, by the liberal theologian William Paley, in his 1802 book Natural Theology; or, Evidences of the Existence and Attributes of the Deity, Collected from the Appearances of Nature:
In crossing a heath, suppose I pitched my foot against a stone, and were asked how the stone came to be there: I might possibly answer that, for any thing I knew to the contrary, it had lain there for ever; nor would it perhaps be very easy to shew the absurdity of this answer. But suppose I had found a watch upon the ground, and it should be inquired how the watch happened to be in that place; I should hardly think of the answer which I had before given,—that, for any thing I knew, the watch might have always been there. . . . The inference, we think, is inevitable, that the watch must have had a maker: that there must have existed, at some time, and at some place or other, an artificer or artificers who formed it for the purpose which we find it actually to answer: who comprehended its construction, and designed its use.
If this is true of a watch, how much more so of a sparrow, or a human eye, or a human brain?
Paley’s book was a tremendous success, going through fifteen editions in fifteen years. Darwin read it closely in college, later saying, “I do not think I hardly ever admired a book more than Paley’s Natural Theology: I could almost formerly have said it by heart.” And updated forms of Paley’s argument form the backbone of the modern intelligent design movement.