IT’S NOT ALWAYS WRONG TO BE WRONG
In an alternate universe, one where later research on tobacco came out differently, we might have found that Fisher’s odd-sounding theory was right after all, and smoking was a consequence of cancer instead of the other way around. It wouldn’t be the biggest reversal medical science has ever suffered, by a long shot. And what then? The surgeon general would have issued a press release saying, “Sorry, everyone can go back to smoking now.” In the interim, tobacco companies would have lost a lot of money, and millions of smokers would have forgone billions of pleasurable cigarettes. All because the surgeon general had declared as a fact what was only a strongly supported hypothesis.
But what was the alternative? Imagine what you’d have to do in order to really know, with something like absolute assurance, that smoking causes lung cancer. You’d have to collect a large population of teenagers, select half of them at random, and force that half to spend the next fifty years smoking cigarettes on a regular schedule, while the other half would be required to abstain. Jerry Cornfield, an early pioneer of smoking research, called such an experiment “possible to conceive but impossible to conduct.” Even if such an experiment were logistically possible, it would violate every ethical norm in existence about research on human subjects.
Makers of public policy don’t have the luxury of uncertainty that scientists do. They have to form their best guesses and make decisions on the basis thereof. When the system works—as it unquestionably did, in the case of tobacco—the scientist and the policy maker work in concert, the scientist reckoning how uncertain we ought to be and the policy maker deciding how to act under the uncertainty thus specified.
Sometimes this leads to mistakes. We’ve already encountered the case of hormone replacement therapy, which was long thought to protect postmenopausal women against heart disease, based on observed correlations. Current recommendations, based on randomized experiments performed later, are more or less the opposite.
In 1976 and again in 2009, the U.S. government embarked on massive and expensive vaccination campaigns against the swine flu, having received warnings from epidemiologists each time that the currently prevailing strain was particularly likely to go catastrophically pandemic. In fact, both flus, while severe, fell well short of disastrous.
It’s easy to criticize the policy makers in these scenarios for letting their decision making get ahead of the science. But it’s not that simple. It’s not always wrong to be wrong.
How can this be so? A quick expected value computation, like the ones in part III, helps unpack the seemingly paradoxical slogan. Suppose we’re considering making a health recommendation—say, that people should stop eating eggplant because eggplant induces a small risk of sudden catastrophic heart failure. This conclusion is based on a series of studies that found eggplant eaters slightly more likely than non-eggplant eaters to keel over dead without warning. But there’s no prospect of doing a large-scale randomized controlled trial where we force eggplants on some people and deny them to others. We have to make do with the information we have, which represents a correlation only. For all we know, there’s a common genetic basis for eggplantophilia and cardiac arrest. There’s no way to be sure.
Perhaps we are 75% sure that our conclusion is correct and that a campaign against eggplant would save a thousand American lives per year. But there’s also a 25% chance our conclusion is wrong; and if it’s wrong, we’ve induced many people to give up what might be a favorite vegetable, leading them to eat a less healthy diet overall, and causing, let’s say, two hundred excess deaths annually.*
As always, we obtain the expected value by multiplying the result of each possible outcome by the corresponding probability, and then adding everything up. In this case, we find that
75% × 1000 + 25% × (−200) = 750 − 50 = 700.
So our recommendation has an expected value of seven hundred lives saved per year. Over the loud and well-financed complaints of the Eggplant Council, and despite our very real uncertainty, we go public.
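For readers who want to check the arithmetic, here is a minimal Python sketch of the same computation. The 75%/25% split and the lives-saved figures are the hypothetical eggplant numbers from the example above, not real data.

```python
# Hypothetical eggplant campaign from the example above: 75% chance the
# recommendation is right (saving ~1,000 lives a year), 25% chance it is
# wrong (costing ~200 lives a year).
outcomes = [
    (0.75, 1000),
    (0.25, -200),
]

expected_lives_saved = sum(prob * lives for prob, lives in outcomes)
print(expected_lives_saved)  # 700.0
```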
Remember: the expected value doesn’t represent what we literally expect to happen, but rather what we might expect to happen on average were the same decision to be repeated again and again. A public health decision isn’t like flipping a coin; it’s something you can do only once. On the other hand, eggplants are not the only environmental danger we may be called upon to assess. Maybe it will come to our attention next that cauliflower is associated with arthritis, or vibrating toothbrushes with autism. If, in each case, an intervention has an expected value of seven hundred lives a year, we should make them all, and on average we will expect to save seven hundred lives each time. In any individual case, we might end up doing more harm than good, but overall we’re going to save a lot of lives. Like the lottery players on roll-down day, we risk losing on any given instance, but are almost assured to come out ahead in the long run.
And if we held ourselves to a stricter evidentiary standard, declining to issue any of these recommendations because we weren’t sure we were right? Then the lives we would have saved would be lost instead.
It would be great if we could assign precise, objective probabilities to real-life health conundrums, but of course we can’t. This is another way that the interaction of a drug with the human body differs from a coin you can flip or a lottery ticket you can scratch. We’re stuck with the messy, vague probabilities that reflect our degree of belief in various hypotheses, the probabilities that R. A. Fisher loudly denied were probabilities at all. So we don’t and can’t know the exact expected value of launching a campaign against eggplant or vibrating toothbrushes, or tobacco. But often we can say with confidence that the expected value is positive. Again, that doesn’t mean the campaign is sure to have good effects, only that the sum total of all similar campaigns, over time, is likely to do more good than harm. The very nature of uncertainty is that we don’t know which of our choices will help, like attacking tobacco, and which will hurt, like recommending hormone replacement therapy. But one thing’s for certain: refraining from making recommendations at all, on the grounds that they might be wrong, is a losing strategy. It’s a lot like George Stigler’s advice about missing planes. If you never give advice until you’re sure it’s right, you’re not giving enough advice.
BERKSON’S FALLACY, OR: WHY ARE HANDSOME MEN SUCH JERKS?
That correlations can arise from unseen common causes is confusing enough, but that’s not the end of the story. Correlations can also come from common effects. This phenomenon is known as Berkson’s fallacy, after the medical statistician Joseph Berkson, who back in chapter 8 explained how blind reliance on p-values could lead you to conclude that a small group of people including an albino consisted of nonhumans.
Berkson himself was, like Fisher, a vigorous skeptic about the link between tobacco and cancer. Berkson, an MD, represented the old school of epidemiology, deeply suspicious of any claim whose support was more statistical than medical. Such claims, he felt, represented a trespass by naive theorists onto ground that rightfully belonged to the medical profession. “Cancer is a biologic, not a statistical problem,” he wrote in 1958. “Statistics can soundly play an ancillary role in its elucidation. But if biologists permit statisticians to become arbiters of biologic questions, scientific disaster is inevitable.”
Berkson was especially troubled by the fact that tobacco use was found to be correlated not only with lung cancer but with dozens of other diseases, afflicting every system of the human body. For Berkson, the idea that tobacco could be so thoroughgoingly poisonous was inherently implausible: “It is as though, in investigating a drug that previously had been indicated to relieve the common cold, the drug was found not only to ameliorate coryza, but to cure pneumonia, cancer, and many other diseases. A scientist would say, ‘There must be something wrong with this method of investigation.’”
Berkson, like Fisher, was more apt to believe the “constitutional hypothesis,” that some preexisting difference between nonsmokers and smokers accounted for the relative healthiness of the abstainers:
If 85 to 95 per cent of a population are smokers, then the small minority who are not smokers would appear, on the face of it, to be of some special type of constitution. It is not implausible that they should be on the average relatively longevous, and this implies that death rates generally in this segment of the population will be relatively low. After all, the small group of persons who successfully resist the incessantly applied blandishments and reflex conditioning of the cigaret advertisers are a hardy lot, and, if they can withstand these assaults, they should have relatively little difficulty in fending off tuberculosis or even cancer!
Berkson also objected to the original study of Doll and Hill, which was conducted among patients in British hospitals. What Berkson had observed in 1938 was that selecting patients in this way can create the appearance of associations that aren’t really there.
Suppose, for example, you want to know whether high blood pressure is a risk factor for diabetes. You might take a survey of the patients in your hospital, with the goal of finding out whether high blood pressure was more common among the nondiabetics or the diabetics. And you find, to your surprise, that high blood pressure is less common among the patients with diabetes. You might thus be inclined to conclude that high blood pressure was protective against diabetes, or at least against diabetic symptoms so severe as to require hospitalization. But before you start advising diabetic patients to ramp up their consumption of salty snacks, consider this table.
1,000 total population
300 people with high blood pressure
400 people with diabetes
120 people with both high blood pressure and diabetes
Suppose there are a thousand people in our town, of whom 30% have high blood pressure and 40% have diabetes. (We like salty snacks and sweet snacks in our town.) And let’s suppose, furthermore, that there’s no relation between the two conditions; so 30% of the 400 diabetics, or 120 people in all, suffer from high blood pressure as well.
If all the sick people in town wind up in the hospital, then your hospital population is going to consist of
180 people with high blood pressure but no diabetes
280 people with diabetes but no high blood pressure
120 people with both high blood pressure and diabetes
Of the 400 total diabetics in the hospital, 120, or 30%, have high blood pressure. But of the 180 nondiabetics in the hospital, 100% have high blood pressure! It would be nuts to conclude from this that high blood pressure keeps you from having diabetes. The two conditions are negatively correlated, but that’s not because one causes the absence of the other. It’s also not because there’s a hidden factor that both raises your blood pressure and helps regulate your insulin. It’s because the two conditions have a common effect—namely, they put you in the hospital.
To put it in words: if you’re in the hospital, you’re there for a reason. If you’re not diabetic, that makes it more likely the reason is high blood pressure. So what looks at first like a causal relationship between high blood pressure and diabetes is really just a statistical phantom.
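Here is a short, purely illustrative Python sketch of the same bookkeeping, using the made-up town of a thousand people described above:

```python
# Purely illustrative: the made-up town of 1,000 people from the example,
# with high blood pressure and diabetes statistically independent.
high_bp = 300            # 30% of the town
diabetes = 400           # 40% of the town
both = 120               # 30% of the 400 diabetics, since the two are independent

bp_only = high_bp - both            # 180
diabetes_only = diabetes - both     # 280

# Scenario one: everyone with either condition is in the hospital.
hospital_diabetics = diabetes_only + both    # 400
hospital_nondiabetics = bp_only              # 180

print(both / hospital_diabetics)         # 0.3 -> 30% of diabetics have high blood pressure
print(bp_only / hospital_nondiabetics)   # 1.0 -> 100% of nondiabetics do
```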
The effect can work the other way, too. In real life, having two diseases is more likely to land you in the hospital than having one. Maybe the 120 hypertensive diabetics in our town all end up in the hospital, but 90% of the relatively healthy folks with only one thing wrong with them stay home. What’s more, there are other reasons to be in the hospital; for instance, on the first snowy day of the year, a lot of people try to clean out their snowblower with their hand and get their finger chopped off. So the hospital population might look like
10 people with no diabetes or high blood pressure but a severed finger
18 people with high blood pressure but no diabetes
28 people with diabetes but no high blood pressure
120 people with both high blood pressure and diabetes
Now, when you do your hospital study, you find that 120 out of 148 diabetics, or 81%, have high blood pressure. But only 18 of the 28 nondiabetics, or 64%, have high blood pressure. That makes it seem that high blood pressure makes you more likely to have diabetes. But again, it’s an illusion; all we’re measuring is the fact that the set of people who end up in the hospital is anything but a random sample of the population.
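The same sketch under the second set of assumptions (all 120 doubly afflicted people are admitted, only 10% of those with a single condition, plus the ten severed fingers) flips the sign of the apparent association:

```python
# Second scenario: all 120 people with both conditions are admitted, only 10%
# of those with exactly one condition, plus 10 severed-finger cases.
both = 120
bp_only = 18          # 10% of the 180 with high blood pressure alone
diabetes_only = 28    # 10% of the 280 with diabetes alone
neither = 10          # snowblower casualties

diabetics = diabetes_only + both     # 148
nondiabetics = bp_only + neither     # 28

print(round(both / diabetics, 2))        # 0.81 -> 81% of hospitalized diabetics have high blood pressure
print(round(bp_only / nondiabetics, 2))  # 0.64 -> 64% of hospitalized nondiabetics do
```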
Berkson’s fallacy makes sense outside the medical domain; in fact, it even makes sense outside the realm of features that can be precisely quantified. You may have noticed that, among the men* in your dating pool, the handsome ones tend not to be nice, and the nice ones tend not to be handsome. Is that because having a symmetrical face makes you cruel? Or because being nice to people makes you ugly? Well, it could be. But it doesn’t have to be. I present below the Great Square of Men:
and I take as a working hypothesis that men are in fact equidistributed all over this square; in particular, there are nice handsome ones, nice ugly ones, mean handsome ones, and mean ugly ones, in roughly equal numbers.
But niceness and handsomeness have a common effect; they put these men in the group that you notice. Be honest—the mean uglies are the ones you never even consider. So inside the Great Square is a Smaller Triangle of Acceptable Men:
And now the source of the phenomenon is clear. The handsomest men in your triangle run the gamut of personalities, from kindest to cruelest. On average, they’re about as nice as the average person in the whole population, which, let’s face it, is not that nice. And by the same token, the nicest men are only averagely handsome. The ugly guys you like, though—they make up a tiny corner of the triangle, and they are pretty darn nice—they have to be, or they wouldn’t be visible to you at all. The negative correlation between looks and personality in your dating pool is absolutely real. But if you try to improve your boyfriend’s complexion by training him to act mean, you’ve fallen victim to Berkson’s fallacy.
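A tiny simulation makes the selection effect concrete. The uniform spread of men over the square and the cutoff defining “acceptable” are my own illustrative assumptions, not anything measured:

```python
import random

# Illustrative simulation of the Great Square: looks and niceness are
# independent and uniformly distributed, so they are uncorrelated overall.
random.seed(0)
men = [(random.random(), random.random()) for _ in range(100_000)]

def correlation(pairs):
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / (var_x * var_y) ** 0.5

print(round(correlation(men), 3))  # whole square: roughly 0.0

# The Acceptable Triangle: you only notice men whose combined score clears
# a (hypothetical) threshold. Within that group, a negative correlation appears.
noticed = [(looks, nice) for looks, nice in men if looks + nice > 1.2]
print(round(correlation(noticed), 3))  # roughly -0.5
```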
Literary snobbery works the same way. You know how popular novels are terrible? It’s not because the masses don’t appreciate quality. It’s because there’s a Great Square of Novels, and the only novels you ever hear about are the ones in the Acceptable Triangle, which are either popular or good. If you force yourself to read unpopular novels chosen essentially at random—I’ve been on a literary prize jury, so I’ve actually done this—you find that most of them, just like the popular ones, are pretty bad.
The Great Square is too simple by far, of course. There are many dimensions, not just two, along which you can rate your love interests or your weekly reading. So the Great Square is better described as a kind of Great Hypercube. And that’s just for your own personal preferences! If you try to understand what happens in the whole population, you need to grapple with the fact that different people define attractiveness differently; they may differ about what weights to place on various criteria, or they may simply have incompatible preferences. The process of aggregating opinions, preferences, and desires from many different people presents yet another set of difficulties. Which means it’s an opportunity to do more math. We turn to it now.
Includes: Derek Jeter’s moral status, how to decide three-way elections, the Hilbert program, using the whole cow, why Americans are not stupid, “every two kumquats are joined by a frog,” cruel and unusual punishment, “just as the work was completed the foundation gave way,” the Marquis de Condorcet, the second incompleteness theorem, the wisdom of slime molds
SEVENTEEN
THERE IS NO SUCH THING AS PUBLIC OPINION
You’re a good citizen of the United States of America, or some other more or less liberal democracy. Or maybe you’re even an elected official. You think the government should, when possible, respect the people’s will. So you want to know: What do the people want?
Sometimes you can poll the hell out of the people and it’s still tough to be sure. For example: do Americans want small government? Well, sure we do—we say so constantly. In a January 2011 CBS News poll, 77% of respondents said cutting spending was the best way to handle the federal budget deficit, against only 9% who preferred raising taxes. That result isn’t just a product of the current austerity vogue—year in, year out, the American people would rather cut government programs than pay more taxes.
But which government programs? That’s where things get sticky. It turns out the things the U.S. government spends money on are things people kind of like. A Pew Research poll from February 2011 asked Americans about thirteen categories of government spending: in eleven of those categories, deficit or no deficit, more people wanted to increase spending than dial it down. Only foreign aid and unemployment insurance—which, combined, accounted for under 5% of 2010 spending—got the ax. That, too, agrees with years of data; the average American is always eager to slash foreign aid, occasionally tolerant of cuts to welfare or defense, and pretty gung ho for increased spending on every single other program our taxes fund.
Oh, yeah, and we want small government.
At the state level, the inconsistency is just as bad. Respondents to the Pew poll overwhelmingly favored a combination of cutting programs and raising taxes to balance state budgets. Next question: What about cutting funding for education, health care, transportation, or pensions? Or raising sales taxes, state income tax, or taxes on business? Not a single option drew majority support.