How Not to Be Wrong: The Power of Mathematical Thinking

by Jordan Ellenberg

  Galton’s original definition of correlation was somewhat limited, applying only to those variables whose distribution followed the bell curve law we saw in chapter 4. But the notion was quickly adapted and generalized by Karl Pearson* to apply to any variables whatsoever.

  Were I to write down Pearson’s formula right now, or were you to go look it up, you would see a mess of square roots and ratios, which, unless you have Cartesian geometry at your fingertips, would not be very illuminating. But in fact, Pearson’s formula has a very simple geometric description. Mathematicians ever since Descartes have enjoyed the wonderful freedom to flip back and forth between algebraic and geometric descriptions of the world. The advantage of algebra is that it’s easier to formalize and to type into a computer. The advantage of geometry is that it allows us to bring our physical intuition to bear on the situation, particularly when you can draw a picture. I seldom feel I really understand a piece of mathematics until I know what it’s all about in geometric language.

  So what, for a geometer, is correlation all about? It’ll help to have an example at hand. Look again at the table on pages 332–333, which lists average January temperatures in ten California cities in 2011 and 2012. As we saw, the 2011 and 2012 temperatures have a strong positive correlation; in fact, Pearson’s formula yields a sky-high value of 0.989.

  If we want to study the relation between temperature measurements in two different years, it doesn’t matter if you modify each entry in the table by the same amount. If 2011 temperature is correlated with 2012 temperature, it’s just as correlated with “2012 temperature + 5 degrees.” Another way to put it: if you take all the points in the diagram above and move them up five inches, it doesn’t change the shape of Galton’s ellipse, merely its location. It turns out to be useful to shift the temperatures by a uniform amount to make the average value equal to zero in both 2011 and 2012. If you do that, you get a table that looks like this:

  City               Jan 2011   Jan 2012

  Eureka                 −1.7       −4.1
  Fresno                 −3.6       −1.4
  Los Angeles             9.0        8.7
  Riverside               7.6        8.2
  San Diego               9.9        7.5
  San Francisco           1.5        0.9
  San Jose                1.0        0.7
  San Luis Obispo         4.3        3.7
  Stockton               −5.0       −4.0
  Truckee               −23.1      −20.5

  The rows of the table have negative entries for cold cities like Truckee and positive entries for balmier places like San Diego.
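
  If you want to check the arithmetic for yourself, here is a minimal sketch in Python (with NumPy); the numbers are the centered columns from the table above, so expect the digits to match the 0.989 quoted earlier only approximately, since the table entries are rounded.

    import numpy as np

    # The centered January temperatures from the table above
    # (city order: Eureka, Fresno, Los Angeles, Riverside, San Diego,
    # San Francisco, San Jose, San Luis Obispo, Stockton, Truckee).
    jan_2011 = np.array([-1.7, -3.6, 9.0, 7.6, 9.9, 1.5, 1.0, 4.3, -5.0, -23.1])
    jan_2012 = np.array([-4.1, -1.4, 8.7, 8.2, 7.5, 0.9, 0.7, 3.7, -4.0, -20.5])

    # Both columns have been shifted so their average is (nearly) zero.
    print(jan_2011.mean(), jan_2012.mean())   # both close to 0

    # Pearson's correlation: sky-high, about 0.989.
    r = np.corrcoef(jan_2011, jan_2012)[0, 1]
    print(round(r, 3))

    # Shifting every entry by the same amount doesn't change the correlation.
    print(round(np.corrcoef(jan_2011 + 5.0, jan_2012)[0, 1], 3))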

  Now here’s the trick. That column of ten numbers keeping track of the January 2011 temperatures is a list of numbers, yes. But it’s also a point. How so? This goes back to our hero, Descartes. You can think of a pair of numbers (x,y) as a point in the plane, x units to the right and y units upward from the origin. In fact, we can draw a little arrow pointing from the origin to our point (x,y), an arrow called a vector.

  In the same way, a point in three-dimensional space is described by a list of three coordinates (x,y,z). And nothing except habit and craven fear keeps us from pushing this further. A list of four numbers can be thought of as a point in four-dimensional space, and a list of ten numbers, like the California temperatures in our table, is a point in ten-dimensional space. Better yet, think of it as a ten-dimensional vector.

  Wait, you may rightfully ask: How am I supposed to think about that? What does a ten-dimensional vector look like?

  It looks like this:

  [Figure: an ordinary arrow, drawn in the plane.]

  That’s the dirty little secret of advanced geometry. It may sound impressive that we can do geometry in ten dimensions (or a hundred, or a million . . .), but the mental pictures we keep in our mind are two- or at most three-dimensional. That’s all our brains can handle. Fortunately, this impoverished vision is usually enough.

  High-dimensional geometry can seem a little arcane, especially since the world we live in is three-dimensional (or four-dimensional, if you count time, or maybe twenty-six-dimensional, if you’re a certain kind of string theorist, but even then, you think the universe doesn’t extend very far along most of those dimensions). Why study geometry that isn’t realized in the universe?

  One answer comes from the study of data, currently in extreme vogue. Remember the digital photo from the four-megapixel camera: it’s described by 4 million numbers, one for each pixel. (And that’s before we take color into account!) So that image is a 4-million-dimensional vector; or, if you like, a point in 4-million-dimensional space. And an image that changes with time is represented by a point that’s moving around in a 4-million-dimensional space, which traces out a curve in 4-million-dimensional space, and before you know it you’re doing 4-million-dimensional calculus, and then the fun can really start.
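
  To make the dimension count concrete, here is a minimal sketch in Python; the 2,000-by-2,000 grid of brightness values is just an assumed stand-in for a four-megapixel grayscale image.

    import numpy as np

    # A hypothetical four-megapixel grayscale image: 2000 x 2000 brightness values.
    image = np.random.rand(2000, 2000)

    # Flattening the grid gives a single list of 4 million numbers:
    # one point (or vector) in 4-million-dimensional space.
    vector = image.reshape(-1)
    print(vector.shape)   # (4000000,)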

  Back to temperature. There are two columns in our table, each of which provides us with a ten-dimensional vector. They look like this:

  [Figure: the two temperature vectors, drawn as arrows pointing in nearly the same direction.]

  The two vectors point in roughly the same direction, which reflects the fact that the two columns are not in fact so different; as we’ve already seen, the coldest cities in 2011 stayed cold in 2012, and ditto for the warm ones.

  And this is Pearson’s formula, in geometric language. The correlation between the two variables is determined by the angle between the two vectors. If you want to get all trigonometric about it, the correlation is the cosine of the angle. It doesn’t matter whether you remember what cosine means; you just need to know that the cosine of an angle is 1 when the angle is 0 (i.e., when the two vectors are pointing in the same direction) and −1 when the angle is 180 degrees (vectors pointing in opposite directions). Two variables are positively correlated when the corresponding vectors are separated by an acute angle—that is, an angle smaller than 90 degrees—and negatively correlated when the angle between the vectors is larger than 90 degrees, or obtuse. It makes sense: vectors at an acute angle to one another are, in some loose sense, “pointed in the same direction,” while vectors that form an obtuse angle seem to be working at cross purposes.
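
  Here is a minimal sketch in Python checking that claim against the temperature data: the cosine of the angle between the two centered ten-dimensional vectors agrees (up to rounding in the table) with Pearson’s correlation.

    import numpy as np

    # The two ten-dimensional vectors: the mean-centered temperature columns.
    u = np.array([-1.7, -3.6, 9.0, 7.6, 9.9, 1.5, 1.0, 4.3, -5.0, -23.1])  # Jan 2011
    v = np.array([-4.1, -1.4, 8.7, 8.2, 7.5, 0.9, 0.7, 3.7, -4.0, -20.5])  # Jan 2012

    # Cosine of the angle: dot product divided by the product of the lengths.
    # (This matches Pearson's formula because the columns are already centered.)
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    r = np.corrcoef(u, v)[0, 1]   # Pearson's correlation of the same columns
    print(round(cosine, 3), round(r, 3))   # both come out around 0.989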

  When the angle is a right angle, neither acute nor obtuse, the two variables have a correlation of zero; they are, at least as far as correlation goes, unrelated to each other. In geometry, we call a pair of vectors that form a right angle perpendicular, or orthogonal. And by extension, it’s common practice among mathematicians and other trig aficionados to use the word “orthogonal” to refer to something unrelated to the issue at hand—“You might expect that mathematical skills are associated with magnificent popularity, but in my experience, the two are orthogonal.” Slowly this usage is creeping out of the geekolect into the wider language. You can just about see it happening in a recent Supreme Court oral argument:

  MR. FRIEDMAN: I think that issue is entirely orthogonal to the issue here because the Commonwealth is acknowledging—

  CHIEF JUSTICE ROBERTS: I’m sorry. Entirely what?

  MR. FRIEDMAN: Orthogonal. Right angle. Unrelated. Irrelevant.

  CHIEF JUSTICE ROBERTS: Oh.

  JUSTICE SCALIA: What was that adjective? I like that.

  MR. FRIEDMAN: Orthogonal.

  JUSTICE SCALIA: Orthogonal?

  MR. FRIEDMAN: Right, right.

  JUSTICE SCALIA: Ooh.

  (Laughter.)

  I’m rooting for orthogonal to catch on. It’s been a while since a mathy word really broke out into demotic English. Lowest common denominator has by now lost its mathematical flavor almost entirely, and exponentially—just don’t get me started on exponentially.*

  The application of trigonometry to high-dimensional vectors in order to quantify correlation is not, to put it mildly, what the developers of the cosine had in mind. The Nicaean astronomer Hipparchus, who wrote down the first trigonometric tables in the second century BCE, was trying to compute the time lapse between eclipses; the vectors he dealt with described objects in the sky, and were solidly three-dimensional. But a mathematical tool that’s just right for one purpose tends to make itself useful again and again.

  The geometric understanding of correlation clarifies aspects of statistics that might otherwise be murky. Consider the case of the wealthy liberal elitist. For a while now, this slightly disreputable fellow has been a familiar character in political punditry. Perhaps his most devoted chronicler is the political writer David Brooks, who wrote a whole book about the group he called the bourgeois bohemians, or Bobos. In 2001, contemplating the difference between suburban, affluent Montgomery County, Maryland (my birthplace!), and middle-class Franklin County, Pennsylvania, he speculated that the old political stratification by economic class, with the GOP standing up for the moneybags and the Democrats for the working man, was badly out of date.

  Like upscale areas everywhere, from Silicon Valley to Chicago’s North Shore to suburban Connecticut, Montgomery County supported the Democratic ticket in last year’s presidential election, by a margin of 63 percent to 34 percent. Meanwhile, Franklin County went Republican, by 67 percent to 30 percent.

  First of all, this “everywhere” is a little strong. Wisconsin’s richest county is Waukesha, centered on the tony suburbs west of Milwaukee. Bush crushed Gore there, 65−31, while Gore narrowly won statewide.

  Still, Brooks is pointing to a real phenomenon, one we saw depicted quite plainly in a scatterplot a few pages back. In the contemporary U.S. electoral landscape, rich states are more likely than poor states to vote for the Democrats. Mississippi and Oklahoma are Republican strongholds, while the GOP doesn’t even bother to contest New York and California. In other words, being from a rich state is positively correlated with voting Democratic.

  But statistician Andrew Gelman found that the story is more complicated than the Brooksian portrait of a new breed of latte-sipping, Prius-driving liberals with big tasteful houses and NPR tote bags full of cash. In fact, rich people are still more likely to vote Republican than poor people are, an effect that’s been consistently present for decades. Gelman and his collaborators, digging deeper into the state-by-state data, find a very interesting pattern. In some states, like Texas and Wisconsin, richer counties tend to vote more Republican. In others, like Maryland, California, and New York, the richer counties are more Democratic. Those last states happen to be the ones where many political pundits live. In their limited worlds, the rich neighborhoods are loaded with rich liberals, and it’s natural for them to generalize this experience to the rest of the country. Natural, but when you look at the overall numbers, plainly wrong.

  But there seems to be a paradox here. Being rich is positively correlated with being from a rich state, more or less by definition. And being from a rich state is positively correlated with voting for Democrats. Doesn’t that mean being rich has to be correlated with voting Democratic? Geometrically: if vector 1 is at an acute angle to vector 2, and vector 2 is at an acute angle to vector 3, does vector 1 have to be at an acute angle to vector 3?

  No! Proof by picture:

  [Figure: three vectors in the plane; the first and second form an acute angle, as do the second and third, but the first and third form an obtuse angle.]
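
  In place of the picture, here is a minimal numeric sketch in Python; the three vectors are hypothetical, chosen only so the angles come out as claimed.

    import numpy as np

    def angle_degrees(a, b):
        # Angle between two vectors: arccos of the cosine given by the dot product.
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.degrees(np.arccos(cos))

    v1 = np.array([1.0, 0.0])
    v2 = np.array([1.0, 1.0])
    v3 = np.array([-0.2, 1.0])

    print(angle_degrees(v1, v2))   # 45 degrees: acute
    print(angle_degrees(v2, v3))   # about 56 degrees: acute
    print(angle_degrees(v1, v3))   # about 101 degrees: obtuse, not acute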

  Some relationships, like “bigger than,” are transitive; if I weigh more than my son and my son weighs more than my daughter, it’s an absolute certainty that I weigh more than my daughter. “Lives in the same city as” is transitive, too—if I live in the same city as Bill, who lives in the same city as Bob, then I live in the same city as Bob.

  Correlation is not transitive. It’s more like “blood relation”—I’m related to my son, who’s related to my wife, but my wife and I aren’t blood relatives to each other. In fact, it’s not a terrible idea to think of correlated variables as “sharing part of their DNA.” Suppose I run a boutique money management firm with just three investors, Laura, Sara, and Tim. Their stock positions are pretty simple: Laura’s fund is split 50-50 between Facebook and Google, Tim’s is half General Motors and half Honda, and Sara, poised between old economy and new, goes half Honda, half Facebook. It’s pretty obvious that Laura’s returns will be positively correlated with Sara’s; they have half their portfolio in common. And the correlation between Sara’s returns and Tim’s will be equally strong. But there’s no reason to think Tim’s performance has to be correlated with Laura’s.* Those two funds are like the parents, each contributing half its “genetic material” to form Sara’s hybrid fund.
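
  A quick simulation makes the point concrete. Here is a minimal sketch in Python, under the simplifying (and entirely made-up) assumption that the four stocks’ daily returns are independent of one another.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000   # simulated trading days

    # Hypothetical daily returns for the four stocks, assumed independent.
    facebook, google, gm, honda = rng.standard_normal((4, n))

    laura = 0.5 * facebook + 0.5 * google   # half Facebook, half Google
    sara = 0.5 * honda + 0.5 * facebook     # half Honda, half Facebook
    tim = 0.5 * gm + 0.5 * honda            # half GM, half Honda

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    print(round(corr(laura, sara), 2))   # about 0.5: half their holdings overlap
    print(round(corr(sara, tim), 2))     # about 0.5: same overlap on the other side
    print(round(corr(laura, tim), 2))    # about 0.0: no holdings in common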

  The non-transitivity of correlation is somehow obvious and mysterious at the same time. In the mutual-fund example, you’d never be fooled into thinking that a rise in Tim’s performance gives much information about how Laura’s doing. But our intuition does less well in other domains. Consider, for instance, the case of “good cholesterol,” the common name for cholesterol conveyed around the bloodstream by high-density lipoproteins, or HDL. It’s been known for decades that high levels of HDL cholesterol in the blood are associated with a lower risk of “cardiovascular events.” If you’re not a native speaker of medicalese, that means people with plenty of good cholesterol are less likely on average to clutch their hearts and keel over dead.

  We also know that certain drugs reliably increase HDL levels. A popular one is niacin, a form of vitamin B. If niacin increases HDL, and more HDL is associated with lower risk of cardiovascular events, then it seems like popping niacin is a wise idea; that’s why my physician recommended it to me, as yours probably did too, unless you’re a teenager or a marathon runner or a member of some other metabolically privileged caste.

  The problem is, it’s not clear it works. Niacin supplementation recorded promising results in small clinical trials. But a large-scale trial carried out by the National Heart, Lung, and Blood Institute was halted in 2011, a year and a half before the scheduled finish, because the results were so weak it didn’t seem worth it to continue. Patients who got niacin had higher HDL levels, all right, but they had just as many heart attacks and strokes as everybody else. How can this be? Because correlation isn’t transitive. Niacin is correlated with high HDL, and high HDL is correlated with low risk of heart attack, but that doesn’t mean that niacin prevents heart attacks.

  Which isn’t to say that manipulating HDL cholesterol is a dead end. Every drug is different, and it might be clinically relevant how you boost that HDL number. Back to the investment firm: we know that Tim’s returns are correlated with Sara’s, so you might try to improve Sara’s earnings by taking measures to improve Tim’s. If your approach were to issue a falsely optimistic stock tip to goose GM’s stock price, you’d find that you improved Tim’s performance, while Sara got no benefit. But if you did the same thing to Honda, Tim’s and Sara’s numbers would both improve.

  If correlation were transitive, medical research would be a lot easier than it actually is. Decades of observation and data collection have given us lots of known correlations to work with. If we had transitivity, doctors could just chain these together into reliable interventions. We know that women’s estrogen levels are correlated with lower risk of heart disease, and we know that hormone replacement therapy can raise those levels, so you might expect hormone replacement therapy to be protective against heart disease. And, indeed, that used to be conventional clinical wisdom. But the truth, as you’ve probably heard, is a lot more complicated. In the early 2000s, the Women’s Health Initiative, a long-term study involving a gigantic randomized clinical trial, reported that hormone replacement therapy with estrogen and progestin appeared actually to increase the risk of heart disease in the population they studied. More recent results suggest that the effect of hormone replacement therapy might be different in different groups of women, or that estrogen alone might be better for your heart than the estrogen-progestin combo, and so on.

  In the real world, it’s next to impossible to predict what effect a drug will have on a disease, even if you know a lot about how it affects biomarkers like HDL or estrogen level. The human body is an immensely complex system, and there are only a few of its features we can measure, let alone manipulate. Based on the correlations we can observe, there are lots of drugs that might plausibly have a desired health effect. And so you try them out in experiments, and most of them fail dismally. To work in drug development requires a resilient psyche, not to mention a vast pool of capital.

 
