So, too, Galton reasoned, must it be for mental achievement. And this conforms with common experience; the children of a great composer, or scientist, or political leader, often excel in the same field, but seldom so much so as their illustrious parent. Galton was observing the same phenomenon that Secrist would uncover in the operations of business. Excellence doesn’t persist; time passes, and mediocrity asserts itself.*
But there’s one big difference between Galton and Secrist. Galton was, in his heart, a mathematician, and Secrist was not. And so Galton understood why regression was taking place, while Secrist remained in the dark.
Height, Galton understood, was determined by some combination of inborn characteristics and external forces; the latter might include environment, childhood health, or just plain chance. I am six foot one, and in part that’s because my father is six foot one and I share some of his height-promoting genetic material. But it’s also because I ate reasonably nutritious food as a child and didn’t undergo any unusual stresses that would have stunted my growth. And my height was no doubt bumped up and down by who knows how many other experiences I underwent, in utero and ex. Tall people are tall because their heredity predisposes them to be tall, or because external forces encourage them to be tall, or both. And the taller a person is, the likelier it is that both factors are pointing in the upward direction.
In other words, people drawn from the tallest segment of the population are almost certain to be taller than their genetic predisposition would suggest. They were born with good genes, but they also got a boost from environment and chance. Their children will share their genes, but there’s no reason the external factors will once again conspire to boost their height over and above what heredity accounts for. And so, on average, they’ll be taller than the average person, but not quite so exceedingly tall as their beanpole parents. That’s what causes regression to the mean: not a mysterious mediocrity-loving force, but the simple workings of heredity intermingled with chance. That’s why Galton writes that regression to the mean is “theoretically a necessary fact.” At first, it came to him as a surprising feature of his data, but once he understood what was going on, he saw it couldn’t possibly have come out any other way.
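Galton’s explanation is easy to check on a computer. Here is a minimal sketch in Python, with made-up numbers and a deliberately oversimplified heredity model (a single inherited component, one parent): build heights out of genes plus luck, pick the tallest parents, and give their children the same genes but a fresh helping of luck.

```python
import random

random.seed(0)

n = 100_000
population_mean = 68.0  # inches; an assumption for the sketch, not Galton's data

# Each person's height: a heritable component plus everything else (nutrition, chance).
parents = []
for _ in range(n):
    genes = random.gauss(0, 2.0)  # inborn predisposition, inches above/below the mean
    luck = random.gauss(0, 2.0)   # environment, childhood health, plain chance
    parents.append((genes, population_mean + genes + luck))

# Pick the tallest 5 percent of parents.
tallest = sorted(parents, key=lambda p: p[1], reverse=True)[: n // 20]

# Their children keep the heritable component but draw fresh luck.
children = [population_mean + genes + random.gauss(0, 2.0) for genes, _ in tallest]

def average(xs):
    return sum(xs) / len(xs)

print(f"tallest parents average: {average([h for _, h in tallest]):.1f} inches")
print(f"their children average:  {average(children):.1f} inches")
print(f"population average:      {population_mean:.1f} inches")
# The children come out taller than average but shorter than their parents --
# regression to the mean, with no mediocrity-loving force anywhere in the code.
```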
It’s just the same for businesses. Secrist wasn’t wrong about the firms that had the fattest profits in 1922; it’s likely that they ranked among the most well managed companies in their sectors. But they were lucky, too. As time went by, their management might well have remained superior in wisdom and judgment. But the companies that were lucky in 1922 were no more likely than any other companies to be lucky ten years later. And so those top-sextile companies start slipping in the rankings as the years go by.
In fact, almost any condition in life that involves random fluctuations in time is potentially subject to the regression effect. Did you try a new apricot-and-cream-cheese diet and find you lost three pounds? Think back to the moment you decided to slim down. More than likely it was a moment at which the normal up-and-down of your weight had you at the top of your usual range, because those are the kinds of moments when you look down at the scale, or just at your midsection, and say, jeez, I’ve gotta do something. But if that’s the case, you might well have lost three pounds anyway, apricots or no apricots, when you trended back toward your normal weight. You’ve learned very little about the efficacy of the diet.
You might try to address this problem by random sampling: choose two hundred patients at random, check which ones are overweight, and then try the diet on the overweight folks. But then you’d be doing just what Secrist did. The heaviest segment of the population is a lot like the top sextile of businesses. They are certainly more likely than the average person to have a consistent weight problem. But they are also more likely to be at the top of their weight range on the day you happened to weigh them. Just as Secrist’s high performers degraded toward mediocrity with time, so will your heavy patients lose weight, whether the diet is effective or not. That’s why the better sort of diet studies don’t just study the effects of one diet; they compare two candidate diets to see which induces more weight loss. Regression to the mean should affect each group of dieters equally, so that comparison is fair.
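Here is a quick sketch of that logic, again in Python with invented numbers: everyone has a stable weight plus some day-to-day wobble; enroll whoever looks heavy at the first weigh-in, split them at random into a “diet” group and a control group, and let the diet do nothing at all. Both groups still appear to lose weight; only the gap between them measures the diet.

```python
import random

random.seed(1)

def weigh(stable_weight):
    # What the scale says today: stable weight plus a few pounds of daily fluctuation.
    # (All numbers here are invented for the sketch.)
    return stable_weight + random.gauss(0, 5.0)

# A hypothetical population of stable weights.
people = [random.gauss(170, 25) for _ in range(20_000)]

# Enroll everyone who looks heavy at the first weigh-in, remembering that reading.
enrolled = []
for w in people:
    first = weigh(w)
    if first > 200:
        enrolled.append((w, first))

random.shuffle(enrolled)
half = len(enrolled) // 2
diet_group, control_group = enrolled[:half], enrolled[half:]

DIET_EFFECT = 0.0  # pounds actually removed by the diet; zero for this sketch

def average_change(group, effect):
    # Second weigh-in minus first weigh-in, averaged over the group.
    return sum(weigh(w - effect) - first for w, first in group) / len(group)

print(f"diet group:    {average_change(diet_group, DIET_EFFECT):+.1f} lb")
print(f"control group: {average_change(control_group, 0.0):+.1f} lb")
# Both groups "lose" a pound or two even though the diet does nothing; the fair
# measure of the diet is the gap between the two groups, which here is about zero.
```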
Why is the second novel by a breakout debut writer, or the second album by an explosively popular band, so seldom as good as the first? It’s not, or not entirely, because most artists only have one thing to say. It’s because artistic success is an amalgam of talent and fortune, like everything else in life, and thus subject to regression to the mean.*
Running backs who sign big multiyear contracts tend to record fewer yards per carry in the season following.* Some people claim that’s because they no longer have a financial incentive to stretch for that extra yard, and that psychological factor probably does play a role. But just as important is that they signed the big contract as a result of having a massively good year. It would be bizarre if they didn’t return to a more ordinary level of performance the following season.
“ON PACE”
As I write, it’s April, the beginning of baseball season, when every year we’re treated to a bouquet of news stories about which players are “on pace” to perform which unimaginable record-shattering feat. Today on ESPN I learn that “Matt Kemp is off to a blazing start, hitting .460 and on pace for 86 home runs, 210 RBIs, and 172 runs scored.” These eye-popping numbers (no one in the history of major-league baseball has ever hit more than 73 home runs in a season) are a typical example of false linearity. It’s like a word problem: “If Marcia can paint 9 houses in 17 days, and she has 162 days to paint as many houses as she can . . .”
Kemp hit nine home runs in the Dodgers’ first seventeen games, a rate of 9/17 home runs per game. So an amateur algebraist might write down the following linear equation:
H = G × (9 / 17)
where H is the number of home runs Kemp hits for the full season, and G is the number of games his team plays. A baseball season is 162 games long. And when you plug in 162 for G, you get 86 (or rather 85.7647, but 86 is the closest whole number).
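The “on pace” arithmetic really is just the amateur algebraist’s linear equation; here it is as a two-line Python sketch:

```python
def on_pace(total_so_far, games_so_far, season_games=162):
    # The naive linear projection: assume the rate so far holds all season.
    return total_so_far * season_games / games_so_far

# Kemp's start: nine home runs in the Dodgers' first seventeen games.
print(on_pace(9, 17))         # 85.7647..., which rounds to
print(round(on_pace(9, 17)))  # 86
```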
But not all curves are lines. Matt Kemp will not hit eighty-six home runs this year. And it’s regression to the mean that explains why. At any point in the season, it’s pretty likely that the league leader in home runs is a good home run hitter. Indeed, it’s clear from Kemp’s history that there are intrinsic Matt Kemp qualities that enable him regularly to club a baseball with awe-inspiring force. But the league leader in home runs is also very likely to have been lucky. Which means that, whatever his league-leading pace is, you can expect it to drop as the season goes on.
No one at ESPN, to be fair, thinks Matt Kemp is going to hit eighty-six home runs. These “on pace” statements, when made in April, are usually delivered in a half-joking tone: “Of course he won’t, but what if he kept this up?” But as the summer goes on, the tongue draws farther and farther out of the cheek, until by midseason people are quite serious about using a linear equation to project a player’s statistics to the end of the year.
But it’s still wrong. If there’s regression to the mean in April, there’s regression to the mean in July.
Ballplayers get this. Derek Jeter, when bugged about being on pace to break Pete Rose’s career hit record, told the New York Times, “One of the worst phrases in sports is ‘on pace for.’” Wise words!
Let’s make this less theoretical. If I’m leading the American League in home runs at the All-Star break, how many home runs should I expect to hit the rest of the way?
The All-Star break divides the baseball season into a “first half” and a “second half,” but the second half is actually a bit shorter: in recent years, between 80% and 90% as long as the first half. So you might expect me to hit about 85% as many home runs in the second half as I did in the first.*
But history says this is the wrong thing to expect. To get a sense of what really goes on, I looked at first-half American League home run leaders in nineteen seasons between 1976 and 2000 (excluding years shortened by strikes and those where there was a tie for first-half leader). Only three (Jim Rice in 1978, Ben Oglivie in 1980, and Mark McGwire in 1997) hit as many as 85% of their first-half total after the break. And for every one of those, there’s a hitter like Mickey Tettleton, who led the AL with twenty-four homers at the 1993 All-Star break and managed only eight the rest of the way. The sluggers, on average, hit only 60% as many home runs in the second half as they had in their league-leading first. This decline isn’t due to fatigue, or the August heat; if it were, you’d see a similarly large decline in home run production around the league. It’s simple regression to the mean.
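To put the two expectations side by side, here is a tiny sketch using the 85% schedule figure and the roughly 60% historical figure quoted above, with Tettleton’s twenty-four first-half homers reused as the example:

```python
def second_half_estimates(first_half_hr):
    # Naive: the second half is about 85% as long, so expect 85% as many homers.
    naive = 0.85 * first_half_hr
    # History: first-half AL leaders from 1976-2000 averaged only about 60% of
    # their first-half total after the break (the figure from the text above).
    regressed = 0.60 * first_half_hr
    return round(naive), round(regressed)

print(second_half_estimates(24))  # (20, 14) -- and Tettleton actually managed only eight
```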
And it’s not restricted to the very best home run hitter in the league. The Home Run Derby, held during the All-Star break each year, is a competition where baseball’s top mashers compete to hit as many moon shots as they can against a batting-practice pitcher. Some batters complain that the artificial conditions of the derby throw off their timing and make it harder to hit home runs in the weeks after the break: the Home Run Derby Curse. The Wall Street Journal ran a breathless story, “The Mysterious Curse of the Home Run Derby,” in 2009, which was vigorously rebutted by the statistically minded baseball blogs. That didn’t stop the Journal from revisiting the same ground in 2011, with “The Great Derby Curse Strikes Once Again.” But there is no curse. The participants in the derby are there because they had an awfully good start to the season. Regression demands that their later production, on average, won’t keep up with the pace they’ve set.
As for Matt Kemp, he injured a hamstring in May, missed a month, and was a different player when he returned. He finished the 2012 season not with the eighty-six home runs he was “on pace” for, but with twenty-three.
There’s something the mind resists about regression to the mean. We want to believe in a force that brings down the mighty. It’s not satisfying enough to accept what Galton knew in 1889: the apparently mighty are seldom quite as mighty as they look.
SECRIST MEETS HIS MATCH
This crucial point, invisible to Secrist, was not so obscure to more mathematically minded researchers. In contrast to Secrist’s generally respectful reviews was the famous statistical smackdown delivered by Harold Hotelling in the Journal of the American Statistical Association. Hotelling was a Minnesotan, the son of a hay dealer, who went to college to study journalism and there discovered an extraordinary talent for mathematics. (Francis Galton, had he gone on to study the heredity of notable Americans, would have been pleased to know that despite Hotelling’s humble upbringing his ancestors included a secretary of the Massachusetts Bay Colony and an Archbishop of Canterbury.) Like Abraham Wald, Hotelling started in pure math, writing a PhD dissertation in algebraic topology at Princeton. He would go on to lead the wartime Statistical Research Group in New York—the same place Wald explained to the army how to put the armor where the bullet holes weren’t. In 1933, when Secrist’s book came out, Hotelling was a young professor at Columbia who had already made major contributions to theoretical statistics, especially in relation to economic problems. He was said to enjoy playing Monopoly in his head; he had memorized the board and the frequencies of the various Chance and Community Chest cards, so the game was reduced to a simple exercise in random number generation and mental bookkeeping. This should give some impression both of Hotelling’s mental powers and of the sort of thing he enjoyed.
Hotelling was totally devoted to research and the generation of knowledge, and in Secrist he may have seen something of a kindred soul. “The labor of compilation and of direct collection of data,” he wrote sympathetically, “must have been gigantic.”
Then the hammer drops. The triumph of mediocrity observed by Secrist, Hotelling points out, is more or less automatic whenever we study a variable that’s affected by both stable factors and the influence of chance. Secrist’s hundreds of tables and graphs “prove nothing more than that the ratios in question have a tendency to wander about.” The result of Secrist’s exhaustive investigation is “mathematically obvious from general considerations, and does not need the vast accumulation of data adduced to prove it.” Hotelling drives his point home with a single, decisive observation. Secrist believed the regression to mediocrity resulted from the corrosive effect of competitive forces over time; competition was what caused the top stores in 1916 to be hardly above average in 1922. But what happens if you select the stores with the highest performance in 1922? As in Galton’s analysis, these stores are likely to have been both lucky and good. If you turn back the clock to 1916, whatever intrinsic good management they possess should still be in force, but their luck may be totally different. Those stores will typically be closer to mediocre in 1916 than in 1922. In other words, if regression to the mean were, as Secrist thought, the natural result of competitive forces, those forces would have to work backward in time as well as forward.
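Hotelling’s backward-in-time point is easy to see in a simulation. Here is a sketch under an assumed model (nothing from Secrist’s actual data): each firm’s profit ratio in a given year is a fixed management quality plus that year’s independent luck.

```python
import random

random.seed(2)

# Assumed model: profit ratio = stable management quality + that year's luck.
n = 10_000
quality = [random.gauss(0, 1.0) for _ in range(n)]
ratio_1916 = [q + random.gauss(0, 1.0) for q in quality]
ratio_1922 = [q + random.gauss(0, 1.0) for q in quality]

# Take the top sextile of 1922 and look backward in time.
top_1922 = sorted(range(n), key=lambda i: ratio_1922[i], reverse=True)[: n // 6]

def average(xs):
    return sum(xs) / len(xs)

print(f"top 1922 firms, 1922 average: {average([ratio_1922[i] for i in top_1922]):+.2f}")
print(f"same firms,     1916 average: {average([ratio_1916[i] for i in top_1922]):+.2f}")
print(f"all firms,      1916 average: {average(ratio_1916):+.2f}")
# The 1922 stars were already closer to mediocre back in 1916 -- regression runs
# backward in time just as well as forward, because it's luck, not competition.
```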
Hotelling’s review is polite but firm, distinctly more in sorrow than in anger: he is trying to explain to a distinguished colleague, in the kindest way possible, that he has wasted ten years of his life. But Secrist didn’t take the hint. The issue after next of JASA ran his contentious letter of response, pointing out a few misapprehensions in Hotelling’s review, but otherwise spectacularly missing the point. Secrist insisted once again that the regression to mediocrity was not a mere statistical generality, but rather was particular to “data affected by competitive pressure and managerial control.” At this point Hotelling stops being nice and lays it out straight. “The thesis of the book,” he writes in response, “when correctly interpreted, is essentially trivial. . . . To ‘prove’ such a mathematical result by a costly and prolonged numerical study of many kinds of business profit and expense ratios is analogous to proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals. The performance, though perhaps entertaining, and having a certain pedagogical value, is not an important contribution either to zoölogy or mathematics.”
THE TRIUMPH OF MEDIOCRITY IN ORAL-ANAL TRANSIT TIME
It’s hard to blame Secrist too much. It took Galton himself some twenty years to fully grasp the meaning of regression to the mean, and many subsequent scientists misunderstood Galton exactly as Secrist had. The biometrician Walter F. R. Weldon, who had made his name by showing that Galton’s findings about the variation in human traits held equally well for shrimp, said in a 1905 lecture about Galton’s work:
Very few of those biologists who have tried to use his methods have taken the trouble to understand the process by which he was led to adopt them, and we constantly find regression spoken of as a peculiar property of living things, by virtue of which variations are diminished in intensity during their transmission from parent to child, and the species is kept true to type. This view may seem plausible to those who simply consider that the mean deviation of children is less than that of their fathers: but if such persons would remember the equally obvious fact that there is also a regression of fathers on children, so that the fathers of abnormal children are on the whole less abnormal than their children, they would either have to attribute this feature of regression to a vital property by which children are able to reduce the abnormality of their parents, or else to recognize the real nature of the phenomenon they are trying to discuss.
Biologists are eager to think regression stems from biology, management theorists like Secrist want it to come from competition, literary critics ascribe it to creative exhaustion—but it is none of these. It is mathematics.
And still, despite the entreaties of Hotelling, Weldon, and Galton himself, the message hasn’t totally sunk in. It’s not just the Wall Street Journal sports page that gets this wrong; it happens to scientists, too. One particularly vivid example comes from a 1976 British Medical Journal paper on the treatment of diverticular disease with bran. (I am just old enough to remember 1976, when bran was spoken of by health enthusiasts with the kind of reverence that omega-3 fatty acids and antioxidants enjoy today.) The authors recorded each patient’s “oral-anal transit time”—that is, the length of time a meal spent in the body between entrance and exit—before and after the bran treatment. They found that bran has a remarkable regularizing effect. “All those with rapid times slowed down towards 48 hours . . . those with medium length transits showed no change . . . and those with slow transit times tended to speed up towards 48 hours. Thus bran tended to modify both slow and fast initial transit times towards a 48-hour mean.” This, of course, is precisely what you’d expect if bran had no effect at all. To put it delicately, we all have our fast days and our slow days, whatever our underlying level of intestinal health. And an unusually quick transit on Monday is likely to be followed by a more average transit time on Tuesday, bran or no bran.*
Then there’s the rise and fall of Scared Straight. The program took juvenile offenders on tours of prisons, where inmates warned them about the horrors that awaited them on the inside if they didn’t drop their criminal ways pronto. The original program, held in New Jersey’s Rahway State Prison, was featured in an Oscar-winning documentary in 1978 and quickly spawned imitations across the United States and as far away as Norway. Teenagers raved about the moral kick in the pants they got from Scared Straight, and wardens and prisoners liked the opportunity to contribute something positive to society. The program resonated with a popular, deep-seated sense that overindulgence by parents and society was to blame for youth crime. Most important, Scared Straight worked. One representative program, in New Orleans, reported that participants were arrested less than half as often after Scared Straight as before.