As it happened, Brian was in luck—I had a colleague at Princeton, Edward Nelson, who was an expert in nonstandard analysis. I set up a meeting for the two of them so Brian could learn more about it. The meeting, Ed told me later, didn’t go well. As soon as Ed made it clear that infinitesimal quantities were not in fact going to be called Brian numbers, Brian lost all interest.
(Moral lesson: people who go into mathematics for fame and glory don’t stay in mathematics for long.)
But we’re no closer to settling our dispute. What is 0.999 . . . , really? Is it 1? Or is it some number infinitesimally less than 1, a crazy kind of number that hadn’t even been discovered a hundred years ago?
The right answer is to unask the question. What is 0.999 . . . , really? It appears to refer to a kind of sum:
.9 + .09 + .009 + .0009 + . . .
But what does that mean? That pesky ellipsis is the real problem. There can be no controversy about what it means to add up two, or three, or a hundred numbers. This is just mathematical notation for a physical process we understand very well: take a hundred heaps of stuff, mush them together, see how much you have. But infinitely many? That’s a different story. In the real world, you can never have infinitely many heaps. What’s the numerical value of an infinite sum? It doesn’t have one—until we give it one. That was the great innovation of Augustin-Louis Cauchy, who introduced the notion of limit into calculus in the 1820s.*
The British number theorist G. H. Hardy, in his 1949 book Divergent Series, explains it best:
It does not occur to a modern mathematician that a collection of mathematical symbols should have a “meaning” until one has been assigned to it by definition. It was not a triviality even to the greatest mathematicians of the eighteenth century. They had not the habit of definition: it was not natural to them to say, in so many words, “by X we mean Y.” . . . It is broadly true to say that mathematicians before Cauchy asked not, “How shall we define 1 − 1 + 1 − 1 + . . .” but “What is 1 − 1 + 1 − 1 + . . . ?” and that this habit of mind led them into unnecessary perplexities and controversies which were often really verbal.
This is not just loosey-goosey mathematical relativism. Just because we can assign whatever meaning we like to a string of mathematical symbols doesn’t mean we should. In math, as in life, there are good choices and there are bad ones. In the mathematical context, the good choices are the ones that settle unnecessary perplexities without creating new ones.
The sum .9 + .09 + .009 + . . . gets closer and closer to 1 the more terms you add. And it never gets any farther away. No matter how tight a cordon we draw around the number 1, the sum will eventually, after some finite number of steps, penetrate it, and never leave. Under those circumstances, Cauchy said, we should simply define the value of the infinite sum to be 1. And then he worked very hard to prove that committing oneself to his definition didn’t cause horrible contradictions to pop up elsewhere. By the time this labor was done, he’d constructed a framework that made Newton’s calculus completely rigorous. When we say a curve looks locally like a straight line at a certain angle, we now mean more or less this: as you zoom in tighter and tighter, the curve resembles the given line more and more closely. In Cauchy’s formulation, there’s no need to mention infinitely small numbers, or anything else that would make a skeptic blanch.
Of course there is a cost. The reason the 0.999 . . . problem is difficult is that it brings our intuitions into conflict. We would like the sum of an infinite series to play nicely with arithmetic manipulations like the ones we carried out on the previous pages, and this seems to demand that the sum equal 1. On the other hand, we would like each number to be represented by a unique string of decimal digits, which conflicts with the claim that the same number can be called either 1 or 0.999 . . . , as we like. We can’t hold on to both of these desires at once; one must be discarded. In Cauchy’s approach, which has amply proved its worth in the two centuries since he invented it, it’s the uniqueness of the decimal expansion that goes out the window. We’re untroubled by the fact that the English language sometimes uses two different strings of letters (i.e., two words) to refer synonymously to the same thing in the world; in the same way, it’s not so bad that two different strings of digits can refer to the same number.
As for Grandi’s 1 − 1 + 1 − 1 + . . . , it is one of the series outside the reach of Cauchy’s theory: that is, one of the divergent series that formed the subject of Hardy’s book. The Norwegian mathematician Niels Henrik Abel, an early fan of Cauchy’s approach, wrote in 1828, “Divergent series are the invention of the devil, and it is shameful to base on them any demonstration whatsoever.”* Hardy’s view, which is our view today, is more forgiving; there are some divergent series to which we ought to assign values and some to which we ought not, and some to which we ought or ought not depending on the context in which the series arises. Modern mathematicians would say that if we are to assign the Grandi series a value, it should be 1/2, because, as it turns out, all interesting theories of infinite sums either give it the value 1/2 or decline, like Cauchy’s theory, to give it any value at all.*
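One standard way of coaxing that 1/2 out of Grandi's series is Cesàro summation: instead of asking where the partial sums go (they just oscillate between 1 and 0 forever), ask where their running averages go. A minimal sketch, in Python for illustration:

```python
# Cesaro summation: instead of the partial sums themselves,
# take the average of the first n partial sums and let n grow.
def cesaro_mean(terms):
    partials, running = [], 0
    for t in terms:
        running += t
        partials.append(running)
    return sum(partials) / len(partials)

# Grandi's series 1 - 1 + 1 - 1 + ...: partial sums 1, 0, 1, 0, ...
grandi = [(-1) ** k for k in range(1000)]
print(cesaro_mean(grandi))  # 0.5
```

The partial sums never settle down, but their averages do, and they settle on 1/2, in agreement with every other reasonable theory that assigns the series a value at all.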
To write Cauchy’s definitions down precisely takes a bit more work. This was especially true for Cauchy himself, who had not quite phrased the ideas in their clean, modern form.* (In mathematics, you very seldom get the clearest account of an idea from the person who invented it.) Cauchy was an unwavering conservative and a royalist, but in his mathematics he was proudly revolutionary and a scourge to academic authority. Once he understood how to do things without the dangerous infinitesimals, he unilaterally rewrote his syllabus at the École Polytechnique to reflect his new ideas. This enraged everyone around him: his mystified students, who had signed up for freshman calculus, not a seminar on cutting-edge pure mathematics; his colleagues, who felt that the engineering students at the École had no need for Cauchy’s level of rigor; and the administrators, whose commands to stick to the official course outline he completely ignored. The École imposed a new curriculum from above that emphasized the traditional infinitesimal approach to calculus, and placed note takers in Cauchy’s classroom to make sure he complied. Cauchy did not comply. Cauchy was not interested in the needs of engineers. Cauchy was interested in the truth.
It’s hard to defend Cauchy’s stance on pedagogical grounds. But I’m sympathetic with him anyway. One of the great joys of mathematics is the incontrovertible feeling that you’ve understood something the right way, all the way down to the bottom; it’s a feeling I haven’t experienced in any other sphere of mental life. And when you know how to do something the right way, it’s hard—for some stubborn people, impossible—to make yourself explain it the wrong way.
THREE
EVERYONE IS OBESE
The stand-up comic Eugene Mirman tells this joke about statistics. He says he likes to tell people, “I read that 100% of Americans were Asian.”
“But Eugene,” his confused companion protests, “you’re not Asian.”
And the punch line, delivered with magnificent self-assurance: “I read that I was!”
I thought of Mirman’s joke when I encountered a paper in the journal Obesity whose title posed the discomfiting question: “Will all Americans become overweight or obese?” As if the rhetorical question weren’t enough, the article supplies an answer: “Yes—by 2048.”
In 2048 I’ll be seventy-seven years old, and I hope not to be overweight. But I read I would be!
The Obesity paper got plenty of press, as you might imagine. ABC News warned of an “obesity apocalypse.” The Long Beach Press-Telegram went with the simple headline “We’re Getting Fatter.” The study’s results resonated with the latest manifestation of the fevered, ever-shifting anxiety with which Americans have always contemplated our national moral status. Before I was born, boys grew long hair and thus we were bound to get whipped by the Communists. When I was a kid, we played arcade games too much, which left us doomed to be outcompeted by the industrious Japanese. Now, we eat too much fast food, and we’re all going to die weak and immobile, surrounded by empty chicken buckets, puddled into the couches from which we long ago became unable to hoist ourselves. The paper certified this anxiety as a fact proved by science.
I have some good news. We’re not all going to be overweight in the year 2048. Why? Because not every curve is a line.
But every curve, as we just learned from Newton, is pretty close to a line. That’s the idea that drives linear regression, the statistical technique that is to social science as the screwdriver is to home repair. It’s the one tool you’re pretty much definitely going to use, whatever the task. Every time you read in the newspaper that people with more cousins are happier, or that countries that have more Burger Kings have looser morals, or that halving your intake of niacin doubles your risk of athlete’s foot, or that every extra $10,000 of income makes you 3% more likely to vote Republican,* you’re encountering the result of a linear regression.
Here’s how it works. You have two things you want to relate; let’s say, the cost of tuition at a university and the average SAT score of its incoming students. You might think schools with higher SATs are likely to be pricier; but a look at the data tells you that’s not a universal law. Elon University, just outside Burlington, North Carolina, has an average combined math and verbal score of 1217, and charges $20,441 tuition a year. Nearby Guilford College, in Greensboro, is a bit pricier at $23,420, but entering first-years there averaged only 1131 on the SAT.
Still, if you look at a whole list of schools—say, the thirty-one private universities that reported their tuition and scores to the North Carolina Career Resource Network in 2007—you see a clear trend.
Each dot on the plot represents one of the colleges. Those two dots way up in the upper right-hand corner, with sky-high SAT scores and prices to match? Those are Wake Forest and Davidson. The lonely dot near the bottom, the only private school on the list with tuition under $10K, is Cabarrus College of Health Sciences.
The picture shows clearly that schools with higher scores have higher prices, by and large. But how much higher? That’s where linear regression enters the picture. The points in the picture above are obviously not on a line. But you can see that they’re not far off. You could probably draw a straight line freehand that cuts pretty much through the middle of this cloud of points. Linear regression takes the guesswork out, finding the line that comes closest* to passing through all the points. For the North Carolina colleges, it looks like the following figure.
The line in the picture has a slope of about 28. That means: if tuition were actually completely determined by SAT scores according to the line I drew on the chart, each extra point of SAT would correspond to an extra $28 in tuition. If you can raise the average SAT score of your incoming first-years by 50 points on average, you can charge $1,400 more in tuition. (Or, from the parent’s point of view, your kid improving 100 points is going to cost you an extra $2,800 a year. That test-prep course was more expensive than you thought!)
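The line-finding itself is nothing mysterious: the least-squares slope is the covariance of the two variables divided by the variance of the first, and the line passes through the point of means. A sketch in Python; the first two (score, tuition) pairs are Elon and Guilford from above, and the other three are invented stand-ins, not the real North Carolina data:

```python
# Least squares by hand: slope = cov(x, y) / var(x), and the
# fitted line passes through the point of means (mx, my).
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx  # slope, intercept

# First two pairs are real (Elon, Guilford); the rest are
# hypothetical, for illustration only.
scores = [1217, 1131, 1300, 1050, 1400]
tuition = [20441, 23420, 30000, 16000, 33000]
slope, intercept = fit_line(scores, tuition)
```

The slope is the regression's answer to the question "how many extra dollars per extra SAT point?"; for the real thirty-one-school data set, that answer is about 28.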
Linear regression is a marvelous tool, versatile, scalable, and as easy to execute as clicking a button on your spreadsheet. You can use it for data sets involving two variables, like the ones I’ve drawn here, but it works just as well for three variables, or a thousand. Whenever you want to understand which variables drive which other variables, and in which direction, it’s the first thing you reach for. And it works on any data set at all.
That’s a weakness as well as a strength. You can do linear regression without thinking about whether the phenomenon you’re modeling is actually close to linear. But you shouldn’t. I said linear regression was like a screwdriver, and that’s true; but in another sense, it’s more like a table saw. If you use it without paying careful attention to what you’re doing, the results can be gruesome.
Take, for instance, the missile we fired off in the last chapter. Perhaps you were not the one who fired the missile at all. Perhaps you are, instead, the missile’s intended recipient. As such, you have a keen interest in analyzing the missile’s path as accurately as possible.
Maybe you have plotted the vertical position of the missile at five points in time, and it looks like this:
Now you do a quick linear regression, and you get great results. There’s a line that passes almost exactly through the points you plotted:
(This is where your hand starts to creep, unthinkingly, toward the table saw’s keening blade.)
Your line gives a very precise model for the missile’s motion: for every minute that passes, the missile increases its altitude by some fixed amount: say, 400 meters. After an hour it’s 24 km above the earth’s surface. When does it come down? It never comes down! An upward sloping line just keeps on sloping upward. That’s what lines do.
(Blood, gristle, screams.)
Not every curve is a line. And the curve of a missile’s flight is most emphatically not a line; it’s a parabola. Just like Archimedes’s circle, it looks like a line close up; and that’s why the linear regression will do a great job telling you where the missile is five seconds after the last time you tracked it. But an hour later? Forget it. Your model says the missile is in the lower stratosphere, when, in fact, it is probably approaching your house.
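You can watch the table saw do its damage numerically. The sketch below (Python, with an invented trajectory, not data from any real missile) fits a line to the first few points of a parabola; near the data the line is excellent, and far from it the line is catastrophic:

```python
# An invented parabolic trajectory, h(t) = 500*t - 5*t**2
# (altitude in meters, time in seconds). We fit a line to the
# early, nearly straight part and then extrapolate.
def height(t):
    return 500 * t - 5 * t ** 2  # the true altitude

ts = [0, 1, 2, 3, 4]
hs = [height(t) for t in ts]

n = len(ts)
mt, mh = sum(ts) / n, sum(hs) / n
slope = sum((t - mt) * (h - mh) for t, h in zip(ts, hs)) \
    / sum((t - mt) ** 2 for t in ts)
intercept = mh - slope * mt

def predict(t):
    return slope * t + intercept  # the linear model's altitude
```

At t = 4 the line is within 10 meters of the truth. At t = 120 the line reports a missile tens of kilometers up and still climbing, while the parabola says it struck the ground at t = 100.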
The most vivid warning I know against thoughtless linear extrapolation was set down not by a statistician but by Mark Twain, in Life on the Mississippi:
The Mississippi between Cairo and New Orleans was twelve hundred and fifteen miles long one hundred and seventy-six years ago. It was eleven hundred and eighty after the cut-off of 1722. It was one thousand and forty after the American Bend cut-off. It has lost sixty-seven miles since. Consequently its length is only nine hundred and seventy-three miles at present. . . . In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. This is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.
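For the record, Twain's wholesale returns can be reproduced with his own trifling investment of fact (all figures below are his; the absurdity of the trajectory is, of course, the joke):

```python
# Twain's figures, run through his linear model.
lost = 1215 - 973              # 242 miles lost...
years = 176                    # ...in 176 years
rate = lost / years            # 1.375 miles per year, "a trifle
                               # over one mile and a third"

# A million years ago, says the line, the Lower Mississippi was:
past = 973 + 1_000_000 * rate  # 1,375,973 miles, "upward of one
                               # million three hundred thousand"

# Run forward, the line reaches zero length in about 708 years,
# a few decades short of Twain's 742-year deadline, at which
# point the model has the river at negative length.
years_to_zero = 973 / rate
```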
ASIDE: HOW TO GET PARTIAL CREDIT ON MY CALCULUS EXAM
The methods of calculus are a lot like linear regression: they’re purely mechanical, your calculator can carry them out, and it is very dangerous to use them inattentively. On a calculus exam you might be asked to compute the weight of water left in a jug after you punch some kind of hole and let some kind of flow take place for some amount of time, blah blah blah. It’s easy to make arithmetic mistakes when doing a problem like this under time pressure. And sometimes that leads to a student arriving at a ridiculous result, like a jug of water whose weight is −4 grams.
If a student arrives at −4 grams and writes, in a desperate, hurried hand, “I screwed up somewhere, but I can’t find my mistake,” I give them half credit.
If they just write “−4g” at the bottom of the page and circle it, they get zero—even if the entire derivation was correct apart from a single misplaced digit somewhere halfway down the page.
Working an integral or performing a linear regression is something a computer can do quite effectively. Understanding whether the result makes sense—or deciding whether the method is the right one to use in the first place—requires a guiding human hand. When we teach mathematics we are supposed to be explaining how to be that guide. A math course that fails to do so is essentially training the student to be a very slow, buggy version of Microsoft Excel.
And let’s be frank: that really is what many of our math courses are doing. To make a long, contentious story short (but still contentious), the teaching of mathematics to children has for decades now been the arena of the so-called math wars. On one side, you have teachers who favor an emphasis on memorization, fluency, traditional algorithms, and exact answers; on the other, teachers who think math teaching should be about learning meaning, developing ways of thinking, guided discovery, and approximation. Sometimes the first approach is called traditional and the second reform, although the supposedly nontraditional discovery approach has been around in some form for decades, and whether “reform” truly counts as a reform is exactly what’s up for debate. Fierce debate. At a math dinner party it’s okay to bring up politics or religion, but start an argument about math pedagogy and it’s likely to end with somebody storming out in either a traditionalist or reformist huff.
I don’t count myself in either camp. I can’t go along with those reformists who want to throw out memorization of the multiplication table. When doing any serious mathematical thinking, you’re going to have to multiply 6 by 8 sometimes, and if you have to reach for your calculator each time you do that, you’ll never achieve the kind of mental flow that actual thinking requires. You can’t write a sonnet if you have to look up the spelling of each word as you go.
How Not to Be Wrong : The Power of Mathematical Thinking (9780698163843) Page 5