The Black Swan
The quincunx (its name is derived from the Latin for five) in the pinball example shows the fifth round, with thirty-two possibilities, easy to track. Such was the concept behind the quincunx used by Francis Galton. Galton was both insufficiently lazy and a bit too innocent of mathematics; instead of building the contraption, he could have worked with simpler algebra, or perhaps undertaken a thought experiment like this one.
Let’s keep playing. Continue until you have forty flips. You can perform them in minutes, but we will need a calculator to work out the number of outcomes, which are taxing to our simple thought method. You will have about 1,099,511,627,776 possible combinations—more than one thousand billion. Don’t bother doing the calculation manually, it is two multiplied by itself forty times, since each branch doubles at every juncture. (Recall that we added a win and a lose at the end of the alternatives of the third round to go to the fourth round, thus doubling the number of alternatives.) Of these combinations, only one will be up forty, and only one will be down forty. The rest will hover around the middle, here zero.
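For the reader who would rather let a machine do the counting, here is a minimal sketch of the forty-flip tally. The names (`FLIPS`, `paths_with_net`) are illustrative, and the band of plus or minus six around zero is an arbitrary choice made only to show how crowded the middle is.

```python
from math import comb

FLIPS = 40  # number of coin flips, as in the text

# Each flip doubles the number of branches, so there are 2**40 sequences.
total_paths = 2 ** FLIPS

def paths_with_net(net, flips=FLIPS):
    """Count the flip sequences whose wins minus losses equals `net`."""
    # wins - losses = net and wins + losses = flips, so wins = (flips + net) / 2.
    if abs(net) > flips or (flips + net) % 2 != 0:
        return 0
    return comb(flips, (flips + net) // 2)

up_forty = paths_with_net(40)     # all forty flips are wins
down_forty = paths_with_net(-40)  # all forty flips are losses
at_zero = paths_with_net(0)       # exactly twenty wins and twenty losses

# Share of all sequences that end within six units of zero
# (the width of this band is an assumption for illustration).
near_middle = sum(paths_with_net(n) for n in range(-6, 7))
share_near_middle = near_middle / total_paths

print(total_paths, up_forty, down_forty, at_zero, share_near_middle)
```

Exactly one path leads to each extreme, while a band only a few units wide around zero already holds the bulk of the sequences.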
We can already see that in this type of randomness extremes are exceedingly rare. One in 1,099,511,627,776 is up forty out of forty tosses. If you perform the exercise of forty flips once per hour, the odds of getting 40 ups in a row are so small that it would take quite a bit of forty-flip trials to see it. Assuming you take a few breaks to eat, argue with your friends and roommates, have a beer, and sleep, you can expect to wait close to four million lifetimes to get a 40-up outcome (or a 40-down outcome) just once. And consider the following. Assume you play one additional round, for a total of 41; to get 41 straight heads would take eight million lifetimes! Going from 40 to 41 halves the odds. This is a key attribute of the nonscalable framework for analyzing randomness: extreme deviations decrease at an increasing rate. You can expect to toss 50 heads in a row once in four billion lifetimes!
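The halving is easy to check. The sketch below assumes only that each trial is an independent run of fair flips, so that the average number of trials before the first all-heads run is one divided by its probability; the function names are illustrative. It works with ratios of waiting times rather than lifetimes, since the lifetime figures depend on how many trials one assumes fit into a life.

```python
from fractions import Fraction

def odds_all_heads(n):
    """Probability that n independent fair flips all come up heads."""
    return Fraction(1, 2 ** n)

def expected_trials(n):
    """Average number of n-flip trials needed to see all heads once."""
    # Mean of a geometric waiting time: 1 / (probability of success per trial).
    return 1 / odds_all_heads(n)

# One more flip doubles the wait; ten more flips multiply it by 2**10.
ratio_41_to_40 = expected_trials(41) / expected_trials(40)
ratio_50_to_40 = expected_trials(50) / expected_trials(40)

print(ratio_41_to_40, ratio_50_to_40)
```

A factor of two takes four million lifetimes to eight million, and a factor of 1,024 takes four million to roughly four billion.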
FIGURE 9: NUMBERS OF WINS TOSSED
Result of forty tosses. We see the proto-bell curve emerging.
We are not yet fully in a Gaussian bell curve, but we are getting dangerously close. This is still proto-Gaussian, but you can see the gist. (Actually, you will never encounter a Gaussian in its purity since it is a Platonic form—you just get closer but cannot attain it.) However, as you can see in Figure 9, the familiar bell shape is starting to emerge.
How do we get even closer to the perfect Gaussian bell curve? By refining the flipping process. We can either flip 40 times for $1 a flip or 4,000 times for ten cents a flip, and add up the results. Your expected risk is about the same in both situations—and that is a trick. The equivalence in the two sets of flips has a little nonintuitive hitch. We multiplied the number of bets by 100, but divided the bet size by 10—don’t look for a reason now, just assume that they are “equivalent.” The overall risk is equivalent, but now we have opened up the possibility of winning or losing 400 times in a row. The odds are about one in 1 with 120 zeroes after it, that is, one in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times.
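For the impatient reader, a minimal sketch of why the two games carry the same risk. It assumes "risk" means the standard deviation of total winnings and uses the standard result that, for independent fair bets, this equals the bet size times the square root of the number of bets. The function name `total_risk` is illustrative.

```python
from math import sqrt, isclose

def total_risk(num_flips, bet_size):
    """Standard deviation of summed winnings over independent fair +/- bets."""
    return bet_size * sqrt(num_flips)

risk_40 = total_risk(40, 1.00)           # 40 flips at $1
risk_4000 = total_risk(4_000, 0.10)      # 4,000 flips at ten cents
risk_400000 = total_risk(400_000, 0.01)  # 400,000 flips at one cent

# 100 times more bets raises the risk by sqrt(100) = 10,
# which a bet 10 times smaller cancels exactly.
same_risk = isclose(risk_40, risk_4000) and isclose(risk_4000, risk_400000)

# Number of possible sequences of 400 flips, and how many digits it has.
sequences_400 = 2 ** 400
digits = len(str(sequences_400))

print(risk_40, same_risk, digits)
```

The count of 400-flip sequences has 121 digits, which is where "1 with 120 zeroes after it" comes from as an order of magnitude.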
Continue the process for a while. We go from 40 tosses for $1 each to 4,000 tosses for 10 cents, to 400,000 tosses for 1 cent, getting closer and closer to a Gaussian. Figure 10 shows results spread between −40 and 40, namely eighty plot points. The next one would bring that up to 8,000 points.
FIGURE 10: A MORE ABSTRACT VERSION: PLATO’S CURVE
An infinite number of tosses.
Let’s keep going. We can flip 4,000 times staking a tenth of a penny. How about 400,000 times at 1/1000 of a penny? As a Platonic form, the pure Gaussian curve is principally what happens when we have an infinity of tosses per round, with each bet infinitesimally small. Do not bother trying to visualize the results, or even make sense out of them. We can no longer talk about an “infinitesimal” bet size (since we have an infinity of these, and we are in what mathematicians call a continuous framework). The good news is that there is a substitute.
We have moved from a simple bet to something completely abstract. We have moved from observations into the realm of mathematics. In mathematics things have a purity to them.
Now, something completely abstract is not supposed to exist, so please do not even make an attempt to understand Figure 10. Just be aware of its use. Think of it as a thermometer: you are not supposed to understand what the temperature means in order to talk about it. You just need to know the correspondence between temperature and comfort (or some other empirical consideration). Sixty degrees corresponds to pleasant weather; ten below is not something to look forward to. You don’t necessarily care about the actual speed of the collisions among particles that more technically explains temperature. Degrees are, in a way, a means for your mind to translate some external phenomena into a number. Likewise, the Gaussian bell curve is set so that 68.2 percent of the observations fall between minus one and plus one standard deviations away from the average. I repeat: do not even try to understand whether standard deviation is average deviation—it is not, and a large (too large) number of people using the word standard deviation do not understand this point. Standard deviation is just a number that you scale things to, a matter of mere correspondence if phenomena were Gaussian.
These standard deviations are often nicknamed “sigma.” People also talk about “variance” (same thing: variance is the square of the sigma, i.e., of the standard deviation).
Note the symmetry in the curve. You get the same results whether the sigma is positive or negative. The odds of falling below −4 sigmas are the same as those of exceeding 4 sigmas, here 1 in 32,000 times.
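These correspondences can be checked with nothing more than the error function in Python's standard library. The sketch below assumes a standard Gaussian (mean zero, sigma one); the helper names `cdf` and `upper_tail` are illustrative.

```python
from math import erfc, sqrt, isclose

def cdf(x):
    """P(Z <= x) for a standard Gaussian, x measured in sigmas."""
    return 0.5 * erfc(-x / sqrt(2))

def upper_tail(x):
    """P(Z > x) for a standard Gaussian, x measured in sigmas."""
    return 0.5 * erfc(x / sqrt(2))

# Share of observations between minus one and plus one sigma.
within_one_sigma = cdf(1) - cdf(-1)

# Symmetry: falling below -4 sigmas is as likely as exceeding +4 sigmas.
below_minus_4 = cdf(-4)
above_plus_4 = upper_tail(4)
symmetric = isclose(below_minus_4, above_plus_4)

# Express the tail as "one in N times".
one_in = 1 / above_plus_4

print(within_one_sigma, symmetric, one_in)
```

The first number comes out a shade above 68.2 percent, and the tail a little more often than one in 32,000.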
As the reader can see, the main point of the Gaussian bell curve is, as I have been saying, that most observations hover around the mediocre, the mean, while the odds of a deviation decline faster and faster (exponentially) as you move away from the mean. If you need to retain one single piece of information, just remember this dramatic speed of decrease in the odds as you move away from the average. Outliers are increasingly unlikely. You can safely ignore them.
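To see the speed of decrease rather than take it on faith, one can ask how many times rarer each additional sigma makes an excess. This is a minimal sketch under the assumption of a standard Gaussian; `upper_tail` and `drop_factors` are illustrative names.

```python
from math import erfc, sqrt

def upper_tail(k):
    """P(Z > k) for a standard Gaussian, k measured in sigmas."""
    return 0.5 * erfc(k / sqrt(2))

# How many times rarer is exceeding (k + 1) sigmas than exceeding k sigmas?
drop_factors = [upper_tail(k) / upper_tail(k + 1) for k in range(1, 6)]

# "Faster and faster": each extra sigma costs more than the previous one did.
accelerating = all(a < b for a, b in zip(drop_factors, drop_factors[1:]))

for k, factor in enumerate(drop_factors, start=1):
    print(f"{k} -> {k + 1} sigmas: about {factor:.0f} times rarer")
```

The factor itself keeps growing as you move outward, which is what separates this decline from a merely steady one.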
This property also generates the supreme law of Mediocristan: given the paucity of large deviations, their contribution to the total will be vanishingly small.
In the height example earlier in this chapter, I used units of deviations of ten centimeters, showing how the incidence declined as the height increased. These were one sigma deviations; the height table also provides an example of the operation of “scaling to a sigma” by using the sigma as a unit of measurement.
Those Comforting Assumptions
Note the central assumptions we made in the coin-flip game that led to the proto-Gaussian, or mild randomness.
First central assumption: the flips are independent of one another. The coin has no memory. The fact that you got heads or tails on the previous flip does not change the odds of your getting heads or tails on the next one. You do not become a “better” coin flipper over time. If you introduce memory, or skills in flipping, the entire Gaussian business becomes shaky.
Recall our discussions in Chapter 14 on preferential attachment and cumulative advantage. Both theories assert that winning today makes you more likely to win in the future. Therefore, probabilities are dependent on history, and the first central assumption leading to the Gaussian bell curve fails in reality. In games, of course, past winnings are not supposed to translate into an increased probability of future gains—but not so in real life, which is why I worry about teaching probability from games. But when winning leads to more winning, you are far more likely to see forty wins in a row than with a proto-Gaussian.
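A toy comparison makes the point concrete. The sketch below sets the memoryless coin against a Pólya urn, a textbook scheme in which every win adds a "win" ball to the urn and so raises the odds of the next win. The urn is my stand-in for cumulative advantage, chosen for simplicity; it is an assumption for illustration, not one of the specific models discussed in Chapter 14, and the starting counts of one ball each are likewise arbitrary.

```python
from fractions import Fraction

def prob_all_wins_independent(n):
    """Probability of n straight wins when the coin has no memory."""
    return Fraction(1, 2) ** n

def prob_all_wins_polya(n, start_wins=1, start_losses=1):
    """Probability of n straight wins when each win makes the next likelier."""
    wins, losses = start_wins, start_losses
    prob = Fraction(1)
    for _ in range(n):
        prob *= Fraction(wins, wins + losses)  # chance of drawing a "win" ball
        wins += 1                              # reinforcement: add a "win" ball
    return prob

independent_40 = prob_all_wins_independent(40)
reinforced_40 = prob_all_wins_polya(40)
how_much_likelier = reinforced_40 / independent_40

print(independent_40, reinforced_40, float(how_much_likelier))
```

In this toy urn, forty straight wins go from one chance in 2 to the power 40 to one chance in 41.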
Second central assumption: no “wild” jump. The step size in the building block of the basic random walk is always known, namely one step. There is no uncertainty as to the size of the step. We did not encounter situations in which the move varied wildly.
Remember that if either of these two central assumptions is not met, your moves (or coin tosses) will not cumulatively lead to the bell curve. Depending on what happens, they can lead to the wild Mandelbrotian-style scale-invariant randomness.
“The Ubiquity of the Gaussian”
One of the problems I face in life is that whenever I tell people that the Gaussian bell curve is not ubiquitous in real life, only in the minds of statisticians, they require me to “prove it”—which is easy to do, as we will see in the next two chapters, yet nobody has managed to prove the opposite. Whenever I suggest a process that is not Gaussian, I am asked to justify my suggestion and to, beyond the phenomena, “give them the theory behind it.” We saw in Chapter 14 the rich-get-richer models that were proposed in order to justify not using a Gaussian. Modelers were forced to spend their time writing theories on possible models that generate the scalable—as if they needed to be apologetic about it. Theory shmeory! I have an epistemological problem with that, with the need to justify the world’s failure to resemble an idealized model that someone blind to reality has managed to promote.
My technique, instead of studying the possible models generating non-bell curve randomness, hence making the same errors of blind theorizing, is to do the opposite: to know the bell curve as intimately as I can and identify where it can and cannot hold. I know where Mediocristan is. To me it is frequently (nay, almost always) the users of the bell curve who do not understand it well, and have to justify it, and not the opposite.
This ubiquity of the Gaussian is not a property of the world, but a problem in our minds, stemming from the way we look at it.
• • •
The next chapter will address the scale invariance of nature and the properties of the fractal. The chapter after that will probe the misuse of the Gaussian in socioeconomic life and “the need to produce theories.”
I sometimes get a little emotional because I’ve spent a large part of my life thinking about this problem. Since I started thinking about it, and conducting a variety of thought experiments as I have above, I have not for the life of me been able to find anyone around me in the business and statistical world who was intellectually consistent in that he both accepted the Black Swan and rejected the Gaussian and Gaussian tools. Many people accepted my Black Swan idea but could not take it to its logical conclusion, which is that you cannot use one single measure for randomness called standard deviation (and call it “risk”); you cannot expect a simple answer to characterize uncertainty. To go the extra step requires courage, commitment, an ability to connect the dots, a desire to understand randomness fully. It also means not accepting other people’s wisdom as gospel. Then I started finding physicists who had rejected the Gaussian tools but fell for another sin: gullibility about precise predictive models, mostly elaborations around the preferential attachment of Chapter 14—another form of Platonicity. I could not find anyone with depth and scientific technique who looked at the world of randomness and understood its nature, who looked at calculations as an aid, not a principal aim. It took me close to a decade and a half to find that thinker, the man who made many swans gray: Mandelbrot—the great Benoît Mandelbrot.
* The nontechnical (or intuitive) reader can skip this chapter, as it goes into some details about the bell curve. Also, you can skip it if you belong to the category of fortunate people who do not know about the bell curve.
* I have fudged the numbers a bit for simplicity’s sake.
* One of the most misunderstood aspects of a Gaussian is its fragility and vulnerability in the estimation of tail events. The odds of a 4 sigma move are twice that of a 4.15 sigma. The odds of a 20 sigma are a trillion times higher than those of a 21 sigma! It means that a small measurement error of the sigma will lead to a massive underestimation of the probability. We can be a trillion times wrong about some events.
† My main point, which I repeat in some form or another throughout Part Three, is as follows. Everything is made easy, conceptually, when you consider that there are two, and only two, possible paradigms: nonscalable (like the Gaussian) and other (such as Mandelbrotian randomness). The rejection of the application of the nonscalable is sufficient, as we will see later, to eliminate a certain vision of the world. This is like negative empiricism: I know a lot by determining what is wrong.
* Note that variables may not be infinitely scalable; there could be a very, very remote upper limit—but we do not know where it is so we treat a given situation as if it were infinitely scalable. Technically, you cannot sell more of one book than there are denizens of the planet—but that upper limit is large enough to be treated as if it didn’t exist. Furthermore, who knows, by repackaging the book, you might be able to sell it to a person twice, or get that person to watch the same movie several times.
† As I was revising this draft, in August 2006, I stayed at a hotel in Dedham, Massachusetts, near one of my children’s summer camps. There, I was a little intrigued by the abundance of weight-challenged people walking around the lobby and causing problems with elevator backups. It turned out that the annual convention of NAFA, the National Association for Fat Acceptance, was being held there. As most of the members were extremely overweight, I was not able to figure out which delegate was the heaviest: some form of equality prevailed among the very heavy (someone much heavier than the persons I saw would have been dead). I am sure that at the NARA convention, the National Association for Rich Acceptance, one person would dwarf the others, and, even among the superrich, a very small percentage would represent a large section of the total wealth.
Chapter Sixteen
THE AESTHETICS OF RANDOMNESS
Mandelbrot’s library—Was Galileo blind?—Pearls to swine—Self-affinity—How the world can be complicated in a simple way, or, perhaps, simple in a very complicated way
THE POET OF RANDOMNESS
It was a melancholic afternoon when I smelled the old books in Benoît Mandelbrot’s library. This was on a hot day in August 2005, and the heat exacerbated the musty odor of the glue of old French books bringing on powerful olfactory nostalgia. I usually succeed in repressing such nostalgic excursions, but not when they sneak up on me as music or smell. The odor of Mandelbrot’s books was that of French literature, of my parents’ library, of the hours spent in bookstores and libraries when I was a teenager when many books around me were (alas) in French, when I thought that Literature was above anything and everything. (I haven’t been in contact with many French books since my teenage days.) However abstract I wanted it to be, Literature had a physical embodiment, it had a smell, and this was it.
The afternoon was also gloomy because Mandelbrot was moving away, exactly when I had become entitled to call him at crazy hours just because I had a question, such as why people didn’t realize that the 80/20 could be 50/01. Mandelbrot had decided to move to the Boston area, not to retire, but to work for a research center sponsored by a national laboratory. Since he was moving to an apartment in Cambridge, and leaving his oversize house in the Westchester suburbs of New York, he had invited me to come take my pick of his books.
Even the titles of the books had a nostalgic ring. I filled up a box with French titles, such as a 1949 copy of Henri Bergson’s Matière et mémoire, which it seemed Mandelbrot bought when he was a student (the smell!).
After having mentioned his name left and right throughout this book, I will finally introduce Mandelbrot, principally as the first person with an academic title with whom I ever spoke about randomness without feeling defrauded. Other mathematicians of probability would throw at me theorems with Russian names such as “Sobolev,” “Kolmogorov,” Wiener measure, without which they were lost; they had a hard time getting to the heart of the subject or exiting their little box long enough to consider its empirical flaws. With Mandelbrot, it was different: it was as if we both originated from the same country, meeting after years of frustrating exile, and were finally able to speak in our mother tongue without straining. He is the only flesh-and-bones teacher I ever had—my teachers are usually books in my library. I had way too little respect for mathematicians dealing with uncertainty and statistics to consider any of them my teachers—in my mind mathematicians, trained for certainties, had no business dealing with randomness. Mandelbrot proved me wrong.
He speaks an unusually precise and formal French, much like that spoken by Levantines of my parents’ generation or Old World aristocrats. This made it odd to hear, on occasion, his accented, but very standard, colloquial American English. He is tall, overweight, which makes him look baby-faced (although I’ve never seen him eat a large meal), and has a strong physical presence.
From the outside one would think that what Mandelbrot and I have in common is wild uncertainty, Black Swans, and dull (and sometimes less dull) statistical notions. But, although we are collaborators, this is not what our major conversations revolve around. It is mostly matters literary and aesthetic, or historical gossip about people of extraordinary intellectual refinement. I mean refinement, not achievement. Mandelbrot could tell stories about the phenomenal array of hotshots he has worked with over the past century, but somehow I am programmed to consider scientists’ personae far less interesting than those of colorful erudites. Like me, Mandelbrot takes an interest in urbane individuals who combine traits generally thought not to coexist together. One person he often mentions is Baron Pierre Jean de Menasce, whom he met at Princeton in the 1950s, where de Menasce was the roommate of the physicist Oppenheimer. De Menasce was exactly the kind of person I am interested in, the embodiment of a Black Swan. He came from an opulent Alexandrian Jewish merchant family, French and Italian–speaking like all sophisticated Levantines. His forebears had taken a Venetian spelling for their Arabic name, added a Hungarian noble title along the way, and socialized with royalty. De Menasce not only converted to Christianity, but became a Dominican priest and a great scholar of Semitic and Persian languages. Mandelbrot kept questioning me about Alexandria, since he was always looking for such characters.