Another deficiency I need to point out concerns a strangely unrealistic and unrigorous research tradition in social science, “rational expectations,” in which observers are shown to rationally converge on the same inference when supplied with the same data, even if their initial hypotheses were markedly different (by a mechanism of updating called Bayesian inference). Why unrigorous? Because one needs a very quick check to see that people do not converge to the same opinions in reality. This is partly, as we saw in Chapter 6, because of psychological distortions such as the confirmation bias, which cause divergent interpretation of the data. But there is a mathematical reason why people do not converge to the same opinion: if you are using a probability distribution from Extremistan, and I am using a distribution from Mediocristan (or a different one from Extremistan), then we will never converge, simply because if you suppose Extremistan you do not update (or change your mind) that quickly. For instance, if you assume Mediocristan and do not witness Black Swans, you may eventually rule them out. Not if you assume we are in Extremistan.
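A minimal sketch of this non-convergence in code, under toy assumptions of my own (the uniform prior, the 1-in-10,000 jump rate, and the thousand quiet periods are all invented for illustration): two Bayesian observers see the same event-free history; the one committed to Mediocristan comes to rule the event out, while the one entertaining Extremistan barely updates, because quiet data are almost equally likely under both of her hypotheses.

```python
# Two observers update on the same event-free history of n periods.
# The Mediocristan observer models each period as Bernoulli with an
# unknown rate and a uniform Beta(1, 1) prior. The Extremistan observer
# entertains only two hypotheses: "no jumps" versus "rare, huge jumps"
# (p = 1e-4 per period). All numbers are invented for illustration.
n = 1_000  # periods observed, none containing a Black Swan

# Mediocristan: posterior after n quiet periods is Beta(1, 1 + n);
# its mean shrinks toward zero, so the event gets ruled out.
posterior_mean = 1 / (n + 2)
print(f"Mediocristan estimated event rate: {posterior_mean:.4f}")

# Extremistan: prior odds 1:1 between p = 0 and p = 1e-4. The likelihood
# of n quiet periods barely distinguishes the two hypotheses...
p_rare = 1e-4
like_quiet_given_rare = (1 - p_rare) ** n   # about 0.905
posterior_rare = like_quiet_given_rare / (1.0 + like_quiet_given_rare)
print(f"Extremistan weight on 'rare huge jumps': {posterior_rare:.3f}")
# ...so this observer stays near 50/50 while the other converges to "never."
```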
To conclude, assuming that “randomness” is not epistemic and subjective, or making a big deal about the distinction between “ontological randomness” and “epistemic randomness,” implies some scientific autism, that desire to systematize, and a fundamental lack of understanding of randomness itself. It assumes that an observer can reach omniscience and can compute odds with perfect realism and without violating consistency rules. What is left becomes “randomness,” or something by another name that arises from aleatory forces that cannot be reduced by knowledge and analysis.
There is an angle worth exploring: why on earth do adults accept these Soviet-Harvard-style top-down methods without laughing, and actually go to build policies in Washington based on them, against the record, except perhaps to make readers of history laugh at them and diagnose new psychiatric conditions? And, likewise, why do we default to the assumption that events are experienced by people in the same manner? Why did we ever take notions of “objective” probability seriously?
After this foray into the psychology of the perception of the dynamics of time and events, let us move to our central point, the very core of our program, into what I have aggressively called the most useful problem in philosophy. The most useful, sadly.
* Robert Merton, the villain of Chapter 17, a person said to be of a highly mechanistic mind (down to his interest in machinery and his use of mechanical metaphors to represent uncertainty), seems to have been created for the sole purpose of providing an illustration of dangerous Black Swan foolishness. After the crisis of 2008, he defended the risk taking caused by economists, giving the argument that “it was a Black Swan” simply because he did not see it coming, therefore, he said, the theories were fine. He did not make the leap that, since we do not see these events coming, we need to be robust to them. Normally, such people exit the gene pool; academic tenure holds them a bit longer.
* The argument can actually be used to satisfy moral hazard and dishonest (probabilistically disguised) profiteering. Rubin had pocketed more than $100 million from Citigroup’s earning of profits from hidden risks that blow up only occasionally. After he blew up, he had an excuse—“It never happened before.” He kept his money; we, the taxpayers, who include schoolteachers and hairdressers, had to bail the company out and pay for the losses. This I call the moral hazard element in paying bonuses to people who are not robust to Black Swans, and who we knew beforehand were not robust to the Black Swan. This beforehand is what makes me angry.
* It is indeed the absence of higher order representation—the inability to entertain questions like “Is my method for assessing what is right or wrong itself right or wrong?”—that, as we will see in the next section, is central when we deal with probability, and that causes Dr. Johns to be suckers for measures and to believe in them without doubting their beliefs. They fail to understand the metaprobability, the higher order probability—that is, the probability that the probability they are using may not be True.
† The nontechnical reader should skip the rest of this section.
V
(PERHAPS) THE MOST USEFUL PROBLEM IN THE HISTORY OF MODERN PHILOSOPHY
Small may not be the idea, after all—Where to find the powder room—Predict and perish—On school buses and intelligent textbooks
I am going to be blunt. Before The Black Swan (and associated papers) most of epistemology and decision theory was, to an actor in the real world, just sterile mind games and foreplay. Almost all the history of thought is about what we know, or think we know. The Black Swan is the very first attempt (that I know of) in the history of thought to provide a map of where we get hurt by what we don’t know, to set systematic limits to the fragility of knowledge—and to provide exact locations where these maps no longer work.
To answer the most common “criticism” by economists and (now bankrupt) bankers I mentioned in Section III, I am not saying “S**t happens,” I am saying “S**t happens in the Fourth Quadrant,” which is as different as mistaking prudence and caution for paranoia.
Furthermore, to be more aggressive: while limits like those attributed to Gödel bear massive philosophical consequences, we can’t do much about them. I believe, by contrast, that the limits to empirical and statistical knowledge I have shown have sensible (if not vital) importance, and that we can do a lot with them in terms of solutions, by categorizing decisions based on the severity of the potential estimation error of the pair probability times consequence. For instance, we can use it to build a safer society—to robustify what lies in the Fourth Quadrant.
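To make the categorization concrete, here is a minimal sketch in code of the quadrant idea; the two classification questions follow the text, but the function name, wording, and labels are illustrative assumptions, not a formal specification.

```python
# A sketch of categorizing decisions by the pair (tail behavior of the
# domain, type of payoff). The quadrant labels paraphrase the text; the
# inputs are judgments supplied by the decision maker, not something the
# method can compute for you.

def quadrant(fat_tailed: bool, complex_payoff: bool) -> str:
    """Map a decision to one of four quadrants.

    fat_tailed: does the relevant variable live in Extremistan?
    complex_payoff: does the exposure depend on the size of the event
                    (nonlinear, open-ended), not just on whether it happens?
    """
    if not fat_tailed and not complex_payoff:
        return "First Quadrant: safe, statistics works"
    if not fat_tailed and complex_payoff:
        return "Second Quadrant: safe enough, Mediocristan payoffs"
    if fat_tailed and not complex_payoff:
        return "Third Quadrant: tolerable, binary payoffs cap the damage"
    return "Fourth Quadrant: estimation error dominates; robustify, don't optimize"

print(quadrant(fat_tailed=True, complex_payoff=True))
```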
LIVING IN TWO DIMENSIONS
A vexing problem in the history of human thought is finding one’s position on the boundary between skepticism and gullibility, or how to believe and how to not believe. And how to make decisions based on these beliefs, since beliefs without decisions are just sterile. So this is not an epistemological problem (i.e., focusing on what is true or false); it is one of decision, action, and commitment.
Clearly, you cannot doubt everything and function; you cannot believe everything and survive. Yet the philosophical treatment of the problem has been highly incomplete, and, worse, has not improved much over the centuries, if it has improved at all. One class of thinkers, say the Cartesians, or the academic skeptics some eighteen centuries before them, in their own way, started with the rejection of everything upfront, with some even more radical, such as the Pyrrhonians, rejecting so much that they even rejected skepticism as too dogmatic. The other class, say the medieval Scholastics or the modern-day pragmatists, started with the fixation of beliefs, or some beliefs. While the medieval thinkers stopped there, in an Aristotelian way, the early pragmatists, with the great thinker Charles Sanders Peirce, provided a ray of hope. They proposed to update and correct beliefs as a continuous work in progress (albeit under a known structure of probability, as Peirce believed in the existence and attainability of an ergodic, long-run, reachable state of convergence to truth). That brand of pragmatism (initially called pragmaticism) viewed knowledge as a rigorous interplay between anti-skepticism and fallibilism, i.e., between the two categories of what to doubt and what to accept. The application to my field, probability, and perhaps the most sophisticated version of the program, lies in the dense, difficult, deep, and brilliant forays of Isaac Levi into decision theory, with the notion of corpus of belief, doxastic commitment, distance from expectation, and credal probabilities.
A ray of hope, perhaps, but still not even close. Not even remotely close to anything useful.
Think of living in a three-dimensional space while under the illusion of being in two dimensions. It may work well if you are a worm, certainly not if you happen to be a bird. Of course, you will not be aware of the truncation—and will be confronted with many mysteries, mysteries that you cannot possibly clear up without adding a dimension, no matter how sophisticated you may get. And, of course, you will feel helpless at times. Such was the fate of knowledge all these centuries, when it was locked in two dimensions too simplistic to be of any use outside of classrooms. Since Plato, only philosophers have spent time discussing what Truth was, and for a reason: it is unusable in practice. By focusing on the True/False distinction, epistemology remained, with very few exceptions, prisoner of an inconsequential, and highly incomplete, 2-D framework. The third missing dimension is, of course, the consequence of the True, and the severity of the False, the expectation. In other words, the payoff from decisions, the impact and magnitude of the result of such a decision. Sometimes one may be wrong and the mistake may turn out to be inconsequential. Or one may be right, say, on such a subject as the sex of angels, and it may turn out to be of no use beyond intellectual stamp collecting.
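As a toy numerical illustration of this third dimension (the numbers below are invented for the purpose), consider two errors that are equally likely but wildly unequal in consequence:

```python
# Toy numbers, invented for illustration: two possible errors with the
# same probability of occurring but very different consequences.
p_error = 0.01

cost_benign = 1            # e.g., a mildly wrong forecast
cost_severe = 1_000_000    # e.g., a blown-up portfolio

# On a True/False scorecard the two mistakes look identical; on the
# third dimension, the expectation (probability times consequence),
# they differ by a factor of a million.
print("expected cost, benign error:", p_error * cost_benign)   # 0.01
print("expected cost, severe error:", p_error * cost_severe)   # 10000.0
```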
The simplified, philistinified, academified, and glorified notion of “evidence” becomes useless. With respect to Black Swans, you act to protect yourself from negative ones (or expose yourself to positive ones) even though you may have no evidence that they can take place, just as we check people for weapons before they board a plane even though we have no evidence that they are terrorists. This focus on off-the-shelf commoditized notions such as “evidence” is a problem with people who claim to use “rigor” yet go bust on occasion.
A probabilistic world has trouble with “proof” as it is, but in a Black Swan world things are a lot worse.
Indeed, I know of almost no decision that is based on notions of True/False.
Once you start examining the payoff, the result of decisions, you will see clearly that the consequences of some errors may be benign, those of others may be severe. And you pretty much know which is which beforehand. You know which errors are consequential and which ones are not so much.
But first let us look at a severe problem in the derivation of knowledge about probabilities.
THE DEPENDENCE ON THEORY FOR RARE EVENTS
During my deserto period, when I was getting severe but entertaining insults, I found myself debating a gentleman then employed by a firm called Lehman Brothers. That gentleman had made a statement in The Wall Street Journal saying that events we saw in August 2007 should have happened once every ten thousand years. Sure enough, we had three such events three days in a row. The Wall Street Journal ran his picture and if you look at it, you can safely say, “He does not look ten thousand years old.” So where is he getting his “once in ten thousand years” probability? Certainly not from personal experience; certainly not from the records of Lehman Brothers—his firm had not been around for ten thousand years, and of course it didn’t stay around for another ten thousand years, as it went under right after our debate. So, you know that he’s getting his small probabilities from a theory. The more remote the event, the less we can get empirical data (assuming generously that the future will resemble the past) and the more we need to rely on theory.
Consider that the frequency of rare events cannot be estimated from empirical observation for the very reason that they are rare. We thus need a prior model representation for that; the rarer the event, the higher the error in estimation from standard inductive methods (say, frequency sampling from counting past occurrences), hence the higher the dependence on an a priori representation that extrapolates into the space of low-probability events (which necessarily are not seen often).*
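A minimal simulation of this point, under assumptions of my own choosing (the probabilities, sample size, and number of trials below are arbitrary): estimate the frequency of an event of known true probability p by counting occurrences in a sample, and watch the relative error grow as p shrinks. The standard result is that the relative error of the count-based estimate scales roughly like 1/√(np), so a ten-thousand-year event estimated from a few decades of data is pure theory.

```python
# Count-based frequency estimation degrades as the event gets rarer:
# for small p most samples contain zero or one occurrence, so the
# "empirical" frequency is mostly noise.
import numpy as np

rng = np.random.default_rng(0)

def relative_error(p: float, n: int = 10_000, trials: int = 2_000) -> float:
    """Average |estimate - p| / p over many simulated samples of size n."""
    hits = rng.binomial(n, p, size=trials)   # occurrences in each sample
    return float(np.mean(np.abs(hits / n - p) / p))

for p in (0.1, 0.01, 0.001, 0.0001):
    print(f"p = {p:<8} mean relative error = {relative_error(p):.2f}")
```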
But even outside of small probabilities, the a priori problem is always present. It seems salient with respect to rare events, but it pervades probabilistic knowledge. I will present two versions I have been working on with two collaborators, Avital Pilpel, a philosopher of science (he walks fast), and Raphael Douady, a mathematician (he is sometimes a good walker, when he is not busy).
Epimenides the Cretan
Avital Pilpel and I expressed the regress argument as the epistemic problem of risk management, but the argument can be generalized to any form of probabilistic knowledge. It is a problem of self-reference by probability measures.
We can state it in the following way. If we need data to obtain a probability distribution to gauge knowledge about the future behavior of the distribution from its past results, and if, at the same time, we need a probability distribution to gauge data sufficiency and whether or not it is predictive of the future, then we face a severe regress loop. This is a problem of self-reference akin to that of Epimenides the Cretan stating whether or not Cretans are liars. Indeed, it is too uncomfortably close to the Epimenides situation, since a probability distribution is used to assess the degree of truth but cannot reflect on its own degree of truth and validity. And, unlike many problems of self-reference, those related to risk assessment have severe consequences. The problem is more acute with small probabilities.
An Undecidability Theorem
This problem of self-reference, published with Pilpel after The Black Swan, went unnoticed as such. So Raphael Douady and I re-expressed the philosophical problem mathematically, and it appears vastly more devastating in its practical implications than the Gödel problem.
Raphael is, among the people I know, perhaps the man with the greatest mathematical erudition—he may have more mathematical culture than anyone in modern times, except perhaps for his late father, Adrien Douady.
At the time of writing, we may have produced a formal proof using mathematics, and a branch of mathematics called “measure theory” that was used by the French to put rigor behind the mathematics of probability. The paper is provisionally called “Undecidability: On the inconsistency of estimating probabilities from a sample without binding a priori assumptions on the class of acceptable probabilities.”
It’s the Consequences …
Further, in real life we do not care about simple, raw probability (whether an event happens or does not happen); we worry about consequences (the size of the event; how much total destruction of lives or wealth, or other losses, will come from it; how much benefit a beneficial event will bring). Given that the less frequent the event, the more severe the consequences (just consider that the hundred-year flood is more severe, and less frequent, than the ten-year flood; the bestseller of the decade ships more copies than the bestseller of the year), our estimation of the contribution of the rare event is going to be massively faulty (contribution is probability times effect; multiply that by estimation error); and nothing can remedy it.*
So the rarer the event, the less we know about its role—and the more we need to compensate for that deficiency with an extrapolative, generalizing theory. It will lack in rigor in proportion to claims about the rarity of the event. Hence theoretical and model error are more consequential in the tails; and, the good news, some representations are more fragile than others.
I showed that this error is more severe in Extremistan, where rare events are more consequential, because of a lack of scale, or a lack of asymptotic ceiling for the random variable. In Mediocristan, by comparison, the collective effect of regular events dominates and the exceptions are rather inconsequential—we know their effect, and it is very mild because one can diversify thanks to the “law of large numbers.” Let me provide once again an illustration of Extremistan. Less than 0.25 percent of all the companies listed in the world represent around half the market capitalization, a less than minuscule percentage of novels on the planet accounts for approximately half of fiction sales, less than 0.1 percent of drugs generate a little more than half the pharmaceutical industry’s sales—and less than 0.1 percent of risky events will cause at least half the damages and losses.
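A minimal simulation of this concentration effect, with an assumed tail exponent (the α = 1.1 below is my illustrative choice, not an estimate from the figures in the paragraph): draw a large Pareto sample as a stand-in for sales, capitalizations, or losses, and compare the share carried by the top 0.1 percent against a Gaussian control.

```python
# Extremistan vs. Mediocristan concentration, with an assumed tail
# exponent alpha = 1.1 (fat tail: finite mean, infinite variance).
import numpy as np

rng = np.random.default_rng(42)
alpha = 1.1
# Inverse-CDF sampling of a Pareto with minimum 1 and tail exponent alpha.
sizes = (1.0 - rng.random(1_000_000)) ** (-1.0 / alpha)

sizes = np.sort(sizes)[::-1]                 # largest first
top = int(0.001 * sizes.size)                # top 0.1 percent
print(f"Pareto: top 0.1% carry {sizes[:top].sum() / sizes.sum():.0%} of the total")

# A Mediocristan (Gaussian) world shows no such concentration:
gauss = np.sort(np.abs(rng.normal(size=1_000_000)))[::-1]
print(f"Gaussian: top 0.1% carry {gauss[:top].sum() / gauss.sum():.0%} of the total")
```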
From Reality to Representation*
Let me take another angle. The passage from theory to the real world presents two distinct difficulties: inverse problems and pre-asymptotics.
Inverse Problems. Recall how much more difficult it is to re-create an ice cube from the results of the puddle (reverse engineering) than to forecast the shape of the puddle. In fact, the solution is not unique: the ice cube can be of very many shapes. I have discovered that the Soviet-Harvard method of viewing the world (as opposed to the Fat Tony style) makes us commit the error of confusing the two arrows (from ice cube to puddle; from puddle to ice cube). It is another manifestation of the error of Platonicity, of thinking that the Platonic form you have in your mind is the one you are observing outside the window. We see a lot of evidence of confusion of the two arrows in the history of medicine, the rationalistic medicine based on Aristotelian teleology, which I discussed earlier. This confusion is based on the following rationale. We assume that we know the logic behind an organ, what it was made to do, and thus that we can use this logic in our treatment of the patient. It has been very hard in medicine to shed our theories of the human body. Likewise, it is easy to construct a theory in your mind, or pick it up from Harvard, then go project it on the world. Then things are very simple.
This problem of confusion of the two arrows is very severe with probability, particularly with small probabilities.*
As we showed with the undecidability theorem and the self-reference argument, in real life we do not observe probability distributions. We just observe events. So I can rephrase the results as follows: we do not know the statistical properties—until, of course, after the fact. Given a set of observations, plenty of statistical distributions can correspond to the exact same realizations—each would extrapolate differently outside the set of events from which it was derived. The inverse problem is more acute when more theories, more distributions can fit a set of data, particularly in the presence of nonlinearities or nonparsimonious distributions.† Under nonlinearities, the families of possible models/parametrizations explode in number.‡
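A minimal sketch of this inverse problem in code (the sample, the two candidate families, and the use of scipy are my illustrative assumptions): fit a Gaussian and a Student’s t to the same realizations; both fit the observed data about equally well, yet they disagree by orders of magnitude about the unobserved tail.

```python
# Two distributions fitted to the same sample agree in-sample and
# diverge wildly out of sample, in the tail.
from scipy import stats

sample = stats.t(df=3).rvs(size=500, random_state=7)  # "true" world: fattish tails

mu, sigma = stats.norm.fit(sample)       # candidate 1: Gaussian
df, loc, scale = stats.t.fit(sample)     # candidate 2: Student's t

# Both families fit the observed realizations about equally well...
print("Gaussian  log-likelihood:", stats.norm(mu, sigma).logpdf(sample).sum())
print("Student-t log-likelihood:", stats.t(df, loc, scale).logpdf(sample).sum())

# ...yet extrapolate very differently outside the observed range.
x = 10.0
print(f"P(X > {x}) under the Gaussian fit:  {stats.norm(mu, sigma).sf(x):.2e}")
print(f"P(X > {x}) under the Student-t fit: {stats.t(df, loc, scale).sf(x):.2e}")
```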