Do these biases invalidate this survey methodology? Not necessarily. Almost every methodology has drawbacks, and bias of one form or another is often unavoidable. You should just be aware of all the potential issues in a study and consider them when drawing conclusions. For example, knowing about the survivorship bias in remaining employees, you could examine the data from exit interviews to see whether motivation issues were mentioned by departing employees. You could even try to survey them too.
A few other examples can further illustrate how subtle survivorship bias can be. In World War II, naval researchers studied damaged aircraft that returned from missions so they could recommend how to bolster aircraft defenses for future missions. Looking at where these planes had been hit, they concluded that the areas that had taken the most damage should receive extra armor.
However, statistician Abraham Wald noted that the study sampled only planes that had survived missions, and not the many planes that had been shot down. He therefore drew the opposite conclusion, which turned out to be correct: the bullet holes marked places where a plane could be hit and still return safely, whereas the undamaged areas were likely the ones that, if hit, would cause a plane to go down.
Similarly, if you look at tech CEOs like Bill Gates and Mark Zuckerberg, you might conclude that dropping out of school to pursue your dreams is a fine idea. However, you’d be thinking only of the people that “survived.” You’re missing all the dropouts who did not make it to the top. Architecture presents a more everyday example: Old buildings generally seem to be more beautiful than their modern counterparts. Those buildings, though, are the ones that have survived the ages; there were slews of ugly ones from those time periods that have already been torn down.
Survivorship Bias
When you critically evaluate a study (or conduct one yourself), you need to ask yourself: Who is missing from the sample population? What could be making this sample population nonrandom relative to the underlying population? For example, if you want to grow your company’s customer base, you shouldn’t just sample existing customers; that sample doesn’t account for the probably much larger population of potential customers. This much larger potential customer base may behave very differently from your existing customer base (as is the case with early adopters versus the early majority, which we described in Chapter 4).
One more type of bias that can be inadvertently introduced is response bias. Whereas nonresponse bias arises when certain types of people don’t respond at all, response bias arises when those who do respond are pushed by various cognitive biases away from accurate or truthful answers. For example, in the employee engagement survey, people may lie (by omission or otherwise) for fear of reprisal.
In general, survey results can be influenced by response bias in a number of ways, including the following:
How questions are worded, e.g., leading or loaded questions
The order of questions, where earlier questions can influence later ones
Poor or inaccurate memory of respondents
Difficulty representing feelings in a number, such as one-to-ten ratings
Respondents reporting things that reflect well on themselves
It’s worth trying to account for all of these subtle biases (selection bias, nonresponse bias, response bias, survivorship bias), because accounting for them lets you be more confident in your conclusions.
BE WARY OF THE “LAW” OF SMALL NUMBERS
When you interpret data, you should watch out for a basic mistake that causes all sorts of trouble: overstating results from a sample that is too small. Even in a well-run experiment (like a political poll), you cannot expect to get a good estimate based on a small sample. This fallacy is sometimes referred to as the law of small numbers, and this section explores it in more detail. The name is derived from a valid statistical concept called the law of large numbers, which states that the larger the sample, the closer your average result is expected to be to the true average.
The figure below shows this in action. Each line represents a different series of coin flips and shows how the percentage of heads changes from the first to the five hundredth flip for each series. Note how the curves may deviate quite a bit from the 50 percent mark in the beginning, but start converging closer and closer toward 50 percent as the number of flips increases. But even out to five hundred flips, some of the values are still a fair bit away from 50 percent.
Law of Large Numbers
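If you want to see this convergence for yourself, here is a minimal Python sketch of repeated coin-flip series; the seed, the number of series, and the checkpoints are arbitrary choices, and each run will wander differently before settling near 50 percent.

```python
import random

# Law of large numbers in miniature: three independent series of 500 coin
# flips, printing the running percentage of heads at a few checkpoints.
random.seed(0)  # fixed seed so the run is repeatable

for series in range(1, 4):
    heads = 0
    checkpoints = []
    for flip in range(1, 501):
        heads += random.random() < 0.5
        if flip in (10, 100, 500):
            checkpoints.append(f"{flip} flips: {100 * heads / flip:.0f}% heads")
    print(f"Series {series}: " + ", ".join(checkpoints))
```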
The speed of convergence for a given experiment depends on the situation. We will explain in a later section how you know when you have a large enough sample. For now, we want to focus on what can go wrong if your sample is too small.
First, consider the gambler’s fallacy, named after roulette players who believe that a streak of reds or blacks from a roulette wheel is more likely to end than to continue with the next spin. Suppose you see ten blacks in a row. Those who fall victim to this fallacy expect the next spin to have a higher chance of coming up red, when in fact the underlying probability of each spin hasn’t changed. For this fallacy to be true, there would have to be some kind of corrective force in the roulette wheel that is bringing the results closer to parity. That’s simply not the case.
It’s sometimes called the Monte Carlo fallacy because of a widely cited case: on August 18, 1913, a casino in Monte Carlo had an improbable run of twenty-six blacks! There is only a 1 in 137 million chance of this happening in any twenty-six-spin sequence. However, all other twenty-six-spin sequences are equally rare; they just aren’t all as memorable.
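You can check that arithmetic yourself; the sketch below assumes a European wheel with 18 black pockets out of 37 (including the single zero).

```python
# Rough check of the Monte Carlo figure, assuming a European roulette wheel:
# 18 black pockets out of 37, so each spin lands on black with probability 18/37.
p_black = 18 / 37
p_run = p_black ** 26
print(f"P(26 blacks in a row) = {p_run:.2e}, about 1 in {1 / p_run:,.0f}")
```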
The gambler’s fallacy applies anywhere there is a sequence of decisions, including those by judges, loan officers, and even baseball umpires. In a University of Chicago review of refugee asylum cases from 1985 to 2013, published in the Quarterly Journal of Economics as “Decision-Making Under the Gambler’s Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires,” judges were less likely to approve an asylum case if they had approved the last two. It also explains that uncomfortable feeling you might have gotten as a student when you saw that you had chosen answer B four times in a row on a multiple-choice test.
Random data often contains streaks and clusters. Are you surprised to learn that there is a 50 percent chance of getting a run of four heads in a row during any twenty-flip sequence? Streaks like this are often erroneously interpreted as evidence of nonrandom behavior, a failure of intuition called the clustering illusion.
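You can estimate that probability with a quick simulation; the sketch below counts how many twenty-flip sequences contain a run of at least four heads (the trial count and seed are arbitrary).

```python
import random

# Monte Carlo estimate of the chance that 20 fair-coin flips include a run
# of at least 4 heads in a row; the estimate lands close to one half.
random.seed(1)
trials = 100_000
hits = 0
for _ in range(trials):
    run = longest = 0
    for _ in range(20):
        run = run + 1 if random.random() < 0.5 else 0
        longest = max(longest, run)
    hits += longest >= 4
print(f"Estimated probability: {hits / trials:.2f}")
```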
Look at the pair of pictures on the next page. Which is randomly generated?
These pictures come from psychologist Steven Pinker’s book The Better Angels of Our Nature. The left picture—the one with the obvious clusters—is actually the one that is truly random. The right picture—the one that intuitively seems more random—is not; it is a depiction of the positions of glowworms on the ceiling of a cave in Waitomo, New Zealand. The glowworms intentionally space themselves apart from one another in the competition for food.
Clustering Illusion
In World War II, Londoners sought to find a pattern to the bombings of their city by the Germans. Some became convinced that certain areas were being targeted and others were being spared, leading to conspiracy theories about German sympathizers in certain neighborhoods that didn’t get hit. However, statistical analysis showed that there was no evidence to support claims that the bombings were nonrandom.
The improbable should not be confused with the impossible. If enough chances are taken, even rare events are expected to happen. Some people do win the lottery and some people do get struck by lightning. A one-in-a-million event happens quite frequently on a planet with seven billion people.
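The arithmetic behind that claim is straightforward; the figures below are round numbers used purely for illustration.

```python
# If each of 7 billion people independently faces a one-in-a-million chance,
# the expected number of people it happens to is in the thousands.
population = 7_000_000_000
chance = 1 / 1_000_000
print(f"Expected occurrences: {population * chance:,.0f}")  # 7,000
```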
In the U.S., public health officials are asked to investigate more than one thousand suspected cancer clusters each year. While historically there have been notable cancer clusters caused by exposure to industrial toxins, the vast majority of the cases reported are due to random chance. There are more than 400,000 businesses with fifty or more employees; that’s a lot of opportunities for a handful of people to receive the same unfortunate diagnosis.
Knowing the gambler’s fallacy, you shouldn’t always expect short-term results to match long-term expectations. The inverse is also true: you shouldn’t base long-term expectations on a small set of short-term results.
You might be familiar with the phrase sophomore slump, which describes scenarios such as when a band gets rave reviews for their first album and the second one isn’t as well received, or when a baseball player has a fantastic rookie season but the next year his batting average is not that impressive. In these situations, you may assume there must be some psychological explanation, such as caving under the pressure of success. But in most cases, the true cause is purely mathematical, explained through a model called regression to the mean.
Mean is just another word for average, and regression to the mean explains why extreme events are usually followed by something more typical, regressing closer to the expected mean. For instance, a runner is not expected to follow a record-breaking race with another record-breaking time; a slightly less impressive performance would be expected. That’s because a repeat of a rare result is just as rare as its first occurrence, so it shouldn’t be expected the next time.
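A small simulation makes the effect concrete; here each hypothetical “player” has a fixed skill plus fresh luck in every round, and the particular numbers (10,000 players, spreads of 10) are arbitrary.

```python
import random
from statistics import fmean

# Regression to the mean in miniature: performance = skill + luck. The top
# round-1 performers were selected partly for good luck, so their round-2
# scores fall back toward the overall average even though skill is unchanged.
random.seed(2)
skill = [random.gauss(100, 10) for _ in range(10_000)]
round1 = [s + random.gauss(0, 10) for s in skill]
round2 = [s + random.gauss(0, 10) for s in skill]
top = sorted(range(len(skill)), key=round1.__getitem__)[-100:]  # round-1 top 100
print(f"Top 100, round 1 average: {fmean(round1[i] for i in top):.1f}")
print(f"Same players, round 2:    {fmean(round2[i] for i in top):.1f}")  # closer to 100
```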
The takeaway is that you should never assume that a result based on a small set of observations is typical. It may not be representative of either another small set of observations or a much larger set of observations. Like anecdotal evidence, a small sample tells you very little beyond the fact that what happened was within the range of possible outcomes. While first impressions can be accurate, you should treat them with skepticism. More data will help you distinguish what is likely from what is an anomaly.
THE BELL CURVE
When you are dealing with a lot of data, you can use graphs and summary statistics to combat the feeling of information overload (see Chapter 2). The term statistics is actually just the name for numbers used to summarize a dataset. (It also refers to the mathematical process by which those numbers are generated.) Graphs and summary statistics succinctly communicate facts about the dataset.
You use summary statistics all the time without even realizing it. If someone asked you, “What is the temperature of a healthy person?” you’d likely say it was 98.6 degrees Fahrenheit or 37 degrees Celsius. That’s actually a summary statistic called the mean, which, as we just explained, is another word for average.
You probably don’t even remember when you first learned that fact, and it’s even more likely you have no idea where that number comes from. A nineteenth-century German physician, Dr. Carl Wunderlich, diligently collected and analyzed more than a million armpit temperatures from twenty-five thousand patients to calculate that statistic (yes, that’s a lot of armpits).
Yet 98.6 degrees Fahrenheit isn’t some magical temperature. First of all, more recent data indicates a lower mean, closer to 98.2 degrees. Second, you may have noticed from taking your own temperature or that of a family member that “normal” temperatures vary from this mean. In fact, women are slightly warmer than men on average, and temperatures of up to 99.9°F (37.7°C) are still considered normal. Third, people’s temperatures also naturally change throughout the day, moving up on average by 0.9°F (0.5°C) from morning to night.
Just saying a healthy temperature is 98.6°F doesn’t account for all of this nuance. That’s why a range of summary statistics and graphs are often used on a case-by-case basis to summarize data. The mean (average or expected value) measures central tendency, or where the values tend to be centered. Two other popular summary statistics that measure central tendency are the median (middle value that splits the data into two halves) and the mode (the most frequent result). These statistics help describe what a “typical” number might look like for a given set of data.
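Python’s standard library can compute all three directly; the readings below are made-up numbers used only for illustration.

```python
import statistics

# Central tendency on a small, made-up set of temperature readings (°F).
readings = [97.9, 98.0, 98.0, 98.3, 98.6, 99.1]
print(statistics.mean(readings))    # average of all the values
print(statistics.median(readings))  # middle value (here, midway between the 3rd and 4th)
print(statistics.mode(readings))    # most frequent value
```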
For body temperature, though, just reporting the central tendency, such as the mean, can at times be too simplistic. This brings us to the second common set of summary statistics, those that measure dispersion, or how far the data is spread out.
The simplest dispersion statistics report ranges. For body temperature, that could be specifying the range of values considered normal, e.g., minimum to maximum reported values from healthy people, as in the graph below (called a histogram).
Histogram
The graph on the previous page depicts the frequencies of 130 body temperature readings from a study of healthy adults. A histogram like this one is a simple way to summarize data visually: group the values into buckets, count how many data points fall in each bucket, and make a vertical bar graph of the buckets.
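If you wanted to build such a histogram yourself, the sketch below follows that recipe; the temperatures are made-up examples rather than the study’s actual 130 readings, and the bars are drawn sideways in text for simplicity.

```python
from collections import Counter

# A bare-bones histogram: bucket each value, count each bucket, then draw a
# bar per bucket. A bucket width of 0.5°F is an arbitrary choice.
temps = [97.1, 97.8, 98.0, 98.2, 98.2, 98.3, 98.4, 98.6, 98.6, 98.8,
         99.0, 99.1, 99.4, 100.8]
width = 0.5
counts = Counter(round(t / width) * width for t in temps)
for bucket in sorted(counts):
    print(f"{bucket:5.1f}°F | {'#' * counts[bucket]}")
```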
Before reporting a range, you might first look for outliers, those data points that don’t seem to fit with the rest of the data. These are the data points set apart in the histogram, such as the one at 100.8°F. Perhaps a sick person sneaked into the dataset. As a result, you might report a normal temperature range of 96.3°F to 100.0°F. Of course, with more data, you could produce a more accurate range.
In this dataset, central tendency statistics are quite similar because the distribution of the data is fairly symmetric, with just one peak in the middle. As a result, the mean is 98.25°F, the median is 98.3°F, and the mode is 98°F. In other scenarios, though, these three summary statistics may be quite different.
To illustrate this, consider another histogram, below, showing the distribution of U.S. household income in 2016. This dataset also has one peak, at $20,000–$24,999, but it is asymmetric, skewing to the right. (All incomes above $200,000 are grouped into one bar; had this not been the case, the graph would have a long tail stretching much farther to the right.)
Unlike for the body temperatures, the median income of $59,039 is very different from the mean income of $83,143. Whenever the data is skewed in one direction like this, the mean gets pulled away from the median and toward the skew, swayed by the extreme values.
Distribution of U.S. Household Income (2016)
Also, a minimum–maximum range is less informative here. A better summary of the dispersion in this case might be an interquartile range specifying the 25th percentile to the 75th percentile of the data, which captures the middle 50 percent of incomes, from $27,300 to $102,350.
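A short sketch shows both effects on a synthetic right-skewed sample (these are randomly generated “incomes,” not the actual 2016 census data).

```python
import random
import statistics

# Skewed data: the long right tail pulls the mean above the median, and the
# quartiles give the interquartile range covering the middle 50 percent.
random.seed(3)
incomes = [random.lognormvariate(11, 0.8) for _ in range(10_000)]
q1, q2, q3 = statistics.quantiles(incomes, n=4)
print(f"median: {q2:,.0f}")
print(f"mean:   {statistics.fmean(incomes):,.0f}")  # larger, pulled toward the skew
print(f"interquartile range: {q1:,.0f} to {q3:,.0f}")
```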
The most common statistical measures of dispersion, though, are the variance and the standard deviation (the latter usually denoted by the Greek letter σ, sigma). They are both measures of how far the numbers in a dataset tend to vary from its mean. The following figure shows how you calculate them for a set of data.
Variance & Standard Deviation
Number of observations: n = 5
Observations: 5, 10, 15, 20, 25
Sample mean: (5+10+15+20+25)/5 = 75/5 = 15
Data point deviations from sample mean, squared:
(5-15)² = (-10)² = 100
(10-15)² = (-5)² = 25
(15-15)² = (0)² = 0
(20-15)² = (5)² = 25
(25-15)² = (10)² = 100
Sample variance: (100+25+0+25+100)/(n-1) = 250/(5-1) = 250/4 = 62.5
Sample standard deviation (σ): √(variance) = √(62.5) ≈ 7.9
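The same calculation can be done with Python’s statistics module, which divides by n − 1 just as in the hand calculation above.

```python
import statistics

# The worked example above, in code. variance() and stdev() use the
# n - 1 (sample) denominator, matching the hand calculation.
data = [5, 10, 15, 20, 25]
print(statistics.mean(data))      # 15
print(statistics.variance(data))  # 62.5
print(statistics.stdev(data))     # 7.905..., about 7.9
```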
Because the standard deviation is just the square root of the variance, if you know one, then you can easily calculate the other. Higher values for each indicate that it is more common to see data points further from the mean, as shown in the targets below.
Variance: Low Variance vs. High Variance
The body temperature dataset depicted earlier has a standard deviation of 0.73°F. Slightly more than two-thirds of its values fall within one standard deviation from its mean (97.52°F to 98.98°F) and 95 percent within two standard deviations (96.79°F to 99.71°F). As you’ll see, this pattern is commonplace for many datasets consisting of measurements (e.g., heights, blood pressure, standardized tests).
Histograms of these types of datasets have similar bell-curve shapes, with a cluster of values in the middle, close to the mean, and fewer and fewer results as you go further away from the mean. When a set of data has this type of shape, it is often suggested that it comes from a normal distribution.
The normal distribution is a special type of probability distribution, a mathematical function that describes how the probabilities for all possible outcomes of a random phenomenon are distributed. For example, if you take a random person’s temperature, getting any particular temperature has a certain probability, with the mean of 98.2°F being the most probable and values further away being less and less probable. Given that a probability distribution describes all the possible outcomes, all probabilities in a given distribution add up to 100 percent (or 1).
To understand this better, let’s consider another example. As mentioned above, people’s heights also roughly follow a normal distribution. Below is a graphical representation of the distribution of men’s and women’s heights based on data from the U.S. Centers for Disease Control and Prevention. The distributions both have the typical bell-curve shape, even though the men’s and women’s heights have different means.
Normal Distribution
In normal distributions like these (and as we saw with the body temperatures), approximately 68 percent of all values should fall within one standard deviation of the mean, about 95 percent within two, and nearly all (99.7 percent) within three. In this manner, a normal distribution can be uniquely described by just its mean and standard deviation. Because so many phenomena can be described by the normal distribution, knowing these facts is particularly useful.
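You can verify the 68–95–99.7 pattern with a quick simulation; the mean and standard deviation below are arbitrary illustrative numbers, not the CDC figures behind the chart.

```python
import random

# Draw normally distributed values and count how many land within 1, 2, and
# 3 standard deviations of the mean.
random.seed(4)
mean, sd = 64.0, 2.8
values = [random.gauss(mean, sd) for _ in range(100_000)]
for k in (1, 2, 3):
    share = sum(abs(v - mean) <= k * sd for v in values) / len(values)
    print(f"within {k} standard deviation(s): {share:.1%}")
```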