For any issue, you can find people on both sides with “numbers” to back up their position. This leads many people to feel that data can be too easily manipulated to support whatever story someone wants to tell, hence the title of this chapter. Similarly, even if people aren’t intentionally trying to mislead you, study results are often accidentally misinterpreted, or the studies themselves can suffer from design flaws.
However, the answer is not to dismiss all statistics or data-driven evidence as nonsense, leaving you to base decisions solely on opinions and guesses. Instead, you must use mental models to get a deeper understanding of an issue, including its underlying research, enabling you to determine what information is credible.
You can also use data from your life and business to derive new insights. Insights based on true patterns, such as those found in market trends, customer behavior, and natural occurrences, can form the basis of major companies and scientific breakthroughs. They can be just as useful in everyday life.
As an example, consider being a first-time parent. Lucky parents have a baby who goes to sleep easily and sleeps through the night at one month old. The rest of us have to hear all the advice: use a rocker, swaddle them, let them cry it out, don’t let them cry it out, co-sleep, change the baby’s diet, change the mother’s diet, and on and on.
Our older son never wanted to be put down, but our pediatrician nevertheless advised us to put him down when he was sleepy but still awake. That always led to him screaming the minute he was set down. If he wasn’t deeply asleep, he would just rouse himself and start crying. The first few nights of this were harrowing, with each of us taking turns staying awake and holding him while he slept; he may have slept on his own for an hour a night.
We had to find another way. Through experimentation and collecting our own data over the first few weeks (see scientific method in Chapter 4), we discovered that our son liked a tight swaddle and would fall asleep in an electric swing, preferably on the highest setting. When he grew out of the swaddle, we feared that we were going back to square one. Luckily, he quickly adapted, and before he turned one, he could easily be put down and sleep straight through the night.
When we had our second son, we thought of ourselves as baby-care professionals. We had our magic swing and we thought we were all set. And then, per Murphy’s law (see Chapter 2), baby number two hated the swing. We circled back through all the advice, and after a few days, we tried to set him down when he was sleepy but awake (per our pediatrician’s original advice). Lo and behold, he put himself to sleep!
Like babies and their sleep patterns, many aspects of life have inherent variability and cannot be predicted with certainty. Will it rain today? Which funds should you invest your retirement money in? Who are the best players to draft for your fantasy football team?
Despite this uncertainty, you still have to make a lot of choices, from decisions about your health to deciding whom to vote for to taking a risk with a new project at work. This chapter is about helping you wade through such uncertainty when making decisions. What advice should you listen to and why?
Probability and statistics are the branches of mathematics that give us the most useful mental models for these tasks. As French mathematician Pierre-Simon Laplace wrote in his 1812 book Théorie Analytique des Probabilités: “The most important questions of life are indeed, for the most part, really only problems of probability.”
We will discuss the useful mental models from the fields of probability and statistics along with common traps to avoid. While many of the basic concepts of probability are fairly intuitive, your intuition often fails you (as we’ve seen throughout this book).
Yes, that means some of this chapter is a bit mathematical. However, we believe that an understanding of these concepts is needed for you to understand the statistical claims that you encounter on a daily basis, and to start to make your own. We’ve tried to include only the level of detail that is really needed to start to appreciate these concepts. And, as always, we’ve included plenty of examples to help you grasp them.
TO BELIEVE OR NOT BELIEVE
It is human nature to use past experience and observation to guide decision making, and evolutionarily this makes sense. If you watched someone get sick after they ate a certain food or get hurt by behaving a certain way around an animal, it follows that you should not copy that behavior. Unfortunately, this shortcut doesn’t always result in good thinking. For example:
We had a big snowstorm this year; so much for global warming.
My grandfather lived to his eighties and smoked a pack a day for his whole life, so I don’t believe that smoking causes cancer.
I have heard several news reports about children being harmed. It is so much more dangerous to be a child these days.
I got a runny nose and cough after I took the flu vaccine, and I think it was caused by the vaccine.
These are all examples of drawing incorrect conclusions using anecdotal evidence, informally collected evidence from personal anecdotes. You run into trouble when you make generalizations based on anecdotal evidence or weigh it more heavily than scientific evidence. Unfortunately, as Michael Shermer, founder of the Skeptics Society, points out in his 2011 book The Believing Brain, “Anecdotal thinking comes naturally, science requires training.”
One issue with anecdotal evidence is that it is often not representative of a full range of experiences. People are more inclined to share out-of-the-ordinary stories. For instance, people are more likely to write a review when they had a terrible experience or an amazing experience. As a result, the only takeaway from an anecdote is that a single event may have occurred.
If you hear an anecdote about someone who smoked and escaped lung cancer, that only proves you are not guaranteed to get lung cancer if you smoke. However, based solely on this anecdote, you cannot draw a conclusion on the chances that an average smoker will get cancer or the relative likelihood of smokers getting lung cancer compared with nonsmokers. If everyone who ever smoked got lung cancer and everyone who didn’t smoke never got lung cancer, the data would be a lot more convincing. Unfortunately, the real world is rarely that simple.
You may have heard anecdotes about people who happened to get cold and flu symptoms around the time that they got the flu vaccine and blame their illness on the vaccine. Just because two events happened in succession, or are correlated, doesn’t mean that the first actually caused the second. Statisticians use the phrase correlation does not imply causation to describe this fallacy.
What is often overlooked when this fallacy arises is a confounding factor, a third, possibly non-obvious factor that influences both the assumed cause and the observed effect, confounding the ability to draw a correct conclusion. In the case of the flu vaccine, the cold and flu season is that confounding factor. People get the flu vaccine during the time of year when they are more likely to get sick, whether they have received the vaccine or not. Most likely the symptoms people are experiencing are from a common cold, which the flu vaccine does not protect against.
In other instances, a correlation can occur by random chance. It’s easier than ever to test the correlation between all sorts of information, so many spurious correlations are bound to be discovered. In fact, there is a hilarious site (and book) called Spurious Correlations, chock-full of these silly results. The graph below shows one such correlation, between cheese consumption and deaths caused by people becoming tangled in their bedsheets.
Correlation Does Not Imply Causation
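Correlations this strong can arise from pure noise if you check enough pairings. The following minimal sketch, written in Python and assuming only the NumPy library, generates one thousand completely unrelated ten-year “trends” and then counts how many pairs happen to correlate strongly anyway; the series count, series length, and 0.9 cutoff are all arbitrary choices made for illustration.

```python
# A minimal sketch of spurious correlation: generate many completely
# independent random time series, then search for pairs that happen
# to be highly correlated by luck alone. All numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(seed=0)
n_series, n_years = 1000, 10

# 1,000 unrelated ten-year "trends" (pure random noise)
series = rng.normal(size=(n_series, n_years))

# Pairwise correlation of every series with every other series
corr = np.corrcoef(series)

# Count pairs whose correlation exceeds 0.9 purely by chance
rows, cols = np.triu_indices(n_series, k=1)
high = np.abs(corr[rows, cols]) > 0.9
print(f"{high.sum():,} of {high.size:,} random pairings correlate above 0.9")
```

With roughly half a million pairings to search, even a tiny per-pair probability of a strong correlation yields hundreds of impressive-looking, and entirely meaningless, matches.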
One time when Lauren was in high school, she started feeling like a cold was coming on, and her dad told her to drink plenty of fluids to help her get better. She proceeded to drink half a case of raspberry Snapple that day, and, surprisingly, the next day she felt a lot better! Was this clear evidence that raspberry Snapple is a miracle cure for the common cold? No. She probably just experienced a coincidental recovery due to the body’s natural healing ability after also drinking a whole bunch of raspberry Snapple.
Or maybe she wasn’t sick at all; maybe she was just randomly having a bad day, followed by a more regular day. Many purveyors of homeopathic “treatments” include similar anecdotal reports of coincidental recoveries in advertisements for their products. What is not mentioned is what would have happened if there were no “treatment.” After all, even when you are sick, your symptoms will vary day by day. You should require more credible data, such as a thorough scientific experiment, before you believe any medical claims on behalf of a product.
If you set out to collect or evaluate scientific evidence based on an experiment, the first step is to define or understand its hypothesis, the proposed explanation for the effect being studied (e.g., drinking Snapple can reduce the length of the common cold). Defining a hypothesis up front helps to avoid the Texas sharpshooter fallacy. This model is named after a joke about a person who comes upon a barn with targets drawn on the side and bullet holes in the middle of each target. He is amazed at the shooter’s accuracy, only to find that the targets were drawn around the bullet holes after the shots were fired. A similar concept is the moving target, where the goal of an experiment is changed to support a desired outcome after seeing the results.
One method to consider, often referred to as the gold standard in experimental design, is the randomized controlled experiment, where participants are randomly assigned to two groups, and then results from the experimental group (who receive a treatment) are compared with the results from the control group (who do not). This setup isn’t limited to medicine; it can be used in fields such as advertising and product development. (We will walk through a detailed example in a later section.)
A popular version of this experimental design is A/B testing, where user behavior is compared between version A (the experimental group) and version B (the control group) of a site or product, which may differ in page flow, wording, imagery, colors, etc. Such experiments must be carefully designed to isolate the one factor you are studying. The simplest way to do this is to change just one thing between the two groups.
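To make the mechanics concrete, here is a minimal Python sketch of a simple A/B test; every number in it is invented for illustration. Users are randomly assigned to the two versions, conversions are simulated, and a standard two-proportion z-test then asks whether the observed difference is bigger than chance alone would plausibly produce.

```python
# A minimal A/B test sketch with invented numbers. Users are randomly
# assigned to version A (experimental) or B (control); we then test
# whether the difference in conversion rates is plausibly due to chance.
import math
import random

random.seed(0)

# Hypothetical underlying conversion rates (unknown in a real test)
true_rate = {"A": 0.12, "B": 0.11}

visitors = {"A": 0, "B": 0}
conversions = {"A": 0, "B": 0}

for _ in range(10_000):
    group = random.choice(["A", "B"])  # random assignment is the key step
    visitors[group] += 1
    if random.random() < true_rate[group]:
        conversions[group] += 1

p_a = conversions["A"] / visitors["A"]
p_b = conversions["B"] / visitors["B"]

# Two-proportion z-test under the null hypothesis of no real difference
p_pool = (conversions["A"] + conversions["B"]) / (visitors["A"] + visitors["B"])
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors["A"] + 1 / visitors["B"]))
z = (p_a - p_b) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided

print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p-value = {p_value:.3f}")
```

Note that the one deliberately changed factor is the page version itself; every other difference between the groups is just the luck of the coin flip, which is exactly what the z-test accounts for.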
Ideally, experiments are also blinded, so that participants don’t know which group they are in, preventing their conscious and unconscious bias from influencing the results. The classic example is a blind taste test, which ensures that people’s brand affinities don’t influence their choice.
To take the idea of blinding one step further, the people administering or analyzing the experiment can also remain unaware of which group the participants are in, an arrangement commonly known as a double-blind experiment. This additional blinding helps reduce the impact of observer-expectancy bias (also called experimenter bias), where the cognitive biases of the researchers, or observers, may cause them to influence the outcome in the direction they expected.
Unfortunately, experimenter blinding doesn’t completely prevent observer-expectancy bias, because researchers can still bias results in the preparation and analysis of a study, such as by engaging in selective background reading, choosing hypotheses based on preconceived notions, and selectively reporting results.
In medicine, researchers go to great lengths to achieve properly blinded trials. In 2014, the British Medical Journal (BMJ) published a review by Karolina Wartolowska et al. of fifty-three studies that compared an actual surgical intervention with a “sham” surgery, “including the scenario when a scope was inserted and nothing was done but patients were sedated or under general anesthesia and could not distinguish whether or not they underwent the actual surgery.”
These fake surgeries are an example of a placebo, something that the control participants receive that looks and feels like what the experimental participants receive, but in reality is supposed to have no effect. Interestingly, just the act of receiving something that you expect to have a positive effect can actually create one, called the placebo effect.
While placebos have little effect on some things, like healing a broken bone, the placebo effect can bring about observed benefits for numerous ailments. The BMJ review reported that in 74 percent of the trials, patients receiving the fake surgeries saw some improvement in their symptoms, and in 51 percent of the trials, they improved about as much as the recipients of actual surgeries.
For some conditions, there is even evidence to suggest that the placebo effect isn’t purely a figment of the imagination. As an example, placebo “pain relievers” can produce brain activity consistent with the activity produced by actual pain-relieving drugs. For all the parents out there, this is why “kissing a boo-boo” actually can help make it better. Similarly, anticipation of side effects can also result in real negative effects, even with fake treatments, a phenomenon known as the nocebo effect.
One of the hardest things about designing a solid experiment is defining its endpoint, the metric that is used to evaluate the hypothesis. Ideally, the endpoint is an objective metric, something that can be easily measured and consistently interpreted. Some examples of objective metrics include whether someone bought a product, is still alive, or clicked a button on a website.
However, when the concept that researchers are interested in studying isn’t clearly observable or measurable, they must use a proxy endpoint (also called a surrogate endpoint or marker), a measure expected to be closely correlated to the endpoint they would measure if they could. A proxy essentially means a stand-in for something else. Other uses of this mental model include the proxy vote (e.g., absentee ballot) and proxy war (e.g., current conflicts in Yemen and Syria are a proxy war between Iran and Saudi Arabia).
While there is no one objective measure of the quality of a university, every year U.S. News and World Report tries to rank schools against one another using a proxy metric that is a composite of objective measures, such as graduation rates and admission data, along with more subjective measures, such as academic reputation. Other examples of common proxy metrics include the body mass index (BMI), used to measure obesity, and IQ, used to measure intelligence. Proxy metrics are more prone to criticism because they are indirect measures, and all three of these examples have been criticized significantly.
As an example of why this criticism can be valid, consider abnormal heart rhythms (ventricular arrhythmias) that can cause sudden death. Anti-arrhythmic drugs have been developed that prevent ventricular arrhythmias, and so it would seem obvious that these drugs would be expected to prevent sudden death in the patients who take them. But use of these drugs actually leads to a significant increase in sudden death in patients with asymptomatic ventricular arrhythmias after a heart attack. For these patients, the reduced post-treatment rate of ventricular arrhythmias is not indicative of improved survival and is therefore not a good proxy metric.
However, despite the complications that arise when conducting well-run experiments, collecting real scientific evidence beats anecdotal evidence hands down because you can draw believable conclusions. Yes, you have to watch out for spurious correlations and subtle biases (more on that in the next section), but in the end you have results that can really advance your thinking.
HIDDEN BIAS
In the last section, we mentioned a few things to watch out for when reviewing or conducting experiments, such as observer-expectancy bias and confounding factors. There are a few more of these subtle concepts to be wary of.
First, sometimes it is not ethical or practical to randomly assign people to different experimental groups. For example, if researchers wanted to study the effect of smoking during pregnancy, it wouldn’t be right to make nonsmoking pregnant women start smoking. The smokers in the study would therefore be those who self-selected to continue smoking, which can introduce a bias called selection bias.
With selection bias, there is no guarantee that the study has isolated smoking to be the only difference between these groups. So if there is a difference detected at the end of the study, it cannot be easily determined how much smoking contributed to this difference. For instance, women who choose to continue smoking during their pregnancy against the advice of doctors may similarly make other medically questionable choices, which could drive adverse outcomes.
Selection bias can also occur when a sample is selected that is not representative of the broader population of interest, as with online reviews. If the group studied isn’t representative, then the results may not be applicable overall.
Essentially, you must be really careful when drawing conclusions based on nonrandom experiments. The Dilbert cartoon above pokes fun at the selection bias inherent in a lot of the studies reported in the news.
A similar selection bias occurs with parents and school choice for their kids. Parents understandably want to give their kids a leg up and will often move or pay to send their kids to “better schools.” However, is the school better because there are better teachers or because the students are better prepared due to their parents’ financial means and interest in education? Selection bias likely explains some significant portion of these schools’ better test scores and college admissions.
Another type of selection bias, common to surveys, is nonresponse bias, which occurs when a subset of people don’t participate in an experiment after they are selected for it, e.g., they fail to respond to the survey. If the reason for not responding is related to the topic of the survey, the results will end up biased.
For instance, let’s suppose your company wants to understand whether it has a problem with employee motivation. Like many companies, you might choose to study this potential problem via an employee engagement survey. Employees missing the survey due to a scheduled vacation would be random and not likely to introduce bias, but employees not filling it out due to apathy would be nonrandom and would likely bias the results. That’s because the latter group is made up of disengaged employees, and by not participating, their disengagement is not being captured.
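The sketch below, again in Python with made-up numbers, shows how much this can distort results. In a hypothetical company where 70 percent of employees are engaged, letting engaged employees respond at a much higher rate than disengaged ones pushes the survey’s reported engagement to around 90 percent.

```python
# A minimal sketch of nonresponse bias, with made-up numbers:
# disengaged employees answer the survey far less often, so the
# survey average overstates true engagement.
import random

random.seed(0)

# Hypothetical population: about 70% of 10,000 employees are engaged
employees = [{"engaged": random.random() < 0.7} for _ in range(10_000)]

def responds(employee: dict) -> bool:
    """Engaged employees respond 80% of the time; disengaged, 20%."""
    rate = 0.8 if employee["engaged"] else 0.2
    return random.random() < rate

respondents = [e for e in employees if responds(e)]

true_rate = sum(e["engaged"] for e in employees) / len(employees)
survey_rate = sum(e["engaged"] for e in respondents) / len(respondents)

print(f"True engagement: {true_rate:.1%}")
print(f"Survey says:     {survey_rate:.1%}")  # biased upward
```

The survey isn’t misreporting the people who answered it; it is silently missing the very group the question was designed to find.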
Surveys like this also do not usually account for the opinions of former employees, which can create another bias in the results called survivorship bias. Unhappy employees may have chosen to leave the company, but you cannot capture their opinions when you survey only current employees. Results are therefore biased based on measuring just the population that survived, in this case the employees remaining at the company.