More insidiously, sellers might run randomized tests on what kinds of terms to include in their consumer contracts. Some businesses are really pushing the ethical and legal envelope by testing whether to conceal warranty waivers on their website. If a company runs a randomized test and finds that disclosing the waiver on a top-level page substantially reduces their sales, they have an incentive to bury the offensive information. Matt Roche emphasizes, though, that even this kind of warranty testing is a two-edged sword that sometimes helps consumers. Randomized tests show that displaying certain types of express warranties increases sales. “Including the Verisign security certificate almost universally gives you a lift,” he said. “Tests show that consumers demand trust. Without this hard evidence, a lot of companies wouldn’t have been willing to give it.”
Getting Into the Game
Randomized trials are not some ivory tower daydream. Lots of firms are already using them. The puzzle is why more firms aren’t following the lead of CapOne and Jo-Ann Fabrics. Why isn’t Wal-Mart running randomized experiments? Wal-Mart is great at using what its consumers have done in the past to predict the future. Yet at least publicly it won’t admit to randomized testing. All too often, information management is limited to historical data, to recent and not-so-recent records of past transactions. Business is now very good at tracking these kinds of data, but businesses as a group still haven’t gone far enough in proactively creating useful new data.
The examples in this chapter show that it’s really not that hard. The Excel “=rand()” function on just about any computer will flip the coin for you. Any bright high schooler could run a randomized test and analyze it. The setup isn’t hard and the analysis is just a comparison of two averages, the average results for the “treated” and “untreated” groups. That’s really all Offermatica is doing when they tell you the average click-through rate for one web page versus another. (Okay, it’s more complicated when they use the Taguchi Method to analyze the results of multiple tests.)
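For readers who would rather see it than imagine it, here is a minimal sketch in Python of the same coin-flip-and-compare logic; Excel’s “=rand()” plays the same role as random.random() here. Every number in it (the 10,000 visitors, the 3 percent and 5 percent click-through rates) is made up purely for illustration.

```python
import random

random.seed(42)  # reproducible "coin flips"

# Hypothetical experiment: 10,000 visitors, each randomly shown
# the old page ("control") or the new page ("treatment").
control_clicks = control_n = 0
treatment_clicks = treatment_n = 0

for _ in range(10_000):
    if random.random() < 0.5:                      # the coin flip
        treatment_n += 1
        # made-up behavior: the new page converts 5% of its visitors
        treatment_clicks += random.random() < 0.05
    else:
        control_n += 1
        # made-up behavior: the old page converts 3% of its visitors
        control_clicks += random.random() < 0.03

# The whole "analysis" is a comparison of two averages.
print("control click-through rate:  ", control_clicks / control_n)
print("treatment click-through rate:", treatment_clicks / treatment_n)
```

A statistician would also want a standard error on the difference before declaring a winner, but the core comparison really is just the two averages.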
The simplicity of these studies, in both collecting and analyzing the data, is also why it’s so much easier to explain the results to people who don’t like thinking about things like heteroskedasticity and BLUE estimators. The fancier statistical regressions are much harder for non-statisticians to understand and trust. In effect, the statistician at some point has to say, “Trust me. I did the multivariate regression correctly.” It’s easier to trust a randomized experiment. The audience still has to trust that the researcher flipped the coin correctly, but that’s basically it. If the only difference between the two groups is how they’re treated, then it’s pretty clear that the treatment is the only thing that can be causing a difference in outcome.
Randomization also frees the researcher to take control of the questions being asked and to create the information that she wants. Data mining on the historic record is limited by what people have actually done. Historic data can’t tell you whether teaching statistics in junior high school increases math scores if no junior high has ever offered this subject. Super Crunchers who run randomized experiments, however, can create information to answer this question by randomly assigning some students to take the class (and seeing if they do better than those who were not chosen).
Firms historically have been willing to create more qualitative data. Focus groups are one way to explore what the average “man on the street” thinks about a new or old product. But the marketer of the future will not just adopt the social science methods of multivariate regression and the mining of historical databases; she will also start exploiting the randomized trials of science.
Businesses realize that information has value. Your databases not only help you make better decisions; the information in them is a commodity that can be sold to others. So it’s natural that firms are keeping better track of what they and their customers are doing. But firms should more proactively figure out what pieces of information are missing and take action to fill the gaps in their data. And there’s no better way to create information about what causes what than via randomized testing.
Why aren’t more firms doing this already? In part, it may be because traditional experts are just defending their turf. They don’t want to have to put their pet policies to a definitive test, because they don’t want to take the chance that they might fail. But in part, the relative foot-dragging may have to do with timing. Randomized trials require firms to commit to a hypothesis before the test starts. My editor and I had a devil of a time deciding what book titles we really wanted to test. Running regressions, in contrast, lets the researcher sit back and decide what to test after the fact. Randomizers need to take more initiative than people who run after-the-fact regressions, and this difference might explain the slower diffusion of randomized trials in corporate America.
As we move to a world in which quantitative information is increasingly a commodity to be hoarded, or bought, or sold, more and more firms will start to run randomized trials on advertisements, on price, on product attributes, on employment policies. Of course, not all decisions can be pre-tested. Some decisions are all-or-nothing, like the first moon launch, or whether to invest $100 million in a new technology. Still, for many, many different decisions, powerful new information on the wellsprings of human action is just waiting to be created.
This book is about the leakage of social science methods from the academy to the world of on-the-ground decision making. Usually and unsurprisingly, business has been way ahead of government in picking up useful technologies. And the same is true about the technology of Super Crunching. When there’s a buck to be made, businesses more than bureaucrats scoop it up. But randomization is one place where government has taken the lead. Somewhat perversely, the checks and balances of a two-party system might have given government a leg up on the more unified control of corporations in the move to embrace randomization. Political adversaries who can’t agree on substance can at least reach bipartisan agreement on a procedure for randomization. They’ll let some states randomly test their opponents’ preferred policy if the opponents will let other states randomly test their preferred policy. Bureaucrats who lack the votes to get their favored policy approved sometimes have sufficient support to fund a randomized demonstration project. These demonstration projects usually start small, but the results of randomized policy trials can have supersized impacts on subsequent policy.
CHAPTER 3
Government by Chance
Way back in 1966, Heather Ross, an economics graduate student at MIT, had an audacious idea. She applied for a huge government grant to run a randomized test of the Negative Income Tax (NIT). The NIT pays you money if your income falls below some minimum level and effectively guarantees people a minimum income regardless of how much they earn working. Heather wanted to see whether the NIT reduced people’s incentives to work. When the Office of Economic Opportunity approved her grant, Heather ended up with a $5 million thesis. She found that the NIT didn’t reduce employment nearly as much as people feared, but there was a very unexpected spike in divorce. Poor families that were randomly selected to receive the NIT were more likely to split up.
The biggest impact of Ross’s test was on the process of how to evaluate government programs themselves. Heather’s simple application of the medical randomization method to a policy issue unleashed what has now grown into a torrent of hundreds of randomized public policy experiments at home and abroad. U.S. lawmakers increasingly have come to embrace randomization as the best way to test what works. Acceptance of the randomized methodology was not a partisan issue but a neutral principle to separate the good from the bad and the ugly. Government isn’t just paying for randomized trials; for the first time the results are starting to drive public policy.
Spending Money to Save Money
In 1993, a young whiz-kid economist named Larry Katz had a problem. As chief economist for the Department of Labor, he was trying to convince Congress that it could save $2 billion a year by making a simple change to unemployment insurance (UI). By spending some additional money to give the unemployed job-search assistance, Larry thought we could reduce the length of time that workers made unemployment claims. The idea that spending money on a new training program would save $2 billion was not an easy sale to skeptical politicians.
Larry, however, is no pushover. He’s not physically intimidating. Even now, in his forties, Larry still looks more like a wiry teenager than a chaired Harvard professor (which he actually is). But he is scary smart. Long ago, Larry and I were roommates at MIT. I still remember how, in our first week of grad school, the TA asked an impossible question that required us to use something called the hypergeometric distribution—something we’d never studied. To most of us, the problem was impenetrable. Katz, however, somehow derived the distribution on his own and solved the problem.
While Larry is soft-spoken and calm, he is absolutely tenacious when he is defending an idea that he knows is right. And Larry knew he was right about job assistance because of randomized testing. His secret weapon was a series of welfare-to-work tests that states conducted after Senator Patrick Moynihan inserted an evidence-based provision into our federal code in 1989. The provision said that a state could experiment with new ideas on how to reduce unemployment insurance costs if and only if the ideas were supported by an evaluation plan that must “include groups of project participants and control groups assigned at random in the field trial.”
Moynihan’s mischief led to more than a dozen randomized demonstration projects. Many of the states ran tests to see whether providing job-search assistance could reduce a state’s unemployment insurance payments. Rather than training workers in new on-the-job skills, the job-search assistance offered advice on how to go about applying and interviewing for a new job. These “search-assistance” tests (which took place in Minnesota, Nevada, New Jersey, South Carolina, and Washington) were also novel because they combined the two central types of data-based decision making, regressions and randomization.
The search-assistance programs used regression equations to predict which workers were most likely to have trouble finding a job on their own. The regressions amounted to a kind of statistical profiling that allowed the programs to concentrate their efforts on the workers most likely to need the help and to respond to it. After the profiling stage, randomization kicked in. The tests randomly assigned qualifying unemployed workers to treatment and control groups to allow a direct test of the specific intervention’s impact. So the studies used both regressions and randomization to promote better public policy.
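To see how the two stages fit together, here is a rough sketch in Python. Everything in it is invented for illustration: the simulated claimant records, the three made-up predictors, the 0.6 risk cutoff, and the assumed one-week benefit of search assistance have nothing to do with the actual state programs. It uses scikit-learn’s logistic regression for the profiling step, though the states’ real profiling models may well have differed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# ----- Simulated historical records (entirely made up) -----
n_hist = 5_000
X_hist = rng.normal(size=(n_hist, 3))   # stand-ins for prior wage, past weeks unemployed, education
long_spell = (X_hist @ np.array([0.8, 1.2, -0.5]) + rng.normal(size=n_hist)) > 0.5

# ----- Stage 1: profiling regression fit on the historical data -----
profiler = LogisticRegression().fit(X_hist, long_spell)

# ----- Stage 2: score new claimants; target only the high-risk ones -----
n_new = 2_000
X_new = rng.normal(size=(n_new, 3))
risk = profiler.predict_proba(X_new)[:, 1]
high_risk = risk > 0.6                   # arbitrary cutoff, for illustration only

# ----- Stage 3: randomize high-risk claimants into treatment and control -----
treated = rng.random(n_new) < 0.5
# Simulated outcome: weeks on UI, assuming (for illustration) that
# search assistance shortens a spell by about one week.
weeks_on_ui = 15 + 5 * risk + rng.normal(0, 2, n_new) - 1.0 * treated

in_program = high_risk
print("treated mean weeks on UI:", weeks_on_ui[in_program & treated].mean())
print("control mean weeks on UI:", weeks_on_ui[in_program & ~treated].mean())
```

The point of the sketch is the division of labor: the regression only decides who is offered the program, while the random assignment in the last stage is what makes the final comparison of averages a causal estimate.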
These UI tests taught us a lot about what works and what doesn’t. Unemployed workers who received the assistance found a new job about a week earlier than similar individuals who did not receive the assistance. In Minnesota, which provided the most intensive search assistance, the program reduced unemployment by a whopping four weeks. Finding jobs faster didn’t mean finding poorer-paying jobs either. The jobs found by the program participants paid just as well as the ones found later by non-participants.
Most importantly from the government’s perspective, the programs more than paid for themselves. The reduction in UI benefits paid plus the increase in tax receipts from faster reemployment were more than enough to pay for the cost of providing the search assistance. For every dollar invested in job assistance, the government saved about two dollars.
Larry used the results of this testing to convince Congress that the savings from mandated job assistance would be more than enough to fund the projected $2 billion needed to extend unemployment benefits during the recession of 1993. Larry calmly deflated any objection Congressional leadership raised about the program’s effectiveness. The transparency of randomized trials made Larry’s job a lot easier. Unemployed workers who were similar in every respect found jobs faster when they received search assistance. It’s pretty manifest that search assistance was the reason why. Up against the brutal power of a randomized trial and the intellect of Katz, the opponents never really had a chance.
The statistical targeting of these search-assistance programs was crucial to their cost savings. The costs of government support programs can balloon if they are opened to too large a class of individuals. This is the same kind of problem Head Start has to grapple with. Lots of people will tell you that prison costs three times as much as early childhood education. The problem with this comparison is that there are a lot more kids than there are prisoners. It’s not necessarily cheaper to pay for every kid’s pre-K program even if pre-K programs reduce the chance of later criminality—because most kids aren’t going to commit crime anyway. Racial profiling by police has gotten a deservedly bad name, but non-racial profiling may be the key to concentrating resources on the kids who are high risk. You might think that you can’t say which kids at four or five are at higher risk of committing crime when they’re sixteen or seventeen, but isn’t that just what Orley Ashenfelter did with regard to immature grapes? The search-assistance tests show that statistical profiling may be used for smarter targeting of government support.
A True State Laboratory of Ideas
The growth of randomized experiments like the search-assistance trials for the first time makes good on the idea that our federalist system of independent states could create a rigorous “laboratory of democracy.” The laboratory metaphor is that each state could experiment with what it thought were the best laws and collectively we could sit back and learn from one another. Great idea. The problem is the experimental design. A good experiment requires a good control group, and with many state experiments we don’t know exactly what to compare the results to. Alaska’s not really like Arizona. The states offer a great opportunity for experimentation. But in real labs, you don’t let the rats design the experiment. The move to randomization lets states experiment on themselves, but with a procedure that provides the necessary control group. Now states are creating quality data that can be used to implement data-driven policy making. Unlike earlier case studies that are quietly shelved and forgotten, the randomized policy experiments are more likely to have Super Crunching impacts on real-world decisions.
And the pace of randomized testing has accelerated. Hundreds of policy experiments are now under way. Larry is one of the leaders of a new HUD-funded effort to find out what happens if poor families are given housing vouchers that can only be used in low-poverty (middle-class) neighborhoods. This “Moving to Opportunity” (MTO) test randomly gave housing vouchers to very low-income families in five cities (Baltimore, Boston, Chicago, Los Angeles, and New York City) and is collecting information for ten years on how the vouchers affect everything from employment and school success to health and crime. The results aren’t all in yet, but the first returns suggest that there is no huge educational or crime-reduction benefit from moving poor kids to more affluent neighborhoods (with more affluent schools). Girls who moved were a little more successful in school and healthier, but boys who moved have done worse in school and are more likely to commit crime. However the chips ultimately fall, the MTO data are going to provide policy makers for the first time with very basic information about whether changing your neighborhood can change your life.
Randomized trials are instructing politicians not just on what kinds of policies to enact, but on how to get themselves elected in the first place. Political scientists Donald Green and Alan Gerber have been bringing randomized testing to bear on the science of politics. Want to know how best to get out the vote? Run a randomized field experiment. Want to know whether direct mail or telephone solicitations work best? Run a test. Want to know how negative radio ads will influence turnout of both your and your opponent’s voters? Run the ads at random in some cities and not in others.
Keeping an Eye Out for Chance
The sky is really the limit. Any policy that can be applied at random to some people and not others is susceptible to randomized tests. Randomized testing doesn’t work well for the Federal Reserve’s interest rate setting—because it’s hard to subject some of us to high and others to low interest rates. And it won’t help us design the space shuttle. We’re not going to send some shuttles up with plastic O-rings and others with metal O-rings. But there are tons of business and government policies that are readily susceptible to random assignment.
Up to now, I’ve been describing how business and government analysts have intentionally used randomized assignments to test for impacts. However, it’s also possible for Super Crunchers to piggyback on randomized processes that were instituted for other purposes. There are in fact over 3,000 state statutes that already explicitly mandate random procedures. Instead of flipping coins to create data, we can sometimes look at the effects of randomized processes that were independently created. Because some colleges randomly assigned roommates, it became possible to test the impact roommates have on one another’s drinking. Because California randomizes the order that candidates appear on the ballot, it became possible to test the impact of having your name appear at the top (it turns out that appearing first helps a lot in primaries, not so much in general elections where people are more apt to vote the party line).
But by far the most powerful use of pre-existing randomization concerns the random assignment of judges to criminal trials. For years it has been standard procedure in federal courts to randomly assign cases to the sitting trial judges in that jurisdiction. As with the alphabet lottery, random case assignment was instituted as a way of assuring fairness (and deterring corruption).
In the hands of Joel Waldfogel, randomization in criminal trials has become a tool for answering one of the most central questions of criminal law—do longer sentences increase or decrease the chance that a prisoner will commit another crime?