by Ian Ayres
Joel Waldfogel is an auburn-haired, slightly balding imp who has a reputation for being one of the funniest number crunchers around. And he has one of the quirkiest minds. He often shines his light on overlooked corners of our society. Waldfogel has looked at how game show contestants learn from one season to the next. And he has estimated the “deadweight loss” of Christmas—that’s when your aunt spends a lot on a sweater that you just can’t stand. He’s the kind of guy who goes out and ranks business schools based on their value added—how much different schools increase the expected salaries of their students.
To my mind, his most important number crunching has been to look at the sentencing proclivities of judges. Just as we’ve seen time and time again, random assignment means that each judge within a particular district should expect to see the same kinds of cases. Judges in Kansas may see different cases than judges in D.C., but the random assignment of cases assures that judges within any particular district will see not only about the same proportion of civil and criminal cases, but also about the same proportion of criminal cases where the defendant deserves a really long sentence.
Waldfogel’s “a-ha” moment was simply that random judicial assignment would allow him to rank judges on their sentencing proclivity. If judges within a district were seeing the same kinds of cases, then intra-district disparities in criminal sentencing had to be attributable to differences in judicial temperament. Of course, it might be that certain judges by chance just received a bunch of rotten apple defendants who really deserved to go away for a long time. But statistics is really good at distinguishing between noise and an underlying tendency.
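How does statistics separate noise from an underlying tendency? One standard tool is a permutation test: if cases really are assigned at random, we can shuffle them between judges and see how often chance alone produces a sentencing gap as large as the one observed. Here is a minimal sketch; the sentence data are invented for illustration, not drawn from Waldfogel’s study.

```python
import random

random.seed(0)

# Hypothetical sentences (in months) handed down by two judges in the
# same district; real data would come from court records.
judge_a = [24, 30, 36, 28, 33, 40, 29, 35]   # a possible "hanging judge"
judge_b = [12, 18, 15, 20, 14, 22, 16, 19]   # a possible "bleeding heart"

observed_gap = sum(judge_a) / len(judge_a) - sum(judge_b) / len(judge_b)

# Under random assignment, every split of the pooled cases between the
# two judges is equally likely, so shuffle and re-split many times and
# count how often chance produces a gap as large as the observed one.
pooled = judge_a + judge_b
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    gap = sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8
    if abs(gap) >= abs(observed_gap):
        extreme += 1

p_value = extreme / trials
print(f"observed gap: {observed_gap:.1f} months, p = {p_value:.4f}")
```

A tiny p-value says the gap is almost certainly a real difference in judicial temperament, not a run of rotten-apple defendants landing on one judge’s docket by chance.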
Even though federal judges are required to follow sentencing guidelines—grids that foreordain narrow ranges of sentences for defendants who committed certain crimes—Waldfogel found substantial sentencing disparities between judges. There really are modern-day equivalents of the “hanging judge” and the “bleeding heart”—judges who find ways to manipulate the guidelines to increase or decrease the time served.
These differences in sentencing are troubling if we want our country to provide “equal protection under the law.” But Waldfogel and others saw that these disparities have at least one advantage—they give us a powerful way to measure whether longer sentences increase or decrease recidivism.
The holy grail of criminologists has been to learn whether prison “hardens” or “rehabilitates” criminals. Does putting rapists away for ten instead of five years increase or decrease the chance that they’ll rape again when they’re back out on the street? This is an incredibly hard question to answer because the people we put away for ten years are different from those we put away for five years. Ten-year inmates might have a higher recidivism rate—not because prison hardened them, but because they were worse guys to begin with.
Waldfogel’s randomization insight provided a way around this problem. Why not look at the recidivism rates of criminals sentenced by individual judges? Since the judges see the same types of criminals, differences in the judges’ recidivism rates must be attributable to disparities in the judges’ sentencing. Random assignment to (severe or lenient) judges is equivalent to randomly assigning criminals to longer or shorter sentences. Just as Waldfogel ranked business schools based on how well their students performed in the aftermarket, Waldfogel’s exploitation of randomization allows a ranking of judges based on how well their defendants perform in the post-prison aftermarket.
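The logic of the comparison can be sketched in a few lines. All the numbers below are invented for illustration; the actual studies used court records and earnings data.

```python
# Because defendants are assigned to judges at random, comparing outcomes
# across judges is like comparing randomly assigned "long sentence" and
# "short sentence" groups. Recidivism outcomes (1 = re-offended after
# release) for defendants sentenced by two hypothetical judges:
severe_judge = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]    # handed down long sentences
lenient_judge = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]   # handed down short sentences

rate_severe = sum(severe_judge) / len(severe_judge)
rate_lenient = sum(lenient_judge) / len(lenient_judge)

# If random assignment holds, this difference estimates the causal effect
# of the harsher sentences, not pre-existing differences in defendants.
effect = rate_severe - rate_lenient
print(f"severe: {rate_severe:.0%}, lenient: {rate_lenient:.0%}, "
      f"difference: {effect:+.0%}")
```

A difference near zero, as in this toy example, is exactly the pattern the real studies found: longer sentences neither hardened nor rehabilitated.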
So what’s the answer? Well, the best evidence is that neither side in the debate is right. Putting people in jail neither increases nor decreases the probability that they’ll commit a crime when they’re released. Brookings Institution economist Jeff Kling found that the post-release earnings of people sentenced by the hanging judges were not statistically different from those sentenced by the judicial bleeding hearts. A convict’s earnings after prison are a pretty strong indicator of recidivism because people who are caught and put back in prison have zero taxable earnings. More recently, two political scientists, Danton Berube and Donald Green, have looked directly at the recidivism rates of those sentenced by judges with different sentencing propensities. They find that while longer sentences incapacitate prisoners, keeping them from committing crimes outside prison walls while they serve their time, the longer sentences of the hanging judges were not associated with increased or decreased recidivism rates once the prisoners hit the streets. The “lock ’em up” crowd can take solace in the fact that longer sentences are not hardening prisoners. Then again, the longer sentences don’t specifically deter future bad acts. Because of randomized assignments, we might start shifting the debate about sentence length away from questions of specific deterrence and rehabilitation, and instead ask whether longer sentences deter other people from committing crimes or whether simply incapacitating bad guys makes longer sentences worthwhile.
But the big takeaway here concerns the possibility of piggybacking. Instead of randomly intervening to create data, it is sometimes possible to piggyback on pre-existing randomization. That’s what criminologists are doing with regard to random judicial assignments. And it’s what I’ve started to do with regard to random assignments at our local school district. About 20 percent of New Haven schoolchildren apply to attend oversubscribed magnet schools. For the schools that are oversubscribed, the kids are chosen by lottery. Can you see what piggybacking will let me do? I can look at all the kids who applied to Amistad Academy and then compare the average test scores of those who got in and those who didn’t. Piggybacking on randomization provides me with the kind of Super Crunching information that will allow me to rank the value added of just about every school in the district.
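Under the assumption of a fair lottery, the winner–loser comparison extends naturally to ranking many schools at once. The school names and test scores below are hypothetical stand-ins for the district’s lottery records and standardized test results.

```python
# Toy sketch of piggybacking on admission lotteries to rank schools by
# value added. Because the lottery randomizes who gets in, the gap in
# mean scores between admitted and rejected applicants estimates each
# school's causal effect.
lotteries = {
    "School A": {"admitted": [82, 78, 85, 80], "rejected": [74, 71, 77, 70]},
    "School B": {"admitted": [75, 73, 78, 74], "rejected": [73, 72, 76, 71]},
}

def value_added(groups):
    """Winner-loser gap in mean test scores for one school's lottery."""
    win, lose = groups["admitted"], groups["rejected"]
    return sum(win) / len(win) - sum(lose) / len(lose)

# Rank schools from highest to lowest estimated value added.
ranking = sorted(lotteries, key=lambda s: value_added(lotteries[s]),
                 reverse=True)
for school in ranking:
    print(school, f"{value_added(lotteries[school]):+.2f} points")
```

The key design choice is comparing admitted applicants only to rejected *applicants*—never to the general population—since applying to a magnet school is itself non-random.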
The World of Chance
The randomized testing of social policy is truly now a global phenomenon. Dozens upon dozens of regulatory tests have been completed in every corner of the globe. Indeed, if anything, developing countries have taken the lead in embracing the randomizing method. A test that would cost millions of dollars in the U.S. can be undertaken for a fraction of the cost in the Third World.
The spread of randomized testing is also due to the hard work of the Poverty Action Lab. Founded at MIT in 2003 by Abhijit Banerjee, Esther Duflo, and Sendhil Mullainathan, the Poverty Action Lab is devoted to using randomized trials to test what development strategies actually work. Their motto is “translating research into action.” By partnering with non-profit organizations around the world, in just a short time the lab has been able to crank out dozens of tests on everything from public health measures and micro-credit lending to AIDS prevention and fertilizer use.
The driving force behind the lab is Esther Duflo. Esther has endless energy. A wiry mountain climber (good enough to summit Mount Kenya), she has also been rated the best young economist in France, and is the recipient of one of the lucrative MacArthur “Genius” fellowships. Esther has been tireless in convincing NGOs (non-governmental organizations) to condition their subsidies on randomized testing.
The use of randomized tests to reduce poverty sometimes raises ethical concerns—because some destitute families are capriciously denied the benefits of the treatment. In fact, what could be more capricious than a coin flip? Duflo counters, “In most instances we do not know whether the program will work or whether it’s the best use of the money.” By conducting a small randomized pilot study, the NGO can figure out whether it’s worthwhile to “scale” the project—that is, to apply the intervention on a non-randomized basis to the entire country. Michael Kremer, a lab affiliate, sums it up nicely: “Development goes through a lot of fads. We need to have evidence on what works.”
Other countries can sometimes test policies that U.S. courts would never allow. Since 1998, India has mandated that the chief (the Pradhan) in one-third of village councils has to be a woman. The villages with set-aside (or “reserved”) female chiefs were selected at random. Voilà, we have another natural experiment which can be analyzed simply by comparing villages with and without chiefs who were mandated to be female. Turns out the mandated women leaders were more likely to invest in infrastructure that was tied to the daily burdens of women—obtaining water and fuel—while male chiefs were more likely to invest in education.
Esther has also helped tackle the problem of rampant teacher absenteeism in Indian schools. A non-profit organization, Seva Mandir, has been instrumental in establishing single-teacher schoolhouses in remote rural areas where students have no access to government education. Yet massive teacher absenteeism has undermined the effectiveness of these schools. In some states, teachers simply don’t show up for class about half the time.
Esther decided to see if cameras might help. She took 120 of Seva Mandir’s single-teacher schools and in half of them she gave the teacher a camera with a tamper-proof date and time stamp. The teachers at the “camera schools” were instructed to have one of the children photograph the teacher with other students at the beginning and end of each school day. The salary of a teacher at the camera school was a direct function of his or her attendance.
A little monitoring went a long way. Cameras had an immediate positive effect. “As soon as the program started in 2004,” Esther told me, “the absentee rate of teachers fell from 40 percent (a very high number) to 20 percent. And magically, now it has been going on since then and that 20 percent is always there.” Better yet, students at the camera schools learned more. A year after the program, the kids at camera schools scored substantially higher than their counterparts on standardized tests and were 40 percent more likely to matriculate to regular schools.
Randomized trials are being embraced now in country after country to evaluate the impact of all kinds of public policies. Randomized tests in Kenya have shown the powerful impact of de-worming programs. And randomized tests in Indonesia have shown that a threat of ex post auditing can substantially increase the quality of road construction.
But by far the most important recent randomized social experiment of development policy is the Progresa Program for Education Health and Nutrition. Paul Gertler, one of the six researchers enlisted to evaluate the experiment, told me about how Mexican President Ernesto Zedillo created the program. “Zedillo, who was elected in 1995, was the accidental president,” Gertler said. “The original candidate was assassinated and Zedillo, who was the education minister and more of a technocrat, became president. He decided that he wanted to have a major effect on Mexico’s poverty and together with members of his administration, he came up with a very unique poverty alleviation program, which is Progresa.”
Progresa is a conditional transfer of cash to poor people. “To get the cash,” Gertler said, “you had to keep your kids in school. To get the cash you had to go to get prenatal care if you are pregnant. You had to go for nutrition monitoring. The idea was to break the intergenerational transfer of poverty because children who typically grow up in poverty tend to remain poor.”
Conditioning cash on responsible parenting was a radical idea. And the money would only go to mothers, because Zedillo believed studies suggesting that mothers were more likely than fathers to use the money for their children. Zedillo thought the program had to have a sustained implementation if it had any hope of improving the health and education of children when they became adults. It is not something you would accomplish in just a year.
Zedillo’s biggest problem was to try to structure Progresa so that it might outlive his presidency. “Now, the politics in Mexico were such that poverty programs in the past typically changed every presidential campaign,” Gertler said. “As hard as this is to believe, the person who was running said that what the current government was doing wasn’t very good and it needed to be completely changed. And this is true even if it was the same political party. So, typically what happened is there would be a new president who would come in after five or six years and he would immediately scrap the policies of the previous government and start over again.”
Zedillo initially wanted to try Progresa on three to five million families. But he was afraid that he didn’t have time. Gertler continued, “If you have a five-year administration and it takes three years to get a program up and running, then it doesn’t have much time to have an impact before the new government comes in and closes it.” So Zedillo decided instead to conduct a much smaller, but still statistically large, randomized study of more than 500 villages. On this smaller scale, he could get the program up and running in just a year. He chose to have the program evaluated by independent international academics. It was very much a demonstration project. “Zedillo hoped,” Gertler said, “if the evaluation demonstrated that the program had a high benefit-cost ratio, then it would be very hard for the next government to ignore and to close down this program.”
So starting in 1997, Mexico began a randomized experiment on more than 24,000 households in 506 villages. In villages assigned to the Progresa program, the mothers of poor families were eligible for three years of cash grants and nutritional supplements if the children made regular visits to health clinics and attended school at least 85 percent of the time. The cash payments were set at roughly two-thirds the wages that children could earn on the free market (and thus increased with the children’s age).
The Progresa villages almost immediately showed substantial improvements in education and health. Progresa boys attended school 10 percent more than their non-Progresa counterparts. And school enrollment for Progresa girls was 20 percent higher than for the control group. Overall, teenagers went to school about half a year longer over the initial two-year evaluation period and Progresa students were much less likely to be held back.
The improvements in health were even more dramatic. The program produced a 12 percent lower incidence of serious illness and a 12.7 percent reduction in hemoglobin measures of anemia. Children in the treated villages were nearly a centimeter taller than their non-Progresa peers. A centimeter of additional growth in such a short time is a big deal as a measure of increased health. Gertler explained there were three distinct reasons for the dramatic increase in size: “First, there were nutrition supplements, which were directly given to young kids who are stunted. Second, there was good prenatal and postnatal care to reduce the infection rate. And third, there was just more money to buy food in general.”
Sometimes the mechanisms for improvement were more surprising. The evaluators found that birth weights in the Progresa villages increased about 100 grams and that the proportion of low-weight babies dropped by several percentage points. “This is huge in terms of magnitude,” Gertler said. The puzzle was why. Pregnant women in the Progresa villages weren’t eating better or going for more prenatal visits than pregnant women in the non-Progresa villages. The answer seems to be that Progresa women were more demanding. Gertler explains: “Progresa villages had these sessions where women are told that if you are pregnant when you go for prenatal care here is what to expect. They should weigh and measure you. They should check for anemia. They should check for diabetes. And so it started empowering women and gave them the means and the information to start demanding services that they should get. And then when we interviewed physicians—the physicians say, ‘Oh, those Progresa women. They are so much trouble. They come in, they demand more services. They want to talk about everything. We ended up spending so much more time with them. They are really difficult.’”
The Progresa program has proven extremely popular. It has a 97 percent take-up rate. Instead of asking for sacrifice for speculative future benefits, Progresa gives desperately poor mothers money today if they’re willing to invest in their children’s future. And Zedillo’s hope that a demonstration project would tie the hands of his successor worked like a charm. After the 2000 election, when Vicente Fox became president (and for the first time in Mexican history an incumbent president of an opposition party peacefully surrendered power), he was hard put to ignore the Progresa successes in health and education.
“And after we presented our evaluation to them,” Gertler said, “the Fox administration came back and said, ‘You know, I’m sorry, we need to close down the program. But we are going to open this new program to replace Progresa and this new program is called Oportunidades. It is going to have the same beneficiaries, the same benefits, same management structure, but we have a better name for the program.’” The transparency and third-party verification of the randomized evidence were critical in convincing the government to continue the program. Zedillo’s ploy really solved the political economy problem in a big way.
In 2001, Mexico expanded the Progresa (Oportunidades) program to urban areas, adding two million families. Its budget in 2002 was $2.6 billion, or about 0.5 percent of Mexican GDP. Progresa is a prime example of how Super Crunching is having real-world impacts. The Super Crunching phenomenon often involves large datasets, quick analysis, and the possibility of scaling the results to a much larger population. The Progresa program illustrates all three aspects of this new phenomenon. Information on more than 24,000 families was collected, analyzed, and in the space of just five years scaled to cover about 100-fold more people. There has never been a randomized experiment with so large a macroeconomic impact.
And the randomization method has revolutionized the way Mexican policy gets made. “The impact of Progresa evaluation has just been huge,” Gertler said. “It caused Congress to pass a law which says now all social programs have to be evaluated and it becomes part of the budgetary process. And so now they are evaluating nutrition programs and job programs and microfinance programs and education programs. And it has become the lexicon of the debate over good public policy. The debate is now filled with facts about what works.”
What’s more, the Progresa idea of conditional cash transfers is spreading like wildfire around the globe. Gertler grows animated as he talks about the impact of the study: “With Progresa we proved that randomized evaluation is possible on a large scale and that the information was very useful for policy and decision making. Progresa is why now thirty countries worldwide have conditional cash transfer programs.” Even New York City is now actively thinking about whether it should adopt conditional cash transfers. The Progresa experiment has shown that pragmatic programs can help desperately poor children literally grow.