Randomistas
Peirce’s experiment was designed as follows. The test subject put one finger on a balance scale. From behind a screen, an experimenter added or removed small weights, based on random draws from a special pack of cards. The subject then said whether the weight had increased or decreased. At each sitting, the experiment was repeated fifty times. Peirce and his co-author, graduate student Joseph Jastrow, took turns to play the role of subject and experimenter. When the weight difference was 10 per cent, they found that the person being tested could correctly pick it nine times out of ten. But with a weight difference of just 1 per cent, the odds of success fell to two in three. Given that dumb luck would yield a success rate of one in two, this suggests that we don’t have much ability to discern tiny differences.
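To get a rough sense of how far these results sit from dumb luck, the sketch below computes the chance that a pure guesser would score at least as well in a single fifty-trial sitting. It assumes each trial is an independent coin flip, and it translates 'two in three' and 'nine in ten' into 33 and 45 correct answers out of 50. Those counts are an approximation for illustration, not figures reported by Peirce and Jastrow, and the function name is hypothetical.

```python
from math import comb


def chance_of_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability that a pure guesser gets `correct` or more answers right."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# 50 trials per sitting comes from the text; 33 and 45 correct are assumed
# stand-ins for "two in three" and "nine in ten".
for correct in (33, 45):
    print(correct, chance_of_at_least(correct, 50))
```

Comparing the two printed probabilities shows how much harder it is to reach nine in ten by guessing than to reach two in three.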
Writing more than a century later, statistician Stephen Stigler argues that Peirce’s study remains one of the best-conducted experiments in psychology.3 The researchers took care to ensure that the subject could not see how the weights were being varied, and used randomisation to prevent any subconscious biases from affecting the pattern of adding and subtracting weights. Future psychologists would learn from the weight study, and apply the same methods in their randomised experiments.
Peirce was a genius and a polymath. He would write out questions with his left hand and answer them with his right. His collected papers total over 100,000 pages, most of them unpublished. Peirce also made contributions in mathematics, astronomy, chemistry and meteorology. He is best known for his work as a philosopher, founding the philosophic tradition known as pragmatism.
But while he was working on his most famous paper, Peirce was embroiled in a scandal that led to him being fired by his university, Johns Hopkins. His wife having left him several years earlier, Peirce had begun living with another woman, Juliette. When the university’s leadership found out about this ‘immorality’, they dismissed him. Peirce hoped he might find a job at Harvard, where his father had enjoyed a prestigious academic career. But he had a powerful foe. While he was an undergraduate student, Peirce had vandalised a bench in one of his chemistry classes. He was fined a dollar at the time, but the episode would prove far costlier in the long term.4 The chemistry instructor, Charles Eliot, became president of Harvard in 1869 and held the position for the next forty years. Eliot had formed a lifelong dislike of Peirce and vetoed him being employed at Harvard in any form. After his firing from Johns Hopkins, Peirce never again worked in academia. He spent the rest of his life trying to subsist on occasional jobs and the generosity of friends.
Peirce’s randomised trial foreshadowed an entire discipline of experimental psychology. His protégé, Joseph Jastrow, became a professor of experimental and comparative psychology at the University of Wisconsin–Madison and published widely, using empirical techniques to help expose psychics as clever tricksters. Unlike Peirce, Jastrow was held in high regard by the profession; he served as president of the American Psychological Association and enjoyed fame through his regular psychology columns in popular magazines. Today, experimental psychology continues to blossom, its findings published in dozens of academic journals and eagerly reported in the media.5 But Charles Peirce, the brilliant randomista who helped kickstart the field, would spend his final two decades unable to afford to heat his house, subsisting on bread donated by the local baker, and writing on the back of old manuscripts because he was too poor to buy paper.
Peirce is one of four pioneers of randomised trials whose lives I explore in this chapter. Working across psychology, agriculture, medicine and social policy, these men and women succeeded in persuading many sceptics of the value of controlled experiments. Their stories are a window into the unusual mix of talents required to be a successful randomista.
*
At an afternoon tea party in England in the 1920s, Ronald Fisher drew a cup of tea from the urn. Seeing Muriel Bristol standing next to him, he politely offered it to her. She declined, telling him that she preferred to put the milk in first. ‘Nonsense,’ smiled Fisher, ‘it makes no difference.’6
At this point, many women might have just taken the cup. Fisher – a thin, short man with a beard and glasses – was hardly the most handsome man in England, but his reputation as a mathematician was growing rapidly. However, Bristol was not just any woman – she was an algae specialist at the research centre where they both worked. She held her ground. At this point, chemist William Roach, standing nearby, chimed in: ‘Let’s test her.’ Could Bristol really tell how a cup of tea had been made?
Fisher quickly set out the experiment. With Roach as his assistant, they began pouring cups of tea, randomly varying whether the milk went into the cup first or last. Each time, Bristol took a sip, and confidently announced how the beverage had been prepared.
Eight cups of tea later, the tea party experiment had proven something new about each of the participants. In every instance, Muriel Bristol had correctly identified whether the cup had been prepared by adding tea to milk, or milk to tea – showing that, for a connoisseur, it really does make a difference. Ronald Fisher, whom many now regard as the father of modern statistics, used the example to think about how many cups were necessary to distinguish luck from skill. William Roach might merely have been the assistant in the experiment, but evidently he performed with panache, for Bristol shortly afterwards accepted his marriage proposal.
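Fisher’s question of how many cups are needed can be sketched with a little counting. The version below assumes the design Fisher later described in print, in which exactly half the cups have milk first and the taster knows this. The account here says only that the order was varied at random, so an independent coin-flip variant is shown alongside. Both function names are illustrative.

```python
from math import comb


def p_all_correct_balanced(cups: int) -> float:
    """Chance of a perfect score by luck when exactly half the cups are
    milk-first and the taster knows it (assumed design)."""
    return 1 / comb(cups, cups // 2)


def p_all_correct_coin_flip(cups: int) -> float:
    """Chance of a perfect score by luck when each cup is decided by an
    independent coin flip (alternative assumption)."""
    return 0.5**cups


for cups in (4, 6, 8):
    print(cups, p_all_correct_balanced(cups), p_all_correct_coin_flip(cups))
```

Under the balanced design, eight cups leave a 1-in-70 chance of a perfect score by guessing. With six cups the chance is 1 in 20, and with four it is 1 in 6, which is too likely to rule out luck.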
Ronald Fisher was the youngest of five children. His mother, Kate, died when he was fourteen. For an academically inclined young man, Fisher’s nearsightedness might have been thought a serious impediment, but he turned it to his advantage – becoming adept at visualising problems in geometrical terms, rather than by working through pages of numerical proofs.
At twenty-two, Fisher graduated from Cambridge with first-class honours. He worked for a time as a high school maths teacher, then as a statistician in the City of London. When he was twenty-nine, he turned down a job at University College London for a much less certain proposition – casual employment at the Rothamsted Experimental Station in Hertfordshire. For Fisher, Rothamsted had one advantage: data. The centre had been running agricultural randomised trials for decades, and so provided the raw material for him to develop his statistical methods.
By the time of his fortieth birthday, Fisher had developed the statistical tests that are used today in almost every empirical paper published in the social sciences. Over his career, his research would also revolutionise biology – creating the ‘modern synthesis’ that used mathematics to integrate Mendelian genetics with Darwinian selection. And in farming, Fisher’s experiments helped boost crop yields, saving millions from hunger and starvation.7
Yet Fisher had flaws aplenty. His early work on maximum likelihood techniques helped popularise the method, but his proofs were wrong. He was one of the most prominent eugenicists of his age – an advocate of the view that society should encourage the upper classes to have more children. In the aftermath of World War II, as a group of scientists worldwide sought to promote racial equality, Fisher disagreed, arguing that people of different races differed profoundly in their intellectual capacities. When early studies showed a link between smoking and lung cancer, Fisher questioned the credibility of their statistics.
Fisher’s first published paper had been titled ‘The Evolution of Sexual Preference’. So it was either ironic or fitting that his own marriage eventually broke down. Perhaps exploiting his newfound freedom, he set up a mouse breeding program in his home – using the results for his own research papers. At the end of his life he took up a role as a senior research fellow at the Commonwealth Scientific and Industrial Research Organisation in Adelaide. He died there in 1962.
*
The first sign of tuberculosis is the cough. At first, most patients think they just have a cold. But as the weeks go on and the bacteria grow in the lungs, the cough becomes more painful. Many begin to cough up phlegm or blood. Tubercular patients lose their appetite, start to sweat at night and develop a fever. Sometimes the nails on their fingers and toes become enlarged. Unless a proper course of antimicrobial drugs is administered, active-stage tuberculosis will claim the lives of half of those infected.8
When World War I broke out, Austin Bradford Hill was finishing high school. Coming from a prominent British family, he was expected to follow his father into medicine. He was, in his own words, ‘head of the school, captain of football, in the cricket XI, champion cross country runner, and a prig’.9 Signing up as a naval pilot, Hill was sent to the Dardanelles, but contracted tuberculosis along the way, and was ‘sent home to die’.
Hill was one of the lucky ones. Doctors administered the prevailing treatment: collapsing the infected lung. After a lung abscess and nine months in bed, he began a slow recovery. A career in medicine was now out of the question, but a family friend suggested that he could study economics by correspondence, so he signed up and completed his degree at the University of London in 1922. By now, Hill was well enough to travel, and the same family friend arranged for him to get a research grant to study the poor health of young people in rural Essex. Studies of occupational illness followed, documenting the medical condition of London bus drivers, cotton spinners and printers.
Hill loved medicine so much he read textbooks in his spare time. But his economics training gave him an entree into the burgeoning field of medical statistics. Hearing that a famous statistician was lecturing at University College London, Hill went along to listen. ‘It was mathematical and entirely over my head,’ he recalled, ‘but I learned in the practical side of the course.’ Within a few years, he would be combining research with teaching – but with a much less technical style. Hill’s lectures were regarded as so crisp and clear that they were published in 1937 in The Lancet.
When it comes to medical statistics, ‘common sense is not enough’, Hill argued in his first published lecture. ‘Mistakes which when pointed out look extremely foolish are quite frequently made by intelligent persons, and the same mistakes, or types of mistakes, crop up again and again.’10 At the risk of seeming ‘too simple’, he proposed to set out the most frequent ‘fallacies and misunderstandings’, and show how to avoid them.
The lectures would ultimately become Principles of Medical Statistics, the most famous textbook in the field. But Hill was careful not to push his audience too far. ‘I deliberately left out the words “randomisation” and “random sampling numbers” at that time, because I was trying to persuade the doctors to come into controlled trials in the very simplest form and I might have scared them off . . . I thought it would be better to get the doctors to walk first, before I tried to get them to run.’
In 1946 he saw his chance. Researchers at Rutgers University had been studying organisms that live in soil, and a compound produced by one of them, streptomycin, seemed to be effective against tuberculosis.11 The US army tested the antibiotic on three patients. The first died, the second went blind, and the third recovered rapidly. This third patient’s name was Bob Dole – he would eventually become majority leader of the US Senate and the Republican presidential candidate in the 1996 election.
A success rate of one out of three was hardly conclusive, and Hill saw an opening to conduct a British trial of the new tuberculosis treatment. It had been three decades since he himself was ‘sent home to die’ from the disease, and tuberculosis still claimed the lives of 180,000 Britons annually.12 Having spent the past ten years teaching doctors about the problems inherent in studies that just compared the treated with the untreated, Hill pushed hard for the streptomycin trial to be randomised. In the end, scarcity clinched it: ‘We had no dollars and the amount we were allowed by the Treasury was enough only for, so to speak, a handful of patients. In that situation I said it would be unethical not to make a randomised controlled trial – the first of its kind.’
That bold claim – that it might be unethical not to conduct a randomised trial – was characteristic of Hill’s self-confidence. Medical randomised trials were essentially unknown, and yet here he was telling the profession that a failure to conduct one would be immoral. Hill had no formal training in medicine, or in statistics, but he had spent years thinking through the issues as he taught his students and engaged with his research colleagues. The trial was a success, and streptomycin today remains one of the drugs that is used to treat tuberculosis.
In the past two centuries alone, tuberculosis has killed more than 1 billion people – more than the combined toll from all the wars and famines in that time.13 Among the victims of ‘the white plague’ were Frédéric Chopin, Anton Chekhov, Franz Kafka, Emily Brontë, George Orwell and Eleanor Roosevelt. Today, tuberculosis still accounts for more than a million deaths worldwide each year. Strains of the disease that are resistant to streptomycin and other antibiotics are becoming increasingly prevalent. Austin Bradford Hill didn’t eliminate the disease that nearly killed him. But he did help to transform medicine. As one colleague wrote of his contribution, Hill brought ‘a quantitative approach to the prevention of diseases’.
*
After four decades of conducting randomised trials in social policy, Judith Gueron has dozens of maxims for researchers: ‘Never say that something about the research is too complex to get into.’ ‘If pressed on an awkward issue about random assignment, do not give an evasive answer . . . if site people [those implementing the experiment] forcefully ask if you really mean they will have to deny services to those in the control group, say “yes”.’ ‘If someone is unreservedly enthusiastic about the study, he or she doesn’t understand it.’14 It’s hard-won wisdom, emerging from more than thirty large-scale social policy trials, involving a total of 300,000 participants.15
In 1974 the Ford Foundation and six federal government agencies created the Manpower Demonstration Research Corporation, now known simply as MDRC. Its mission was to improve understanding about what worked in social policy by conducting random assignment studies. Judith Gueron, then aged thirty-three and having received her Harvard economics PhD just a few years earlier, became MDRC’s first research director.
Raised in Manhattan, Gueron attributes her ambition and confidence to having a father who told her ‘for as long as I can remember, that girls, and I in particular, could do anything’.16 In her work at MDRC, she would need this self-belief. Not only were the fields of economics and policy evaluation heavily male-dominated, but experiments were a radical idea. At that time, academics didn’t get tenure by doing randomised trials, but by building complex mathematical models. MDRC was ‘a lonely band of zealots’.17
Gueron’s first major experiment tested whether long-term welfare recipients and people considered ‘unemployable’ could be supported into jobs. Random assignment had never before been attempted on a large multisite employment program of this scale. Gueron’s team were warned it would be impossible, that ‘asking program operators to turn people away would be like asking doctors to deny a patient a known cure’.18
To address the criticism that they were being cold-hearted towards a deserving group of people, MDRC came up with a clever solution: they would expand the size of the treatment group so that it used every last dollar of available funding. That meant their detractors could not credibly claim that having a control group prevented worthy recipients from getting a supported job. Even if you scrapped the control group, the number of people who received the treatment would be unchanged.
Gueron recalls how the staff felt as the program was rolled out. They all hoped it would succeed, but had to keep reminding themselves that it probably would not. ‘Fortunately for MDRC in these formative years, we had random assignment – with its inexorable comparison of experimental versus control outcomes – to keep us honest and help us avoid the pitfalls of advocacy research.’19 To Gueron, MDRC was not just ‘another do-gooder organisation’. Staff morale had to hinge on rigorous evaluation, not on whether the programs it scrutinised turned out to be effective.
When the results from the first evaluation rolled in, they showed that the supported work program helped women, but not men. And even for the women the impacts were small. When participants found employment, the government reduced their welfare benefits. Since the jobs didn’t pay much, the net effect was only a slight fall in poverty rates. The program was effective, but no panacea. Yet to Gueron, what mattered most wasn’t the results, but how they had judged the program. A naive evaluator might simply have looked at the raw outcomes, which showed that more men found jobs than women. Yet a randomised evaluation showed that this had nothing to do with the program: men in the control group were just as likely to get jobs as men in the treatment group. It took randomisation to reveal the truth. Gueron was ‘hooked . . . on the beauty and power of an experiment’.20
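The contrast between the naive reading and the randomised one can be made concrete with a few numbers. The employment rates below are invented purely for illustration and are not MDRC’s results. They are chosen only to mimic the pattern described, in which men have higher raw employment but gain nothing relative to their control group. The function names are hypothetical.

```python
# Hypothetical employment rates (NOT real MDRC data), keyed by group and arm.
rates = {
    "men": {"treatment": 0.60, "control": 0.60},
    "women": {"treatment": 0.45, "control": 0.35},
}


def naive_outcome(group: str) -> float:
    """What a naive evaluator sees: the raw job rate among participants."""
    return rates[group]["treatment"]


def program_impact(group: str) -> float:
    """Randomised estimate: treatment rate minus control rate."""
    return rates[group]["treatment"] - rates[group]["control"]


for group in rates:
    print(group, naive_outcome(group), round(program_impact(group), 2))
```

In this made-up example the raw outcomes favour men, yet the treatment-minus-control difference is zero for men and positive for women, which is the kind of reversal only a control group can expose.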
Throughout the 1980s and ’90s, Gueron worked with state and local agencies across the United States. These were controversial times for welfare policy. Ronald Reagan had told his campaign rallies the story of a ‘Cadillac-driving welfare queen’: an African-American woman who defrauded the welfare system.21 Critics claimed that forcing welfare recipients into low-paid jobs was ‘modern-day slavery’. Bill Clinton ran for president on a pledge to ‘end welfare as we know it’. The debate over income support was fuelled by ideology on both sides.
And then there was MDRC. Read the newspapers of the era, and you come across prudent comments from Gueron: ‘We should be cautious about what we’ve got here’, ‘We haven’t found the pink pill’, it’s not a ‘quick fix for poverty’.22 Even when social programs paid for themselves, her praise was measured.
Getting started with a new randomised evaluation could be tough. In San Jose, Gueron wanted to evaluate a job-training program for young migrants from Mexico. The managers of the program told her that turning away people at random was inconsistent with their mission – the staff would never agree to it. So she met with the staff and explained why random assignment was uniquely reliable and believed. A positive finding, Gueron told the team, might convince the federal government to fund programs like theirs. ‘They agonized about the pain of turning away needy young people, and they talked about whether this would be justified if, as a result, other youth gained new opportunities. Then they asked us to leave the room, talked more, and voted. Shortly thereafter, we were ushered back in and told that random assignment had won.’23 The evaluation showed positive results, and prompted the federal government to fund an expansion across fifteen more sites.