Super Crunchers
Consider the worrisome case of Paul Herman Clouston. For over fifty years, Clouston has been in and out of prison in several states for everything from auto theft and burglary to escape. In 1972, he was convicted of murdering a police officer in California. In 1994, he was convicted in Virginia of aggravated sexual battery, abduction, and sodomy, and of assaulting juveniles in James City County, Virginia. He had been serving time in a Virginia penitentiary until April 15, 2005, when he was released on mandatory parole six months before the end of his nominal sentence.
As soon as Clouston hit the streets, he fled. He failed to report for parole and failed to register as a violent sex offender. He is now one of the most wanted men in Virginia. He is on the U.S. Marshals’ Most Wanted list and was recently featured on America’s Most Wanted. But why did this seventy-one-year-old, who had served his time, flee, and why did he make all of these most wanted lists?
The answer to both questions is the SVPA. In April of 2003, Virginia became the sixteenth state in our nation to enact a “Sexually Violent Predator Act” (SVPA). Under this extraordinary statute, an offender, after serving his full sentence, can be found to be a “sexually violent predator” and subject to civil commitment in a state mental hospital until a judge is satisfied he no longer presents an undue risk to public safety.
Clouston probably fled because he was worried that he would be adjudged to be a sexual predator (defined in the statute as someone “who suffers from a mental abnormality or personality disorder which makes the person likely to engage in the predatory acts of sexual violence”). And the state made Clouston “most wanted” for the very same reason.
The state was also embarrassed that Clouston had ever been released in the first place. You see, Virginia’s version of the SVPA contained a Super Crunching innovation. The statute itself included a “tripwire” that automatically sets the commitment process in motion if a Super Crunching algorithm predicts that the inmate has a high risk of sexual offense recidivism. Under the statute, commissioners of the Virginia Department of Corrections were directed to review for possible commitment all prisoners about to be released who, and I’m quoting the statute here, “receive a score of four or more on the Rapid Risk Assessment for Sexual Offender Recidivism.” The Rapid Risk Assessment for Sexual Offender Recidivism (RRASOR) is a point system based on a regression analysis of male offenders in Canada. A score of four or more on the RRASOR translates into a prediction that the inmate, if released, would in the next ten years have a 55 percent chance of committing another sex offense.
The Supreme Court in a 5–4 decision has upheld the constitutionality of prior SVPAs—finding that indefinite civil commitment of former inmates does not violate the Constitution. What’s amazing about the Virginia statute is that it uses Super Crunching to trigger the commitment process. John Monahan, a leading expert in the use of risk-assessment instruments, notes, “Virginia’s sexually violent predator statute is the first law ever to specify, in black letter, the use of a named actuarial prediction instrument and an exact cut-off score on that instrument.”
Clouston probably never should have been released because he had a RRASOR score of four. The state has refused to comment on whether they failed to assess Clouston’s RRASOR score as directed by the statute or whether the committee reviewing his case chose to release him notwithstanding the statistical prediction of recidivism. Either way, the Clouston story seems to be one where human discretion led to the error of his release.
It was a mistake, that is, if we trust the RRASOR prediction. Before rushing to this conclusion, however, it’s worthwhile to look at what exactly qualified Clouston as a four on the RRASOR scale. The RRASOR system—pronounced “razor,” as in Occam’s razor—is based on just the four factors listed below:
Factor                               Points
1. Prior sexual offenses
   None                              0
   1 conviction or 1–2 charges       1
   2–3 convictions or 3–5 charges    2
   4+ convictions or 6+ charges      3
2. Age of release (current age)
   More than 25                      0
   Less than 25                      1
3. Victim gender
   Only females                      0
   Any males                         1
4. Relationship to victim
   Only related                      0
   Any nonrelated                    1
SOURCE: John Monahan and Laurens Walker, Social Science in Law: Cases and Materials (2006).
Clouston would receive one point for victimizing a male, one for victimizing a nonrelative, and two more because he had three previous sex-offense charges. It’s hard to feel any pity for Clouston, but this seventy-one-year-old could be funneled toward lifetime commitment based in part upon crimes for which he’d never been convicted. What’s more, this statutory trigger expressly discriminates based on the sex of his victims. These factors are not chosen to assess the relative blameworthiness of different inmates. They are solely about predicting the likelihood of recidivism. If it turned out that wholly innocent conduct (putting barbecue sauce on ice cream) had a statistically valid, positive correlation with recidivism, the RRASOR system at least in theory would condition points on such behavior.
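To make the arithmetic concrete, here is a minimal sketch of the point system in the table above. The function and variable names are my own, not part of the instrument. The sketch assumes that whichever of the conviction and charge bands yields more points applies, and that an age of exactly 25 scores zero (the table does not say). Clouston's inputs are taken from the text, except the conviction count, which is a placeholder because the text scores him only on his three charges.

```python
# Illustrative sketch of the RRASOR point table; names are hypothetical.

REVIEW_CUTOFF = 4  # statutory trigger: "a score of four or more"


def prior_offense_points(convictions: int, charges: int) -> int:
    """Points for prior sexual offenses (assumes the higher band applies)."""
    if convictions >= 4 or charges >= 6:
        return 3
    if convictions >= 2 or charges >= 3:
        return 2
    if convictions >= 1 or charges >= 1:
        return 1
    return 0


def rrasor_score(convictions: int, charges: int, age_at_release: int,
                 any_male_victim: bool, any_nonrelated_victim: bool) -> int:
    """Sum the four factors from the table."""
    score = prior_offense_points(convictions, charges)
    score += 1 if age_at_release < 25 else 0  # age exactly 25 treated as 0 (assumption)
    score += 1 if any_male_victim else 0
    score += 1 if any_nonrelated_victim else 0
    return score


def needs_review(score: int) -> bool:
    """True when the score reaches the statutory review cutoff."""
    return score >= REVIEW_CUTOFF


# Clouston as described in the text: three prior charges, age 71,
# a male victim, a nonrelated victim. convictions=0 is a placeholder.
clouston_score = rrasor_score(convictions=0, charges=3, age_at_release=71,
                              any_male_victim=True, any_nonrelated_victim=True)
print(clouston_score, needs_review(clouston_score))
```

Notice that nothing in the calculation asks how serious the earlier conduct was or whether a charge ever ended in a conviction. The score counts and thresholds, which is exactly why it is so cheap to compute and so uncomfortable to rely on.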
This Super Crunching cutoff of course doesn’t mandate civil commitment; it just mandates that humans consider whether he should be committed as a “sexually violent predator.” State officials in exercising this decision not infrequently wave off the Super Crunching prediction. Since the statute was passed, the attorney general’s office has sought commitments against only about 70 percent of the inmates who scored a four or more on the risk assessment, and only about 70 percent of the time have courts granted the state’s petition to commit these inmates.
The Virginia statute thus channels discretion, but it does not obliterate it. To cede complete decision-making power to lock up a human to a statistical algorithm is in many ways unthinkable. Complete deference to statistical prediction in this or other contexts would almost certainly lead to the odd decision that at times we “know” is going to be wrong. Indeed, Paul Meehl long ago worried about the “case of the broken leg.” Imagine that a Super Cruncher is trying to predict whether individuals will go to the movies on a certain night. The Super Crunching formula might predict on the basis of twenty-five statistically validated factors that Professor Brown has an 84 percent probability of going to a movie next Friday night. Now suppose that we also learn that Brown has a compound leg fracture from an accident a few days ago and is immobilized in a hip cast.
Meehl understood that it would be absurd to rely on the actuarial prediction in the face of this new piece of information. By solely relying on the regression or even relegating the expert’s opinion to merely being an additional input to this regression, we are likely to make the wrong decision. A statistical procedure cannot estimate the causal impact of rare events (like broken legs) because there simply aren’t enough data concerning them to make a credible estimate. The rarity of the event doesn’t mean that it will not have a big impact when the event does in fact occur. It just means that statistical formulas will not be able to capture the impact. If we really care about making an accurate prediction in such circumstances, we need to have some kind of discretionary escape hatch—some way for a human to override the prediction of the formula.
The problem is that these discretionary escape hatches have costs too. “People see broken legs everywhere,” Snijders says, “even when they are not there.” The Mercury astronauts insisted on a literal escape hatch. They balked at the idea of being bolted inside a capsule that could only be opened from the outside. They demanded discretion. However, it was discretion that gave Liberty Bell 7 astronaut Gus Grissom the opportunity to panic upon splashdown. In Tom Wolfe’s memorable account, Grissom “screwed the pooch” when he prematurely blew the seventy explosive bolts securing the hatch before the Navy SEALs were able to secure floats. The space capsule sank and Gus nearly drowned.
System builders must carefully consider the costs as well as the benefits of delegating discretion. In context after context, decision makers who wave off the statistical predictions tend to make poorer decisions. The expert override doesn’t do worse when a true broken leg event occurs. Still, experts are overconfident in their ability to beat the system. We tend to think that the restraints are useful for the other guy but not for us. So we don’t limit our overrides to the clear cases where the formula is wrong; we override where we think we know better. And that’s when we get in trouble. Parole and Civil Commitment boards that make exceptions to the statistical algorithm and release inmates who are predicted to have a high probability of violence tend time and again to find that the high probability parolees have higher recidivism rates than those predicted to have a low probability. Indeed, in Virginia only one man out of the dozens civilly committed under the SVPA has ever been subsequently released by a judge who found him—notwithstanding his RRASOR score—no longer to be a risk to society. Once freed, this man abducted and sodomized a child and now is serving a new prison sentence.
There is an important cognitive asymmetry here. Ceding complete control to a statistical formula inevitably will give rise to the intolerable result of making some decisions that reason tells us must be wrong. The “broken leg” hypothetical is cute, but unconditional adherence to statistical formulas will lead to powerful examples of tragedy: organs being transplanted into people we know can’t use them. These rare but salient anecdotes will loom large in our consciousness. It’s harder to keep in mind evidence that discretionary systems, where experts are allowed to override the statistical algorithms, tend to do worse.
What does all this mean for human flourishing? If we care solely about getting the best decisions overall, there are many contexts where we need to relegate experts to mere supporting roles in the decision-making process. We, like the Mercury astronauts, probably can’t tolerate a system that forgoes any possibility of human override. At a minimum, however, we should keep track of how experts fare when they wave off the suggestions of the formulas. The broken leg hypothetical teaches us that there will, of course, be unusual circumstances where we’ll have good reason for ignoring statistical prediction and going with what our gut and our reason tell us to do. Yet we also need to keep an eye on how often we get it wrong and try to limit our own discretion to places where we do better than machines. “It is critical that the level, type, and circumstances of over-ride usage be monitored on an ongoing basis,” University of Massachusetts criminologists James Byrne and April Pattavina wrote recently. “A simple rule of thumb for this type of review is to apply a 10 percent rule: if more than 10 percent of the agency’s risk scoring decisions are being changed, then the agency has a problem in this area that needs to be resolved.” They want to make sure that override is limited to fairly rare circumstances. I’d propose instead that if more than half of the overrides are getting it wrong, then humans, like the Mercury astronauts, are overriding too much.
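The two monitoring rules just described can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names, and the sample numbers are all hypothetical, and it presumes that an agency can eventually tell whether each final decision turned out right.

```python
# Illustrative override audit; the data layout and figures are hypothetical.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Decision:
    overridden: bool       # did the human depart from the formula's prediction?
    turned_out_right: bool  # was the final decision borne out by events?


def override_report(decisions: List[Decision]) -> Dict[str, object]:
    """Compute the override rate and the error rate among overrides."""
    overrides = [d for d in decisions if d.overridden]
    override_rate = len(overrides) / len(decisions) if decisions else 0.0
    wrong = sum(1 for d in overrides if not d.turned_out_right)
    error_rate = wrong / len(overrides) if overrides else 0.0
    return {
        "override_rate": override_rate,
        "override_error_rate": error_rate,
        # Byrne and Pattavina: more than 10 percent of decisions changed
        "exceeds_10_percent_rule": override_rate > 0.10,
        # the alternative proposed in the text: more than half of overrides wrong
        "overrides_mostly_wrong": error_rate > 0.5,
    }


# Made-up sample: 20 decisions, 3 overrides, 2 of which turned out wrong.
sample = (
    [Decision(overridden=False, turned_out_right=True)] * 17
    + [Decision(overridden=True, turned_out_right=False)] * 2
    + [Decision(overridden=True, turned_out_right=True)]
)
report = override_report(sample)
print(report)
```

The two flags measure different things. The first asks only how often humans step in; the second asks whether stepping in actually helps, which is the question that matters if the goal is better decisions.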
This is in many ways a depressing story for the role of flesh-and-blood people in making decisions. It looks like a world where human discretion is sharply constrained, where humans and their decisions are controlled by the output of machines. What, if anything, in the process of prediction can we humans do better than the machines?
What’s Left for Us to Do?
In a word, hypothesize. The most important thing that is left to humans is to use our minds and our intuition to guess at what variables should and should not be included in statistical analysis. A statistical regression can tell us the weights to place upon various factors (and simultaneously tell us how precisely it was able to estimate these weights). Humans, however, are crucially needed to generate the hypotheses about what causes what. The regressions can test whether there is a causal effect and estimate the size of the causal impact, but somebody (some body, some human) needs to specify the test itself.
Consider, for example, the case of Aaron Fink. Fink was a California urologist and an outspoken advocate of circumcision. (He was the kind of guy who self-published a book to promote his ideas.) In 1986, the New England Journal of Medicine published a letter of his which proposed that uncircumcised men would be more susceptible to HIV infection than circumcised men. At the time, Fink didn’t have any data; he just had the idea that the cells of the prepuce (the additional skin on an uncircumcised male) might be susceptible to infection. Fink also noticed that countries like Nigeria and Indonesia, where only about 20 percent of men are uncircumcised, had a slower spread of AIDS than countries like Zambia and Thailand, where 80 percent of men are uncircumcised. Seeing through the sea of data to recognize a correlation that had eluded everyone else was a stroke of brilliance.
Before Fink died in 1990, he was able to see the first empirical verification of his idea. Bill Cameron, an AIDS researcher in Kenya, hit upon a powerful test of Fink’s hypothesis. Cameron and his colleagues found 422 men who visited prostitutes in Nairobi, Kenya, in 1985 (85 percent of these prostitutes were known to be HIV positive) and subsequently went to a clinic for treatment of a non-HIV STD. Like Ted Ruger’s Supreme Court study, Cameron’s test was prospective. Cameron and his colleagues counseled the men on HIV, STDs, and condom use and asked them to refrain from further prostitute contact. The researchers then followed up with these men on a monthly basis for up to two years, to see if and under what circumstances the men became HIV positive. Put simply, they found that uncircumcised men were 8.2 times more likely to become HIV positive than circumcised men.
This small but powerful study triggered a cascade of dozens of studies confirming the result. In December 2006, the National Institutes of Health stopped two randomized trials that it was running in Kenya and Uganda because it became apparent that circumcision reduced a man’s risk of contracting AIDS from heterosexual sex by about 65 percent. Suddenly, the Gates Foundation is considering paying for circumcision in high risk countries.
What started as a urologist’s hunch may end up saving hundreds of thousands of lives. Yes, there is still a great role for deductive thought. The Aristotelian approach to knowledge remains important. We still need to theorize about the true nature of things, to speculate. Yet unlike the old days, where theorizing was an end in itself, the Aristotelian approach will increasingly be used at the beginning as an input to statistical testing. Theory or intuition may lead the Finks of the world to speculate that X and Y cause Z. But Super Crunching (by the Camerons) will then come decisively into play to test and parameterize the size of the impact.
The role of theory in excluding potential factors is especially important. Without theory or intuition, there is a literal infinity of possible causes for any effect. How are we to know that what a vintner had for lunch when he was seven doesn’t affect who he might fall in love with or how quaffable his next vintage might be? With finite amounts of data, we can only estimate a finite number of causal effects. The hunches of human beings are still crucial in deciding what to test and what not to test.
The same is even more true for randomized testing. People have to figure out in advance what to test. A randomized trial only gives you information about the causal impact of some treatment versus a control group. Technologies like Offermatica are making it a lot cheaper to test dozens of separate treatments’ effects. Yet there’s still a limit to how many things can be tested. It would probably be just a waste of money to test whether an ice-cream diet would be a healthy way to lose weight. But theory tells me that it might not be a bad idea to test whether financial incentives for weight loss would work.
So the machines still need us. Humans are crucial not only in deciding what to test, but also in collecting and, at times, creating the data. Radiologists provide important assessments of tissue anomalies that are then plugged into the statistical formulas. Same goes for parole officials who subjectively judge the rehabilitative success of particular inmates. In the new world of database decision making, these assessments are merely inputs for a formula, and it is statistics, and not experts, that determine how much weight is placed on the assessments.
Albert Einstein said the “really valuable thing is intuition.” In many ways, he’s still right. Increasingly, though, intuition is a precursor to Super Crunching. In case after case, traditional expertise innocent of statistical analysis is losing out to Super Crunching. As Paul Meehl concluded shortly before he died:
There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one. When you are pushing over 100 investigations, predicting everything from the outcome of football games to the diagnosis of liver disease, and when you can hardly come up with a half dozen studies showing even a weak tendency in favor of the clinician, it is time to draw a practical conclusion.
It is much easier to accept these results when they apply to someone else. Few people are willing to accept that a crude statistical algorithm based on just a handful of factors could outperform them. Universities are loath to accept that a computer could select better students. Book publishers would be loath to delegate the final say in acquiring manuscripts to an algorithm.
At some point, however, we should start admitting that the superiority of Super Crunching is not just about the other guy. It’s not just about baseball scouts and wine critics and radiologists and…the list goes on and on. Indeed, by now I hope to have convinced you that something real is going on out there. Super Crunching is impacting real-world decisions in many different contexts that touch us as consumers, as patients, as workers, and as citizens.