Super Crunchers

by Ian Ayres


  Probabilistically Public

  The ability of a university or an insurer to predict your race is itself just another way that Super Crunching is reducing our sphere of effective privacy. Suddenly we live in a world where less and less is hidden about who we are, what we have done, and what we will do.

  Part of the privacy problem isn’t a problem of Super Crunching; it’s the dark side of digitalization. Information is not only easier to capture now in digital form, but it is also virtually costless to copy. It’s scary to live in a world where ChoicePoint and other data aggregators know so much about us. There is the legitimate fear that information will leak. In May 2006, this fear became real for more than 26.5 million military veterans when electronic records containing their Social Security numbers and birth dates were stolen from a government employee’s home. The government told veterans to be “extra vigilant” when monitoring bank and credit card records, but the risk of identity theft remains. And this risk is not just limited to bureaucratic mishaps. A laptop was stolen from the home of a Fidelity Investments employee and—poof, the personal information of 196,000 HP employees was suddenly up for grabs. Or an underling at AOL hit the wrong key and poof, the personal search information of millions of users was released onto the net.

  We’re used to giving the phone company or websites passwords or answers to challenge questions so that they can verify that we are who we say we are when we call in. But today, new services like Corillian can supply retailers, in a matter of seconds, with challenge questions and answers that you have never provided. You might walk into Macy’s to apply for a credit card, and while you’re waiting, Macy’s will ask you where your mom lived in 1972. Statistical matching algorithms let Super Crunchers connect widely disparate types of data and ferret out, in nanoseconds, facts that would have taken weeks of effort in the past.

  Our privacy laws so far have been mainly concerned with protecting people’s privacy in their homes and curtilage (the enclosed area of land immediately surrounding a house). To Robert Frost, home was the place where, “when you have to go there, they have to take you in.” To the Constitution, however, home is quintessentially the place where you have a “reasonable expectation of privacy.” Out on the street, the law says we don’t expect our actions to be private, and the police are free, say, to listen in on our conversations without a warrant.

  The law in the past didn’t need to worry much about our walking-around privacy, because out on the street we were usually effectively anonymous. Donald Trump may have difficulty going for a walk incognito in New York, but most of us could happily walk across the length and breadth of Manhattan without being recognized.

  Yet the sphere of public anonymity is shrinking. With just a name, we can google people on the fly to pull up addresses, photographs, and myriad pieces of other information. And with face-recognition software, we don’t even need a name. It will soon be possible to passively identify passersby. The first face-recognition applications were operated by police—looking for people with outstanding warrants at the Super Bowl. In Massachusetts, police recently were able to use face-recognition software to catch Robert Howell, a fugitive featured on the television show America’s Most Wanted. Law enforcement officials tracked him down after they used his mug shot from the TV show to find a match in a database of over nine million digital driver’s license photos. Although Howell had managed to obtain a driver’s license under a different name, facial recognition software eventually caught up with him. This Super Crunching software is also being used to catch people who fraudulently apply for multiple licenses under different names. While not perfect, the database predictions were accurate enough to flag more than 150 twins as potential fraud cases.

  In the Steven Spielberg movie Minority Report, Tom Cruise’s character was bombarded with personalized electronic ads that recognized and called out to him as he walked through a mall. For the moment, this is still the stuff of science fiction. But we’re getting closer. Now passive identification is coming to the web. PolarRose.com is using face recognition to improve the quality of image searches. Google image searches currently rely on the text that appears near web images. PolarRose, on the other hand, creates 3-D renditions of the faces, codes for ninety different facial attributes, and then matches the image to an ever-growing database. Suddenly, if you happen to be walking by in the background when a tourist snaps a picture, the whole world could learn where you were. Any photo that is subsequently posted to websites like flickr.com could reveal your whereabouts.
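  As a rough illustration of how the attribute-coding-and-matching step could work in principle, here is a minimal sketch. The cosine-similarity scoring rule, the matching threshold, and every name in it are illustrative assumptions, not a description of PolarRose’s actual system.

```python
# A minimal sketch of attribute-based face matching, assuming each face has
# already been reduced to a vector of numeric attribute scores. The attribute
# count, scoring rule, and threshold are illustrative assumptions only.
from typing import Optional
import numpy as np

N_ATTRIBUTES = 90  # the text mentions ninety coded facial attributes

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two attribute vectors; 1.0 means identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical database of already-identified faces: name -> attribute vector.
rng = np.random.default_rng(1)
database = {f"person_{i}": rng.normal(size=N_ATTRIBUTES) for i in range(10_000)}

def identify(query: np.ndarray, threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching name if it clears the threshold, else None."""
    best_name, best_score = None, -1.0
    for name, vector in database.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# A face lifted from the background of a tourist photo, simulated here as a
# noisy copy of one database entry, so the match should clear the threshold.
query_face = database["person_42"] + rng.normal(scale=0.05, size=N_ATTRIBUTES)
print(identify(query_face))
```

  Real systems presumably rely on learned representations and faster approximate search, but the basic idea is the same: reduce a face to a vector of numbers and look for the closest match in a growing database.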

  Most of the popular discussion of face recognition emphasizes the software that is needed to successfully “code” different facial attributes. But make no mistake, facial recognition is Super Crunching looking for high-probability predictions. And once you’ve been identified, the Super Crunching cat is out of the bag, as increasing numbers of people will be able to determine what library books you forgot to return, which politicians you gave money to, what real estate you own, and countless other bytes of data about you. Walking into H&H and buying a bagel is technically public, but for most of us non-celebrities, public anonymity has allowed for a vast range of unobserved freedom of movement. Super Crunching is reducing the sphere of being private in public.

  Sherlock Holmes was famous for deducing intricate details about a person’s past by observing just a few details of their present. But given access to much more intricate datasets, Super Crunchers can put Holmesian prediction to shame. Given these 250 variables, there’s a 93 percent chance that you voted for Nader. It’s elementary, my dear Watson.

  Data mining does more than give new meaning to “I know what you did last summer.” Super Crunchers can also make astute predictions about what you will do next summer. Traditionally, the right to privacy has been about preserving past and present information. There was no need to worry about keeping future information private. Since future information doesn’t exist yet, there was nothing to keep private. Yet data-mining predictions raise just this concern. Super Crunching in a sense puts our future privacy at risk because it can probabilistically predict what we will do. Super Crunching moves us toward a kind of statistical predeterminism.

  The 1997 sci-fi thriller Gattaca imagined a world in which genetics was destiny. The hero’s parents were told at his birth that he had a 42 percent chance of manic depression and a life expectancy of 34.2 years. But right now it’s possible for Super Crunchers to look at a collection of innocuous past behaviors and make chillingly accurate assessments of the future. For example, it’s a little unnerving to think that Visa, with a little mining of my credit card charges, can make a pretty accurate guess of whether I’ll divorce in the next five years.

  “Data huggers”—the people who are scared about the untoward uses of public data—have a lot to be worried about. Google’s corporate mission is “to organize the world’s information and make it universally accessible and useful.” This ambitious goal is seductively attractive. However, it is not counterbalanced by any concern for privacy. Data-driven predictions are creating new dimensions where our past and even future actions will be “universally accessible.”

  The slow erosion of the private sphere makes it harder to realize what is happening and rally resistance. Like a frog slowly boiling to death, we don’t notice that our environment is changing. People in Israel now expect to be repeatedly checked by a metal detector as they go about their daily tasks. Sometimes incremental steps of “progress” take us down a path where collectively we end up eating hot-house tomatoes and Wonder Bread that are at best the semblance of food. The fear is that number crunching will somehow similarly degrade our lives.

  Newspaper reporters feel compelled to quote privacy pundits who raise concerns and call for debate. Yet most people, when it comes down to it, don’t seem to value their privacy very much. The Ponemon Institute estimates that only 7 percent of Americans change their behaviors to preserve their privacy. Rare is the person who will reject the E-ZPass system (and its discount) because it can track car movements. Carnegie Mellon economist Alessandro Acquisti has found that people are happy to surrender their Social Security number for just a fifty-cents-off coupon. Individually, we’re willing to sell our privacy. Sun’s founder and CEO Scott McNealy famously declared in 1999 that we “have no privacy—get over it.” Many of us already have.

  Super Crunching affects us not only as customers and as employees but also as citizens. I, for one, am not worried about people googling me or predicting my actions. The benefits of indexing and crunching the world’s information far outweigh its costs. Other citizens may reasonably disagree. One thing is for certain: consumer pressure by itself is not likely to restrain the Super Crunching onslaught. The data huggers of the world need to unite (and convince Congress) if the excesses of data-based decision making are going to be constrained.

  Truth is often a defense. But even true predictions at times may hurt customers and employees if they allow firms to take advantage of us, and predictions can hurt us as citizens if they allow others to inappropriately invade our past, present, or future privacy. The larger concern is about inaccurate (untrue) predictions. Without appropriate protections, they can hurt everybody.

  Who Is John Lott?

  On September 23, 2002, Mary Rosh posted to the web a rather harsh criticism of an empirical paper that I wrote with my colleague John Donohue. Rosh said:

  The Ayres and Donohue piece is a joke. I saw it a while ago…. A friend at the Harvard Law School said that Donohue gave the paper there and he was demolished…

  The article that Rosh was criticizing was about the impact of concealed handgun laws on crime. It was a response to John Lott’s “More Guns, Less Crime” claim. Lott created a huge dataset to analyze the impact that concealed weapon laws had on crime. His startling finding was that states which passed laws making it easy for law-abiding citizens to carry concealed weapons experienced a substantial decrease in crime. Lott believed that criminals would be less likely to commit crime if they couldn’t be sure whether or not their victims were armed.

  Donohue and I took Lott’s data and ran thousands of regressions exploring the same issue. Our article refuted Lott’s central claim. In fact, we found twice as many states that experienced a statistically significant increase in crime after passage of the law. Overall, however, we found that the changes were not substantial, and these concealed weapon laws might not impact crime one way or the other.

  That’s when Mary Rosh weighed in on the web. Her comment isn’t so remarkable for its content—that’s part of the rough and tumble of academic disputes. The comment is remarkable because Mary Rosh is really John Lott. Mary Rosh was a “sock puppet” pseudonym (based on the first two letters of his four sons’ names). Lott as Rosh posted dozens upon dozens of comments to the web praising his own merits and slamming the work of his opponents. Rosh, for example, identified herself as a former student of Lott’s and extolled Lott’s teaching. “I have to say that he was the best professor that I ever had,” she wrote. “You wouldn’t know that he was a ‘right-wing ideologue’ from the class.”

  Lott is a complicated and tortured soul. He is often the smartest guy in the room. He comes to seminars and debates consummately prepared. I first met him at the University of Chicago when I was delivering a statistical paper on New Haven bail bondsmen. Lott had not only read my paper carefully, he’d looked up the phone numbers of New Haven bond dealers and called them on the phone. I was blown away.

  He is incredibly combative in public, but just as soft-spoken, even meek, when speaking one-on-one. Lott is also a physical presence. He is tall and has striking features—Ichabod Crane–like in their lack of proportion. Mary Rosh has even described him:

  I had Lott as a teacher about a decade ago, and he has a quite noticable [sic] scar across his forehead. It looked like it cut right through his eyebrows going the entire width of his forehead. [T]he scar was so extremely noticable [sic] that people talked and joked about it. Some students claimed that he had major surgery when he was a child.

  Before the Mary Rosh dissembling, I was instrumental in bringing John to Yale Law School for two years as a research fellow. Make no mistake, John Lott has some serious number-crunching skills.

  His concealed-weapon empiricism was quickly picked up by gun-rights advocates and politicians as a reason to oppose efforts at gun control and advance the cause of greater freedom to carry guns. In the same year that Lott’s initial article was published, Senator Larry Craig (R-Idaho) introduced The Personal Safety and Community Protection Act, which was designed to facilitate the carrying of concealed firearms by nonresidents of a state who had obtained valid permits to carry such weapons in their home state. Senator Craig argued that the work of John Lott showed that arming the citizenry via laws allowing the carrying of concealed handguns would have a protective effect for the community at large because criminals would find themselves in the line of fire.

  Lott has repeatedly been asked to testify to state legislatures in support of concealed gun laws. Since Lott’s original article was published in 1998, nine additional states have passed his favored statute. This book is about the impact that Super Crunching is having on real-world decisions. It’s hard to know for sure whether Lott’s regressions were a but-for cause of these new statutes. Still, Lott and his “More Guns/Less Crime” regressions have had the kind of influence that most academics can only dream of.

  Lott generously made his dataset available not only to Donohue and me, but to anyone who asked. And we dug in, double-checking the calculations and testing whether his results held up if we slightly changed some of his assumptions. Econometricians call this testing to see whether results are “robust.”

  We had two big surprises. First, we found that if you made innocuous changes to Lott’s regression equation, the crime-reducing impact of the laws often vanished. Second, and more disturbingly, we found that Lott had made a computer mistake in creating some of his underlying data. For example, in many of his regressions, Lott tried to control for whether the crime took place in a particular region (say, the Northeast) in a particular year (say, 1988). But when we looked at his data, many of these variables were mistakenly set to zero. When we estimated his formula on the corrected data, we again found that these laws were more likely to increase the rate of crime.
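  To make those two checks concrete, here is a minimal sketch of a robustness test run on simulated panel data. The dataset, the variable names, and the fixed-effect structure are assumptions for illustration only; this is not Lott’s data or the specification that Donohue and I actually estimated.

```python
# A toy robustness check on simulated panel data, assuming a "shall-issue" law
# dummy and region-by-year controls. Purely illustrative; not Lott's dataset
# or the Ayres-Donohue regressions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
states = [f"S{i:02d}" for i in range(20)]
df = pd.DataFrame({
    "state": np.repeat(states, 10),
    "year": np.tile(np.arange(1988, 1998), 20),
    "region": np.repeat(["Northeast", "South", "Midwest", "West"], 50),
    "law": rng.integers(0, 2, 200),            # concealed-carry law in force?
    "crime_rate": rng.normal(500.0, 50.0, 200),
})

# Region-by-year controls, the kind of variables whose miscoding mattered.
df["region_year"] = df["region"] + "_" + df["year"].astype(str)

# The sanity check that would have flagged the coding error: every region-year
# dummy should equal one for at least some observations, never all zeros.
dummies = pd.get_dummies(df["region_year"])
assert (dummies.sum() > 0).all(), "some region-year dummies are all zero"

# Robustness: does the coefficient on the law keep its sign (and significance)
# when the region-year controls are added to the specification?
base = smf.ols("crime_rate ~ law + C(region) + C(year)", data=df).fit()
full = smf.ols("crime_rate ~ law + C(region_year)", data=df).fit()
print("law effect, simple controls:     ", round(base.params["law"], 2))
print("law effect, region-year controls:", round(full.params["law"], 2))
```

  The assert line is the sort of mechanical sanity check that catches a whole category of coding errors: a control variable that is supposed to vary but sits at zero for every observation.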

  Let me stress that both of these mistakes are the kind of errors that, but for the grace of God, I or just about any other Super Cruncher might make—especially regarding the coding error. There are literally hundreds of data manipulations that need to be made in getting a large dataset in shape to run a regression. If the gearhead makes a mistake on any one of the transformations, the bottom-line predictions may be inaccurate. I have no concern that Lott purposefully miscoded his data to produce predictions that supported his thesis. Nonetheless, it is disturbing that after Donohue and I pointed out the coding errors, Lott and his coauthors continued to rely on the flawed data. As Donohue and I said in a response to our initial article, “repeatedly bringing erroneous data into the public debate starts suggesting a pattern of behavior that is unlikely to engender support for the Lott [‘More Guns/Less Crime’] hypothesis.”

  We are not the only ones to engage with the topic. More than a dozen different authors have exploited the Lott data to reanalyze the issue. In 2004, the National Academy of Sciences entered the debate, conducting a review of the empirical research on firearms and violent crime, including Lott’s work. Their panel of experts found: “There is no credible evidence that ‘right-to-carry’ laws, which allow qualified adults to carry concealed handguns, either decrease or increase violent crime.” At least for the moment, this pretty much sums up what many academics feel about the issue.

  Lott, however, fights on undaunted. Indeed, John is such a tenacious adversary that I’m a little scared to mention his name here in this book. In 2006, Lott took the extraordinary step of suing Steve Levitt for defamation, growing out of a single paragraph in Levitt’s bestselling Freakonomics book, which said in part:

  Lott’s admittedly intriguing hypothesis doesn’t seem to be true. When other scholars have tried to replicate his results, they found that right-to-carry laws simply don’t bring down crime.

  Levitt’s endnote supported this claim by citing…you guessed it, my article with Donohue that Mary Rosh thought was a joke. Lott’s defamation charge all depends on the meaning of “replicate.” Lott claims that Levitt was suggesting that Lott falsified his results—that he committed the cardinal sin of “editing the output file.” I find it shocking that Lott brought this suit, especially since Donohue and I couldn’t replicate some of his results once we corrected Lott’s clear coding error (coding errors, by the way, that Lott himself has conceded).

  Thankfully, the district court has dismissed the Freakonomics claim. Early in 2007, Judge Ruben Castillo found that the term “replicate” was susceptible to non-defaming meanings. The judge pointed to the same Ayres and Donohue endnote, saying that it clarified “the intended definition of the term ‘replicate’ to be simply that other scholars have disproved Lott’s gun theory, not that they proved Lott falsified his data.”

 
