Super Crunchers


by Ian Ayres


  Mashups and mergers of datasets are easier today than ever before. But the DBT convict list is a cautionary tale. The new merging technology can fail through either inadvertent or deliberate error. As the size of datasets balloons almost beyond the scope of our imagination, it becomes all the more important to audit them continually for the possibility of error. What makes the DBT story so troubling is that the convict/voter data seemed so poorly matched relative to the standards of modern-day merging and mashing.

  Technology or Techniques?

  The technological advances in the ability of firms to digitally capture and merge information have helped trigger the commodification of data. You’re more likely to want to pay for data if you can easily merge them into your pre-existing database. And you’re more likely to capture information if you think someone else is later going to pay you for it. So the ability of firms to more easily capture and merge information helps answer the “Why Now?” question.

  At heart, the recent onslaught of Super Crunching is more a story of advances in technology than in statistical techniques. It is not a story about new breakthroughs in the statistical art of prediction. The basic statistical techniques have existed for decades—even centuries. The randomized trials that are being used so devastatingly by companies like Offermatica have been known and used in medicine for years. Over the last fifty years, econometrics and statistical theory have improved, but the core regression and randomization techniques have been available for a long time.

  What’s more, the timing of the Super Crunching revolution isn’t dominantly about the exponential increase in computational capacity. The increase in computer speed has helped, but the increase in computer speed came substantially before the rise of data-based decision making. In the old days, say, before the 1980s, CPUs—“central processing units”—were a real constraint. The number of mathematical operations necessary to calculate a regression grows roughly with the square of the number of variables—so if you double the number of controls, you roughly quadruple the number of operations needed to estimate a regression equation. In the 1940s, the Harvard Computation Laboratory employed scores of secretaries armed with mechanical calculators who manually crunched the numbers behind individual regressions. When I was in grad school at MIT in the 1980s, CPUs were so scarce that we grad students were allotted time only in the wee hours of the morning to run our programs.
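  The arithmetic behind that claim can be sketched in a few lines. The cost model below is a standard rough estimate for ordinary least squares (forming and solving the normal equations), not a formula from the text, and the row and predictor counts are made up for illustration:

```python
# Rough cost model for ordinary least squares with n rows and k predictors:
# forming the k-by-k matrix X'X takes about n * k * k multiplications,
# and solving the resulting system takes on the order of k**3 more.
def normal_equation_ops(n_rows, k_predictors):
    return n_rows * k_predictors ** 2 + k_predictors ** 3 // 3

base = normal_equation_ops(1000, 10)
doubled = normal_equation_ops(1000, 20)
print(doubled / base)  # roughly 4: doubling the controls quadruples the work
```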

  But thanks to Moore’s Law—the observation that processor power doubles roughly every two years—Super Crunching has not been seriously hampered by a lack of cheap CPUs. For at least twenty years, computers have had the computation power to estimate some serious regression equations.

  The timing of the current rise of Super Crunching has been driven more by the increase in storage capacity. We are moving toward a world without delete buttons. Moore’s Law is better known, but it is Kryder’s Law—a regularity first proposed by Mark Kryder, the chief technology officer of hard drive manufacturer Seagate Technology—that is more responsible for the current onslaught of Super Crunching. Kryder observed that the storage capacity of hard drives has been doubling every two years.

  Since the introduction of the disk drive in 1956, the density of information that can be recorded into the space of about a square inch has swelled an amazing 100-million fold. Anyone over thirty remembers the days when we had to worry frequently about filling up our hard disks. Today, cheap data storage has made it practical to keep massively large datasets.
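  To see how quickly that doubling rule compounds, here is a one-line calculation. The fifty-year span and the doubling period come from the text; note that a strict doubling every two years yields a factor of about 33 million over fifty years, so the 100-million-fold figure implies that density growth at times outpaced even the simple rule:

```python
# Compound growth under a "capacity doubles every two years" rule.
def growth_factor(years, doubling_period=2):
    return 2 ** (years / doubling_period)

# From the disk drive's 1956 debut, fifty years of doubling gives:
print(growth_factor(50))  # 2**25, about 33.6 million
```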

  And as the density of storage has increased, the price of storage has dropped. Thirty to forty percent annual price declines in the cost per gigabyte of storage continue apace. Yahoo! currently records over twelve terabytes of data daily. On the one hand, this is a massive amount of information—it’s roughly equivalent to more than half the information contained in all the books in the Library of Congress. On the other hand, this amount of disk storage does not require acres of servers or billions of dollars. In fact, right now you could add a terabyte of hard drive space to your desktop for about $400. And industry experts predict that in a couple of years that price will drop by half.

  The cutthroat competition to produce these humongous hard drives for personal consumer products is driven by video. TiVo and other digital video recorders can only remake the world of home video entertainment if they have adequate storage space. A terabyte drive will only hold about eight hours of HDTV (or nearly 14,000 music albums), but you can jam onto it about sixty-six million pages of text or numbers.

  Both the compactness and cheapness of storage are important for the proliferation of data. Suddenly, it’s feasible for Hertz or UPS to give each employee a handheld machine to capture and store individual transaction data that are only periodically downloaded to a server. Suddenly, every car includes a flash memory drive, a mini-black box recorder to tell what was happening at the time of an accident.

  The abundance of supercheap storage, from the tiny flash drives (hidden inside everything from iPods and movie cameras to swimming goggles and birthday cards) to the terabyte server farms at Google and flickr.com, has opened up new vistas of data-mining possibilities. The recent onslaught of Super Crunching is dominantly driven by the same technological revolutions that have been reshaping so many other parts of our lives. The timing is best explained by the digital breakthroughs that make it cheaper to capture, to merge, and to store huge electronic databases. Now that mountains of data exist (on hard disks) to be mined, a new generation of empiricists is emerging to crunch them.

  Can a Computer Be Taught to Think Like You?

  There is, though, one new statistical technique that is an important contributor to the Super Crunching revolution: the “neural network.” Predictions using neural network equations are a newfangled competitor to the tried-and-true regression formula. The first neural networks were developed by academics to simulate the learning processes of the human brain. There’s a great irony here: the last chapter detailed scores of studies showing why the human brain does a bad job of predicting. Neural networks, however, are attempts to make computers process information like human neurons. The human brain is a network of interconnected neurons that act as informational switches. Depending on the way the neuron switches are set, when a particular neuron receives an impulse, it may or may not send an impulse on to a subsequent set of neurons. Thinking is the result of particular flows of impulses through the network of neuron switches. When we learn from some experience, our neuron switches are being reprogrammed to respond differently to different types of information. When a curious young child reaches out and touches a hot stove, her neuron switches are going to be reprogrammed to fire differently so the next time the hot stove will not look so enticing.

  The idea behind computer neural networks is essentially the same: computers can be programmed to update their responses based on new or different information. In a computer, a mathematical “neural network” is a series of interconnected switches that, like neurons, receive, evaluate, and transmit information. Each switch is a mathematical equation that takes and weighs multiple types of input information. If the weighted sum of the inputs in the equation is sufficiently large, the switch turns on and passes a signal along as input to subsequent neural equation switches. At the end of the network is a final switch that collects information from previous neural switches and produces as its output the neural network’s prediction. Unlike the regression approach, which estimates the weights to apply to a single equation, the neural approach uses a system of equations represented by a series of interconnected switches.
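  A minimal sketch of such a network of switches, with hard on/off units and made-up weights (nothing here comes from an actual trained model):

```python
# Each "switch" weighs its inputs and fires (returns 1) only if the
# weighted sum clears a threshold, mimicking the neurons described above.
def switch(inputs, weights, threshold=0.0):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def tiny_network(inputs):
    # Two intermediate switches feed one final output switch, whose
    # on/off state is the network's prediction.
    h1 = switch(inputs, [0.5, -0.2, 0.8])
    h2 = switch(inputs, [-0.4, 0.9, 0.1])
    return switch([h1, h2], [1.0, 1.0], threshold=0.5)

print(tiny_network([1.0, 0.0, 1.0]))  # → 1
```

A real network replaces the hard threshold with a smooth function so the weights can be tuned gradually, but the flow of signals through layered switches is the same idea.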

  Just as experience trains our brain’s neuron switches when to fire and when not to, computers use historical data to train the equation switches to come up with optimal weights. For example, researchers at the University of Arizona constructed a neural network to forecast winners in greyhound dog racing at the Tucson Greyhound Park. They fed in more than fifty pieces of information from thousands of daily racing sheets—things like the dogs’ physical attributes, the dogs’ trainers, and, of course, how the dogs did in particular races under particular conditions. Like the haphazard predictions of the curious young child, the weights on these greyhound racing equations were initially set randomly. The neural estimation process then tried out alternative weights on the same historic data over and over again—sometimes literally millions of times—to see which weights for the interconnecting equations produced the most accurate estimates. The researchers then used the weights from this training to predict the outcome of a hundred future dog races.
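  The training loop just described can be sketched as a toy random search: start from random weights, try alternatives over and over, and keep whatever scores best on the historical data. The race data below are invented, and real neural training typically adjusts weights by gradient methods rather than pure random guessing:

```python
import random

random.seed(0)

# Invented history: each row is (two race features, actual win outcome).
history = [([0.9, 0.2], 1), ([0.1, 0.8], 0), ([0.7, 0.4], 1), ([0.2, 0.9], 0)]

def accuracy(weights):
    # Score a candidate weighting against the historical outcomes.
    hits = 0
    for features, outcome in history:
        score = sum(x * w for x, w in zip(features, weights))
        hits += (1 if score > 0 else 0) == outcome
    return hits / len(history)

# Weights start random; then we try alternatives "over and over again,"
# keeping whichever set predicts the past most accurately.
best_w, best_acc = None, -1.0
for _ in range(1000):
    candidate = [random.uniform(-1, 1) for _ in range(2)]
    acc = accuracy(candidate)
    if acc > best_acc:
        best_w, best_acc = candidate, acc

print(best_acc)  # → 1.0 on this tiny, cleanly separable history
```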

  The researchers even set up a contest between their predictions and three expert habitués of the racetrack. For the test races, the neural network and the experts were each instructed to place $1 bets on a hundred different dogs. Not only did the neural network better predict the winners, but (more importantly) the network’s predictions yielded substantially higher payoffs. In fact, while none of the three experts generated positive payoffs with their predictions—the best still lost $60—the neural network won $125. It won’t surprise you to learn that lots of other bettors are now relying on neural prediction (if you google neural network and betting, you’ll get tons of hits).

  You might be wondering what’s really new about this technique. After all, plain-old regression analysis also involves using historical data to predict results. What sets the neural network methodology apart is its flexibility and nuance. With traditional regressions, the Super Cruncher needs to specify the precise form of the equation. For example, it’s the Super Cruncher who has to tell the machine whether or not the dog’s previous win percentage needs to be multiplied by the dog’s average place in a race in order to produce a more powerful prediction.

  With neural networks, the researcher just needs to feed in the raw information, and the network, by searching over the massively interconnected set of equations, will let the data pick out the best functional form. We don’t have to figure out in advance how dogs’ different physical attributes interact to make them better racers; we can let the neural training tell us. The Super Cruncher, under either the regression or neural methods, still needs to specify the raw inputs for the prediction. The neural method, however, allows much more fluid estimates of the nature of the impact. As the size of datasets has increased, it has become possible to let neural networks estimate many more parameters than traditional regression has typically accommodated.

  But the neural network is not a panacea. The subtle interplay of its weighting schemes is also one of its biggest drawbacks. Because a single input can influence multiple intermediate switches that in turn impact the final prediction, it often becomes impossible to figure out how an individual input is affecting the predicted outcome.

  Part and parcel of not knowing the size of the individual influences is not knowing the precision of the neural weighting scheme. Remember, the regression not only tells you how much each input impacts the prediction, it also tells you how accurately it was able to estimate the impact. Thus, in the greyhound example, a regression equation might not only tell you that the dog’s past win percentage should be given a weight of .47 but it would also tell you its level of confidence in that estimate: “There’s a 95 percent chance that the true weight is between .35 and .59.” The neural network, in contrast, doesn’t tell you the confidence intervals. So while the neural technique can yield powerful predictions, it does a poorer job of telling you why it is working or how much confidence it has in its prediction.
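  What a regression reports, and a neural network typically does not, can be shown with a small calculation. The data below are invented, the slope plays the role of a weight like the .47 in the example, and the 1.96 multiplier is the usual normal approximation for a 95 percent interval:

```python
import math

# Made-up data roughly following y = 1 + x plus noise.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8, 10.1, 11.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx

# Standard error of the slope from the residuals (n - 2 degrees of freedom).
residual_var = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y)) / (n - 2)
se = math.sqrt(residual_var / sxx)

# A regression reports both the weight and a confidence interval around it.
low, high = slope - 1.96 * se, slope + 1.96 * se
print(f"slope {slope:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```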

  The multiplicity of estimated weighting parameters (which can often be three times greater with neural networks than with regression prediction) can also lead to “overfitting” of the training data.*3 If you have the network “train” itself to find the best 100 weights to use on 100 pieces of historical data, the network will be able to precisely predict all 100 outcomes. But exactly fitting the past doesn’t guarantee that the neural weights will be good at predicting future outcomes. Indeed, the effort to exactly fit the past with a proliferation of arbitrary weights can actually hinder the ability of neural networks to predict the future. Neural Super Crunchers are now intentionally limiting the number of parameters that they estimate and the amount of time they let the network train to try to reduce this overfitting problem.
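  The overfitting problem is easy to reproduce. In the sketch below, a polynomial with as many free parameters as data points (built by Lagrange interpolation) fits five invented, essentially flat observations perfectly, yet predicts a future point far worse than the plain average of the data:

```python
# Five noisy observations of a process whose "true" value is just y ≈ 1.
train = [(1, 1.1), (2, 0.9), (3, 1.2), (4, 0.8), (5, 1.0)]
future_x, future_y = 7, 1.0

def interpolate(x, points):
    # Lagrange interpolation: passes exactly through every training point,
    # i.e., a model that fits the past perfectly.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

mean_y = sum(yv for _, yv in train) / len(train)

overfit_error = abs(interpolate(future_x, train) - future_y)
simple_error = abs(mean_y - future_y)
print(overfit_error > simple_error)  # → True: the perfect-fit model does worse
```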

  “We Shoot Turkeys”

  To be honest, the neural prediction methods are sufficiently new that there’s still a lot of art involved in figuring out how best to estimate them. It’s not yet clear how far neural prediction will go in replacing regression prediction as the dominant methodology. It is clear, however, that there are real-world contexts in which neural predictions are at least holding their own with regression prediction in terms of accuracy. There are even some cases where they outperform traditional regressions.

  Neural predictions are even starting to influence Hollywood. Just as Orley Ashenfelter predicted the price of Bordeaux vintages before they had even been tasted, a lawyer named Dick Copaken has had the audacity to think that he can figure out how much a movie will gross before a single frame is even shot. Copaken is a fellow Kansas Citian who graduated from Harvard Law School and went on to a very distinguished career as a partner in the Washington office of Covington & Burling. In the past, he’s crunched numbers for his legal clients. Years ago he commissioned Lou Harris to collect information on perceptions of car bumper damage. The statistical finding that most people couldn’t even see small dents in their bumpers convinced the Department of Transportation to allow manufacturers to use cheaper bumpers that would sustain at most imperceptible dents.

  Nowadays, Copaken is using a neural network to crunch numbers for a very different kind of client. After retiring from the practice of law, Dick Copaken founded a company that he named Epagogix (from the Aristotelian idea of inductive learning). The company has “trained” a neural network to try to predict a movie’s receipts based primarily on characteristics of the script. Epagogix has been working behind the scenes because most of its clients don’t want the world to know what it’s doing.

  But in a 2006 New Yorker article, Malcolm Gladwell broke the story. Gladwell first learned of Epagogix when he was giving a speech to the head honchos of a major film studio. Copaken told me it was the kind of “retreat where they essentially confiscate everybody’s BlackBerry and cell phone, move them to some off-campus site and for a few days they try to think the great thoughts…. And they usually invite some guru of the moment to come and speak with them and help them as they sort through their thinking. And this particular year it was Malcolm Gladwell.” Even though Gladwell’s job was to tell stories to the executives, he turned the tables and asked them to tell him about some idea that was going to reshape the way films are made and viewed in the next century. “The chairman of the board started to tell him,” Copaken said, “about…this company that does these neural network projections and they are really amazingly accurate. And then, although this was all supposed to be fairly hush-hush,…the head of the studio chimed in and began to give some specifics about just how accurate we were in a test that we had done for the studio.”

  The studio head was bragging about the results of a paradigm-shifting experiment in which Epagogix was asked to predict the gross revenues of nine motion pictures just based on their scripts—before the stars or the directors had even been chosen. What made the CEO so excited was that the neural equations had been able to accurately predict the profitability of six out of nine films. On a number of the films, the formula’s revenue prediction was within a few million dollars of the actual gross.

  Six out of nine isn’t perfect, but traditionally studios are only accurate on about a third of their predictions of gross revenues. When I spoke with Copaken, he was not shy about putting dollars to this difference. “For the larger studios, if they both had the benefit of our advice and the discipline to adhere to it,” he said, “they could probably net about a billion dollars or more per studio per year.” Studios are effectively leaving a billion dollars a year on the ground.

  Several studios have quickly (but quietly) glommed on to Epagogix’s services. The studios are using the predictions to figure out whether it’s worth spending millions of dollars to make a movie. In the old days, the phrase “to shoot a turkey” meant to make a bad picture. When someone at Epagogix says, “We shoot turkeys,” it means just the opposite. They prevent bad pictures from ever coming into existence.

  Epagogix’s neural equations have also let studios figure out how to improve the expected gross of a film. The formula not only tells you what to change but tells you how much more revenue the change is likely to bring in. “One time they gave us a script that just had too many production sites,” Copaken said. “The model told me the audience was going to be losing its place. By moving the action to a single city, we predicted that they would increase revenues and save on production costs.”

  Epagogix is now working with an outfit that produces about three to four independent films a year with budgets in the $35–50 million range. Instead of just reacting to completed scripts, Epagogix will be helping from the get-go. “They want to work with us in a collegial, collaborative fashion,” Copaken explained, “where we will work directly with their writers…in developing the script to optimize the box office.”

  But studios that want to maximize profits also have to stop paying stars so much money. One of the biggest neural surprises is that most movies would make just as much money with less established (and therefore less expensive) actors. “We do take actors and directors into account,” Copaken says. “It turns out to be a surprisingly small factor in terms of its overall weighting in the box office results.” It matters a lot where the movie is set. But the name of the stars or directors, not so much. “If you look at the list of the 200 all-time best-grossing movies,” Copaken says, “you will be shocked at how few of them have actors who were stars at the time those films were released.”

 
