by Ian Ayres
Old habits, like the star system, die hard. Copaken says that studio execs “still aren’t listening” when it comes to cutting the ad budget or substituting lesser-named actors. The neural equation says that stars and ads often aren’t worth what they cost. Copaken points out, “Nobody really knew about Harrison Ford until after Star Wars.”
Epagogix isn’t on any kind of crusade to hurt stars. In fact, the powerful Endeavor Agency is interested in using Epagogix’s services for its own clients. Copaken recently spent a morning with one of Endeavor’s founders, the indomitable Ari Emmanuel. Ari Emmanuel is apparently the inspiration for the character Ari Gold, in the HBO series Entourage. “He drove me to a couple of major studios to meet with the head of Paramount and the head of Universal,” Copaken said. “En route, he must have fielded seventy phone calls, from everybody from Sacha Baron Cohen to Mark Wahlberg to an agent for Will Smith.” Endeavor thinks that Epagogix can not only help its clients decide whether to agree to act in a movie, but it could also help them decide whether to get their money up front or roll the dice on participating in back-end profits. In an Epagogix world, some stars may ultimately be paid less, but the savvy stars will know better how to structure their contracts.
It shouldn’t surprise you, however, that many players in the industry are not receptive to the idea of neural prediction. Some studios are utterly closed-minded to the idea that statistics could help them decide whether to greenlight a project. Copaken tells the extraordinary story of bringing two hedge fund managers to meet with a studio head. “These hedge fund guys had raised billions of dollars,” Copaken explained, “and they were prepared to start with $500 million to fund films that would pass muster by our test and be optimized for box office. And obviously if the thing worked, they were prepared to expand rapidly. So there was a huge source of money on the table there.” Copaken thought that he could at least pique the studio’s interest with all of that outside money.
“But the meeting was not going particularly well and there was just a lot of resistance to this new way of thinking,” Copaken said. “And finally one of these hedge fund guys sort of jumped into the discourse and said, ‘Well, let me ask you a question. If Dick’s system here gets it right fifty times out of fifty times, are you telling me that you wouldn’t take that into account to change the way you decide which movies to make or how to make them?’ And the guy said, ‘No, that’s absolutely right. We would not even if he were right fifty times out of fifty times…. [S]o what if we are leaving a billion dollars of the shareholders’ money on the table; that is shareholders’ money…. Whereas if we change the way we do this, we might antagonize various people. We might not be invited. Our wives wouldn’t be invited to the parties. People would get pissed at us. So why mess with a good thing?’”
Copaken was completely depressed when he walked out of the meeting, but when he looked over he noticed that the hedge fund guys were grinning from ear to ear. He asked them why they were so happy. They told him, “You don’t understand, Dick. We make our fortunes by identifying small imperfections in the marketplace. They are usually tiny and they are usually very fleeting and they are immediately filled by the efficiency of the marketplace. But if we can discover these things and we can throw massive resources at these opportunities, fleeting and small though they may be, we end up making lots of money before the efficiency of the marketplace closes out that opportunity. What you just showed us here in Hollywood is a ten-lane paved highway of opportunity. It’s like they are committed to doing things the wrong way and there seems to be so much energy in the culture and commitment to doing it the wrong way, it creates a fantastic opportunity for us that is much more durable and enduring than anything we’ve ever seen.”
The very resistance of some studios creates more of an opportunity for outsiders to come in and see if optimized scripts really do sell more tickets. Epagogix itself is putting its money where its mouth is. Copaken is planning to remake a movie that was a huge commercial disappointment. With the help of the neural network, he thinks just a few simple changes to the script could generate a twenty-three-fold increase in the gross. Copaken has lined up a writer and plans to commission a script to make just these changes. We may soon see whether a D.C. lawyer armed with reams of data can perform a kind of cinematographic alchemy.
The screenwriter William Goldman famously claimed that when it comes to picking movies, “Nobody, nobody—not now, not ever—knows the least goddamn thing about what is or isn’t going to work at the box office.” And maybe nobody does. Studio execs, even after years of trial and error, have trouble putting the right weights on the components of a story. Unlike machines, they can emotionally experience a story, but this emotion is a double-edged sword. The relative success of Epagogix’s equations stems, in part, from their dispassionate weighting of what works.
Why Not Now?
Teasing out the development of technology and techniques helps explain why the Super Crunching revolution didn’t happen earlier. Still, we should also ask the inverse question: why are some industries taking so long to catch the wave? Why have some decisions been resistant to data-driven thinking?
Sometimes the absence of Super Crunching isn’t a problem of foot-dragging or unreasonable resistance. There are loads of decisions about which there just isn’t adequate historical data to do any kind of statistical test, much less a Super Crunch. Should Google buy YouTube? This kind of one-off question is not readily amenable to data-driven thinking. Super Crunching requires analysis of the results of repeated decisions. And even when there are repeated examples, it’s sometimes hard to quantify success. Law schools must decide every year which applicants to admit. We have lots of information about the applicants, and tons of data about past admitted students and the careers they’ve gone on to have. But what does it mean to be successful after graduation? The most obvious proxy, salary, isn’t a great indicator; a leader in government or public interest law might have a relatively low salary, but still make us proud. If you can’t measure what you’re trying to maximize, you’re not going to be able to rely on data-driven decisions.
Nonetheless, there are still many areas where metrics of success and plentiful historical data are just waiting to be mined. While data-driven thinking has been on the rise throughout society, there are still plenty of pockets of resistance that are ripe for change.
There’s almost an iron-clad law that it’s easier for people to warm up to applications of Super Crunching outside of their own area of expertise. It’s devilishly hard for traditional, non-empirical evaluators to even consider the possibility that quantified predictions might do a better job than they can do on their own home turf. I don’t think this is primarily because of blatant self-interest in trying to keep our jobs. We humans just overestimate our ability to make good decisions and we’re skeptical that a formula that necessarily ignores innumerable pieces of information could do a better job than we could.
So let’s turn the light onto the process of publishing books itself. Couldn’t Super Crunching help Bantam or its parent, Random House, Inc., decide what to publish? Of course not. Book publishing is too much of an art to be susceptible to Super Crunching. But let’s start small. Remember, I already showed how randomized trials helped test titles for this book. Why can’t a regression help choose at least the title of books? Turns out Lulu.com has already run this regression. They estimated a regression equation to help predict whether a book’s title is going to be a best-seller.
Atai Winkler, a British statistician, created a dataset on the sales of every novel to top the New York Times Bestseller List from 1955 to 2004 together with a control group of less successful novels by the same authors. With more than 700 titles, he then estimated a regression to predict the likelihood of becoming a best-seller. The regression tested for the impact of eleven different characteristics (Is the title in the form “The——of——”? Does the title include the name of a person or place? Does it begin with a verb?).
It turns out that figurative titles are more likely to produce best-sellers than are literal ones. It also matters whether the first word of a title is a verb, pronoun, or exclamation. And, contrary to publishing wisdom, shorter isn’t necessarily better: a title’s length does not significantly affect book sales. All told, the regression produced predictions that were much better than random guesses. “It guessed right in nearly 70 percent of cases,” Winkler said. “Given the nature of the data and the way tastes change, this is very good.” But Winkler didn’t want to over-claim. “Whether a book gets to the best-seller list,” he said, “depends a lot on the other books that happen to be there that week on the list. Only one of them could be the best-seller.”
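To make the idea of a title regression concrete, here is a minimal sketch of how such a scorer might turn a title into a probability. It assumes a logistic form, which the text does not specify, and every feature definition and weight below is hypothetical, chosen only for illustration. None of it is Winkler’s fitted model or Lulu’s actual applet.

```python
import math
import re

# Hypothetical weights for illustration only; NOT Winkler's fitted coefficients.
WEIGHTS = {
    "intercept": -0.5,
    "the_x_of_y": 0.4,           # title has the form "The ... of ..."
    "starts_with_pronoun": 0.3,  # first word is a pronoun
    "word_count": 0.0,           # the text reports length had no significant effect
}

# A tiny, assumed pronoun list; a real model would need proper part-of-speech tagging.
PRONOUNS = {"i", "you", "he", "she", "we", "they", "it"}


def title_features(title):
    """Code a title into a few numeric characteristics."""
    lowered = title.strip().lower()
    words = re.findall(r"[a-z']+", lowered)
    return {
        "the_x_of_y": 1.0 if re.fullmatch(r"the\s+.+\s+of\s+.+", lowered) else 0.0,
        "starts_with_pronoun": 1.0 if words and words[0] in PRONOUNS else 0.0,
        "word_count": float(len(words)),
    }


def bestseller_probability(title, weights=WEIGHTS):
    """Combine the coded features into a probability with a logistic link."""
    feats = title_features(title)
    z = weights["intercept"] + sum(weights[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Calling `bestseller_probability("The Sound of Music")` returns a number between 0 and 1 that, under these made-up weights, is higher than the score for a title without the “The ... of ...” form. A real regression would estimate the weights from sales data rather than set them by hand.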
The results aren’t perfect. While Agatha Christie’s Sleeping Murder claimed the top spot among all of the titles Winkler analyzed, the model predicted that The Da Vinci Code had only a 36 percent chance of becoming a best-seller.
Even with its flaws, this is a web application that’s both fun and a bit addictive. Just type in your proposed title at Lulu.com/titlescorer and bam, the applet gives you a prediction of success for any title you might imagine. You can even use the “Titlefight” feature to pit two competing title ideas against each other. Of course, this isn’t really a test of whether your book will become a best-seller. It is a test of whether the title of someone like Jane Smiley will take off or not. Yet even if you’ve never had a book at the top of the best-sellers list, wouldn’t you want to know how your title scored? (I did. Even though the book is nonfiction, Super Crunchers predicted a 56.8 percent chance of success. From Lulu’s lips to God’s ears.)
But why stop at the title of the book? Why not crunch the content?
My first reaction is again, nah, that would never work. It’s impossible to code whether a book is well written. But this might just be the iron law of resistance talking. Beware of the person who says, “You could never quantify what I do.”
If Epagogix’s analysis of plots can predict movie sales, why couldn’t an analysis of plots help predict novel sales? Indeed, novels should be even easier to code because you don’t have the confounding influences of temperamental actors and the possibility of botched or beautiful cinematography. The text is all there is. You might even be able to start by coding the very same criteria that Epagogix uses for movie scripts. The economic criteria for success also exist in abundance. Nielsen BookScan provides its subscribers with weekly point-of-sale data on how well books are selling at most major book retailers. So there are tons of data on sales success just waiting to be crunched. Instead of crudely predicting the probability of whether you hit the top of the best-seller list or not, you could try to predict the total sales based on a lot more than just the title.
Yet no one in publishing is rushing to be the first on the block to publicly use number crunching to choose what books to buy or how to make them better. A large part of me viscerally resists the idea that a nonfiction book could be coded or that Super Crunching could improve the content of this book. But another part of me has in fact already data mined a bit on what makes for success in nonfiction publishing.
As a law professor, my primary publishing job is to write law review articles. I don’t get paid for them, but a central measure of an article’s success is the number of times it has been cited by other professors. So with the help of a full-time number-crunching assistant named Fred Vars, I went out and analyzed what caused a law review article to be cited more or less. Fred and I collected citation information on all the articles published for fifteen years in the top three law reviews. Our central statistical formula had more than fifty variables. Like Epagogix, Fred and I found that seemingly incongruous things mattered a lot. Articles with shorter titles and fewer footnotes were cited significantly more, whereas articles that included an equation or an appendix were cited a lot less. Longer articles were cited more, but the regression formula predicted that citations per page peak for articles that were a whopping fifty-three pages long. (We law professors love to gas on about the law.)
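How does a regression produce a “peak” length? One common way, sketched here as an assumption rather than as the specification Fred and I actually used, is to include both article length and its square among the variables. If the coefficient on the squared term is negative, predicted citations per page rise and then fall, and the turning point is at minus the linear coefficient divided by twice the squared coefficient. The two coefficients below are hypothetical, picked only so that the peak lands near fifty-three pages.

```python
def peak_length(b1, b2):
    """Turning point of y = b0 + b1*L + b2*L**2 (a maximum only when b2 < 0)."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -b1 / (2.0 * b2)


# Hypothetical coefficients, chosen for illustration; not estimates from the study.
B1_HYPOTHETICAL = 1.06   # effect of one more page
B2_HYPOTHETICAL = -0.01  # diminishing returns to length

peak = peak_length(B1_HYPOTHETICAL, B2_HYPOTHETICAL)
print(f"Predicted citations per page peak at about {peak:.0f} pages")
```

The intercept and the other fifty-odd variables shift the curve up or down, but under this quadratic assumption they do not move the length at which it peaks.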
Law review editors who want to maximize their citation rates should also avoid publishing criminal and labor law articles, and focus instead on constitutional law. And they should think about publishing more women. White women were cited 57 percent more often than white men, and minority women were cited more than twice as often. The ultimate merit of an article isn’t tied to the race or gender of the author. Yet the regression results suggest that law review editors should think about whether they have been unconsciously setting the acceptance bar unfairly high for women and minority authors whose articles, when published, are cited systematically more often.
Law review editors of course are likely to resist many of these suggestions. Not because they’re prima donnas (although believe me, some are), but just because they’re human.
A Store of Information
Some long ago when we were taught
That for whatever kind of puzzle you got
You just stick the right formula in
A solution for every fool.
“LEAST COMPLICATED,” INDIGO GIRLS
We don’t want to be told what to do by a hard-edged and obviously incomplete equation. Something there is that doesn’t love a formula. Equations, like Robert Frost’s walls, limit our freedom to go where we want.
With a little prodding, however, some of our most coveted assessments may yield to the reason of evidence. If this book has succeeded in convincing you that we humans do a poor job in figuring out how much weight to put on various factors when making predictions, then you should be on the lookout for areas in your own job and in your own life where Super Crunching could help.
Stepping back, we can see that technological constraints to data-driven decision making have fallen across the board. The ability to digitalize and store information means that any laptop with access to the Internet can now access libraries several times the size of the library of Alexandria. Computational techniques and fast computers to make the computations were of course necessary, but both regressions and CPUs were in place well before the phenomenon seriously took off. I’ve suggested here that it is instead our increasing facility with capturing, merging, and storing digital data that has more to do with the current onslaught. It is these breakthroughs in database technology that have also facilitated the commodification of information. Digital data now has market value and it is coalescing into huge data warehouses.
There’s no reason to think that the advances in database technology will not continue. Kryder’s Law shows no sign of ending. Mashups and merger techniques are becoming automated. Data-scraping programs of the future will not only search the web for new pieces of information but will also automatically seek out the merging equivalents of a Rosetta stone to pair observations from disparate datasets. Increasingly predictive Super Crunching techniques will help mash up the observations from disconnected data.
And maybe most importantly, we should see continued advances in the digital domain’s ability to capture information—especially via miniaturized sensors. The miniaturization of electronic sensors has already spurred the capture of all sorts of data. Cell phones are ready to pinpoint owners’ whereabouts, purchase soda, or digitally memorialize an image. Never before have so many people had in their pocket an ever-present means to record pictures.
But in the not-too-distant future, nanotechnology may spur an age of “ubiquitous surveillance” in which sensing devices become ever more pervasive in our society. Where retailers now keep track of inventory and sales through collection of data at the checkout scanner, nanotechnology may soon allow them to insert small sensors directly into the product. Nanosensors could keep track of how long you hold on to a particular product before using it, how far you transport it, or whether you will likely use the product in conjunction with other products. Of course, consumers would need to consent to product sensors. But there is no reason to limit the application of nanosensors to embedding them in other objects or clothing. Instead, we may find ourselves surrounded by “smart dust”: nanosensors that are free-floating and truly ubiquitous within a particular environment. These sensors might quite literally behave like dust; they would flow through the breeze and, at a size of one cubic millimeter, be nearly undetectable.
The prospect of pervasive digitalization of information is both exciting and scary. It is a cautionary tale for a world without privacy. Indeed, we have seen here several worrisome stories. Poor matching in Florida might have mistakenly purged thousands of African-American voters. Even the story of Epagogix rankles. Isn’t art supposed to be determined by the artist? Isn’t it better to accept a few cognitive foibles, but to retain more humane environments for creative flourishing? Is Super Crunching good?
CHAPTER 7
Are We Having Fun Yet?
Sandra Kay Daniel, a second-grade teacher at the Emma E. Booker Elementary School in Sarasota, Florida, sits in front of about a dozen second graders. She is a middle-aged matronly African-American woman with a commanding but encouraging voice.
Open your book up to lesson sixty on page 153. And at the count of three. One…. Two…Three. Everyone should be on page 153. If the yellow paper is going to bother you, drop it. Thank you. Everyone touch the title of your story. Fingers under the title. Get ready to read the title…. The…Fast…Way. We’re waiting for one member. Thank you. Fingers under the title of the story. Get ready!