
The Formula: How Algorithms Solve All Our Problems... and Create More


by Luke Dormehl


  The problem with Anderson’s enthusiastic embrace of The Formula, of course, is his failure to realize that data mining, even on large datasets, is itself founded on a theory. As I have shown throughout this book, algorithms often reflect the biases of their creators, based upon what those creators deem important when answering a particular question. When an algorithm is created to determine what information is relevant, or the best way to solve a problem, this represents a hypothesis in itself. Even data is not free of human bias, from which data is collected to the manner in which it is cleaned up and made algorithm-ready. A computational process that seeks to sort, classify and create hierarchies in and around people, places, objects and ideas carries considerable political connotations. So too does the kind of user categorization I discussed in Chapter 1. What are the categories? Who belongs to each category? And how do we know that categories are there to help—rather than hinder—us?

  Can an Algorithm Defame?

  In March 2013, a T-shirt manufacturer called Solid Gold Bomb found itself in a heated row with Amazon over a slogan generated by an algorithm. Seizing upon the trend for “Keep Calm and . . .” paraphernalia sweeping the UK at the time, Solid Gold Bomb created a simple bot to generate similar designs by running through an English dictionary and randomly matching verbs with adjectives. In all, 529,493 similarly themed clothing items appeared on Amazon’s site—with the T-shirts only printed once a customer had actually bought one. As website BoingBoing wrote of the business plan: “It costs [Solid Gold Bomb] nothing to create the design, nothing to submit it to Amazon and nothing for Amazon to host the product. If no one buys it then the total cost of the experiment is effectively zero. But if the algorithm stumbles upon something special, something that is both unique and funny and actually sells, then everyone makes money.”21
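
  The core of such a bot needs surprisingly little machinery. The Python sketch below is purely illustrative (the word lists, price and field names are invented, not taken from Solid Gold Bomb's actual code), but it captures the combinatorial, print-on-demand logic described above: every word pairing becomes a listing, and nothing is printed until someone buys.

    # A minimal sketch of a combinatorial slogan bot (illustrative only).
    # The word lists and product fields below are invented placeholders.
    import itertools

    VERBS = ["carry", "keep", "dance", "read"]        # hypothetical word list
    COMPLEMENTS = ["on", "a lot", "quietly"]          # hypothetical word list

    def generate_listings():
        """Yield one print-on-demand product listing per word combination."""
        for verb, complement in itertools.product(VERBS, COMPLEMENTS):
            slogan = "Keep Calm and {} {}".format(verb.title(), complement.title())
            yield {"title": slogan, "price": 19.95, "print_on_demand": True}

    # Every combination becomes a near-zero-cost listing; nothing here screens
    # the output for offensive phrases before it goes on sale.
    for listing in generate_listings():
        print(listing["title"])

  The absence of any filtering step is, of course, exactly where Solid Gold Bomb came unstuck.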

  Unfortunately, no one at Solid Gold Bomb had apparently considered the possibility that the algorithm might generate offensive slogans while running down its available list of words. The variation that set Amazon’s blood boiling was the offensive “Keep Calm and Rape a Lot”—although had this not done the job, it would likely have been equally appalled by the misogynistic “Keep Calm and Hit Her,” or the ever-unpopular “Keep Calm and Grope On.” When it found out what was going on, Amazon responded by immediately pulling all of Solid Gold Bomb’s online inventory. The T-shirt manufacturer (which wound up going out of business several months later) was upset. Why, its owners wondered, should they be punished when the fault was not with any human agency—but with an algorithm, for which the words in question meant nothing?

  A variation on this problem took place the previous year, when Google’s “auto-complete” algorithm came under fire for alleged defamation from Bettina Wulff, wife of the former German president Christian Wulff.22 Originally an algorithm designed to help people with physical disabilities increase their typing speed, auto-complete was added to Google’s functionality as a way of saving users time by predicting their search terms before they had finished typing them. “Using billions of searches, Google has prototyped an anonymous profile of its users,” says creator Marius B. Well. “This reflects the fears, inquiries, preoccupations, obsessions and fixations of the human being at a certain age and our evolution through life.”23 Type Barack Obama’s name into the Google search box, for example, and you would be presented with potentially useful suggestions including “Barack Obama,” “Barack Obama Twitter,” “Barack Obama quotes” and “Barack Obama facts.” Type in the name of United Kingdom deputy prime minister Nick Clegg, on the other hand, and you were liable to find “Nick Clegg is a prick,” “Nick Clegg is a liar,” “Nick Clegg is sad” and “Nick Clegg is finished.” Of these two camps, Bettina Wulff’s suggested searches fell more in line with Nick Clegg’s than Barack Obama’s. A person searching for Wulff’s name was likely to find search terms linking her to prostitution and escort businesses.24
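
  Stripped of Google's scale, the underlying mechanism is simple to sketch. The Python fragment below is hypothetical (the query log is invented, and Google's real system layers on personalization, filtering and spelling correction), but it shows how suggestions can amount to nothing more than the most frequent past searches beginning with whatever a user has typed so far.

    # A minimal, hypothetical sketch of frequency-ranked prefix matching,
    # the basic idea behind search auto-complete. The query log is invented.
    from collections import Counter

    past_searches = Counter({
        "nick clegg": 900,
        "nick clegg is a liar": 400,
        "nick clegg is sad": 150,
        "nick clegg speech": 120,
    })

    def autocomplete(prefix, log, limit=4):
        """Return up to `limit` past queries starting with `prefix`, most frequent first."""
        matches = [(query, count) for query, count in log.items()
                   if query.startswith(prefix.lower())]
        matches.sort(key=lambda pair: pair[1], reverse=True)
        return [query for query, _ in matches[:limit]]

    print(autocomplete("nick clegg is", past_searches))
    # ['nick clegg is a liar', 'nick clegg is sad']

  Nothing in such a process expresses an opinion; it simply echoes, and thereby amplifies, what earlier users have already searched for.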

  Realizing the effect that this was likely to have on someone searching for her name online, Wulff took Google to court and won. A German court decided that Google would have to ensure that the terms algorithmically generated by auto-complete were not offensive or defamatory in any way. Google was upset, claiming to be extremely “disappointed” by the ruling, since this impugned the supposed objective impartiality of its algorithms. “We believe that Google should not be held liable for terms that appear in auto-complete as these are predicted by computer algorithms based on searches from previous users, not by Google itself,” said a spokesperson for the company. “We are currently reviewing our options.” The problem is the amount, both figuratively and literally, that Google has fetishistically invested in its algorithmic vision. As with the concept of sci-fi author Philip K. Dick’s “minority reports” referenced in Chapter 3, if the algorithm proves fallible then tugging on this thread could have catastrophic results. “Google’s spiritual deferral to ‘algorithmic neutrality’ betrays the company’s growing unease with being the world’s most important information gatekeeper,” writes Evgeny Morozov in his book The Net Delusion. “Its founders prefer to treat technology as an autonomous and fully objective force rather than spending sleepless nights worrying about inherent biases in how their systems—systems that have grown so complex that no Google engineer fully understands them—operate.”25

  A narrative thread often explored in books and articles about Google is the degree to which Google’s rise has helped speed up the decline of traditional news outlets, like newspapers. In this sense, Google has displaced traditional media, even though it does not generate news stories itself. If Google’s algorithms are to be held to the same standards as newspapers, though, this poses some problems. In a classic study of newsroom objectivity, sociologist Gaye Tuchman observed that it was a fear of defamation that kept journalism objective. By reporting the views of others rather than relying on their own opinion, journalists were protected against allegations that they were biased in their reporting. Google’s auto-complete algorithm had likewise relied on quoting others rather than expressing opinions, since its suggested terms were based on the previous searches of thousands, or even millions, of users. By not censoring these searches, however, and keeping its algorithms apparently objective, Google had been accused of defamation.26

  Why Is Twitter Like a Newspaper? (And Why Isn’t Google?)

  The Bettina Wulff case marked several interesting developments. For one thing, it represented one of the first times that the politics of algorithms became the subject of a legal trial. Algorithms can be difficult to criticize. In contrast to a service like Google Street View—whose arrival sparked street protests in some countries due to its perceived violation of privacy—the invisibility of an algorithm can make it tough to spot its effects. It is one thing to be able to log on to a server and see a detailed image of your house as taken from the end of your driveway. It is another to critique the inner workings of the algorithms that underpin companies such as Google, Facebook and Amazon. In the majority of cases, these algorithms are black-boxed in such a way that users have no idea how they work. As with the changing concept of transparency in the digital world (something I discussed in Chapter 3), the idea that complex technology can work beneath a purposely simplistic interface is often viewed by Silicon Valley decision-makers as a key selling point. Speaking about Google’s algorithms in 2008, Marissa Mayer—today the president and CEO of Yahoo!—had the following to say:

  We think that that’s the best way to do things. Our users don’t need to understand how complicated the technology and the development work that happens behind this is. What they do need to understand is that they can just go to a box, type what they want, and get answers.27

  Wulff’s concern about the politics of auto-complete was also proof positive of the power algorithms carry in today’s world to change the course of public opinion. By suggesting particular unflattering searches, the algorithm could channel a user with no preconceived views about Bettina Wulff down a particular route. In this way, algorithms aren’t just predicting user behavior—they are helping dictate it. Consider, for example, Netflix’s recommendation algorithms, which I discussed in Chapter 4. Netflix claims that 60 percent of its rentals come from its algorithm’s suggestions, rather than from users specifically searching for a title. If this is the case, are we to assume that the algorithm simply guessed what users would next want to search for, or that the users in fact made a certain selection because an algorithm had placed particular options in front of them?

  Here the question becomes almost irrelevant. As the sociologists William and Dorothy Thomas famously noted, “If men define situations as real, they are real in their consequences.” Or to put it in the words Kevin Slavin memorably used during his TED Talk, “How Algorithms Shape Our World,” the math involved in such computer processes has transitioned from “something that we extract and derive from the world, to something that actually starts to shape it.”28

  This can quite literally be the case. On September 6, 2008, an algorithm came dangerously close to driving United Airlines’ parent company UAL out of business. The problem started when a reporter for a news company called Income Securities Advisors entered the words “bankruptcy 2008” in Google’s search bar and hit “enter.” Google News immediately pointed the reporter to an article from the South Florida Sun-Sentinel, revealing that UAL had filed for bankruptcy protection. The reporter—who worked for a company responsible for feeding stories to the powerful Bloomberg news service—immediately posted an alert to the news network, without any further contextual information, entitled “United Airlines files for Ch. 11 to cut costs.” The news that the airline was seeking legal protection from its creditors was quickly read by thousands of influential readers of Bloomberg’s financial news updates. The problem, as later came to light, was that the news was not actually new, but referred to a bankruptcy filing from 2002, from which the company had later successfully emerged thanks to a 2006 reorganization. Because the South Florida Sun-Sentinel failed to list a date with its original news bulletin, Google’s algorithms had assigned it one based upon the September 2008 date on which its web-crawling software found and indexed the article. As a result of the misinformation, UAL stock trading on the NASDAQ plummeted from $12.17 per share to just $3.00, as panicked sellers unloaded 15 million shares within a couple of hours.29

  With algorithms carrying this kind of power, is it any real wonder we have started to rely on them to tell us what is important and what is not? In early 2009, a small town in France called Eu decided to change its name to a longer string of text—“Eu-en-Normandie” or “la Ville d’Eu”—because Google searches for “Eu” generated too many results for the European Union, commonly abbreviated as the E.U. Consider also the Twitter-originated concept of incorporating hashtags (#theformula) into messages. A September 2013 comedy sketch on Late Night with Jimmy Fallon demonstrated how ill-suited the whole hashtag phenomenon is for meaningful communication in the real world. (“Check it out,” says one character, “I brought you some cookies. #Homemade, #OatmealRaisin, #ShowMeTheCookie.”) But of course the idea of hashtags isn’t to better explain ourselves to other people, but rather to allow us to modify our speech in a way that makes it more easily recognized and distributed by Twitter’s search algorithms.

  When the content of our speech is not considered relevant, it can prompt the same kind of reaction as a dating website’s algorithms determining there is no one well matched with you. During the Occupy Wall Street protests, many participants and supporters used Twitter to coordinate, debate and publicize their efforts. Even though the protests gained considerable media coverage, the term failed to “trend” according to Twitter’s algorithms—“trending topics” being the terms Twitter shows on its home page to indicate the most discussed subjects, as indexed from the 250 million tweets sent every day. Although #OccupyWallStreet failed to trend, less pressing comedic memes like #WhatYouFindInLadiesHandbags and #ThingsThirstyPeopleDo seemingly had no difficulty showing up during that same time span.

  Although Twitter denied censorship, what is interesting about the user outcry is what it says about the cultural role we assign to algorithms. “It’s a signal moment where the trending topics on Twitter are read as being an indication of the importance of different sorts of social actions,” says Paul Dourish, professor of informatics at the Donald Bren School of Information and Computer Sciences at the University of California, Irvine. Dourish likens trending to what appearing on the front page of the New York Times or the Guardian meant at a time when print newspapers were at the height of their power. To put it another way, Twitter’s trending algorithms are a technological reimagining of the old adage about trees falling in the woods with no one around to hear them. If a social movement like Occupy Wall Street doesn’t register on so-called social media, has it really taken place at all?

  This is another task now attributed to algorithms. In the same way that debates in the 20th century revolved around the idea of journalistic objectivity at the heart of media freedom, so too in the 21st century will algorithms become an increasingly important part of the objectivity conversation. Reappropriating free-speech advocate Alexander Meiklejohn’s famous observation, “What is essential is not that everyone shall speak, but that everything worth saying shall be said,” those in charge of the most relied-upon algorithms are given the job of cultural gatekeepers, tasked with deciding what is worth hearing and how to deal with the material that is not. A decision like algorithmically blocking those comments on a news site that have received a high ratio of negative to positive ratings might sound like a neat way of countering spam messages, but it also raises profound questions relating to freedom of speech.30
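
  A rule of that kind can be reduced to a handful of lines, which is partly what makes it so easy to adopt without debate. The sketch below is hypothetical (the threshold, minimum vote count and names are invented for illustration), but it shows how crisply such a gatekeeping decision can be encoded.

    # A hypothetical comment-hiding rule: suppress a comment once negative
    # ratings sufficiently outweigh positive ones. All thresholds are invented.
    def should_hide(positive_votes, negative_votes, ratio=3.0, min_votes=10):
        """Hide a comment whose negative votes exceed `ratio` times its positive votes."""
        if positive_votes + negative_votes < min_votes:
            return False  # too few ratings to judge either way
        return negative_votes >= ratio * max(positive_votes, 1)

    print(should_hide(positive_votes=2, negative_votes=15))   # True: hidden
    print(should_hide(positive_votes=20, negative_votes=5))   # False: shown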

  What can be troubling about these processes is that bias can be easily hidden. In many cases, algorithms are tweaked on an ongoing basis, while the interface of a particular service might remain the same. “Often it’s the illusion of neutrality, rather than the reality of it,” says Harry Surden, associate professor at the University of Colorado Law School, whom I introduced in Chapter 3. To put it another way, many of us assume that algorithms are objective until they aren’t.

  “It’s very difficult to have an open and frank conversation about culture and what is valued versus what falls off the radar screen when most people don’t have a clear sense of how decisions are being made,” says scholar Ted Striphas. “That’s not to say that algorithms are undemocratic, but it does raise questions when it comes to the relationship between democracy and culture.” Examples of this cultural unknowability can be seen everywhere. In April 2009, more than 57,000 gay-friendly books disappeared from Amazon’s sales ranking lists, based on their erroneous categorization as “adult” titles. While that was a temporary glitch, the so-called #amazonfail incident revealed something that users had previously not been aware of: that the algorithm used to establish the company’s sales ranks—widely believed to be an objective measurement tool—was purposely designed to ignore books designated as adult titles. Similar types of algorithmic demotion can be seen all over the place. YouTube demotes sexually suggestive videos so that they do not appear on “Most Viewed” pages or the “Top Favorite” home page generated for new users. Each time Facebook changes its EdgeRank algorithm—designed to ensure that the “most interesting” content makes it to users’ News Feeds—there is an existential crisis whereby some content becomes more visible, while other content is rendered invisible.31

  As was the case with Google’s “auto-complete” situation, sometimes these decisions are not left up to companies themselves, but to governments. It is the same kind of algorithmic tweaking that means that we will not see child pornography appearing in the results of search engines, that dissident political speech doesn’t appear in China, and that websites are made invisible in France if they promote Nazism. We might argue about the various merits or demerits of particular decisions, but simply looking at the results we are presented with—without questioning the algorithms that have brought them to our attention—is, as scholar Tarleton Gillespie has noted, a little like taking in all the viewpoints at a public protest, while failing to notice the number of speakers who have been stopped at the front gate.32

  Organize the World

  Near the start of this conclusion, I asked whether everything could be subject to algorithmization. The natural corollary to this query is, of course, the question of whether everything should be subject to algorithmization. To use this book’s title, is there anything that should not be made to subserve The Formula? That is the real billion-dollar question—and it is one that is asked too little.

  In this book’s introduction, I quoted Bill Tancer, writing in his 2009 book Click: What We Do Online and Why It Matters, about a formula designed to mathematically pinpoint the most depressing week of the year. As I noted, Tancer’s concern had nothing to do with the fact that such a formula could possibly exist (that there was such a thing as the quantifiably most depressing week of the year) and everything to do with the fact that he felt the wrong formula had been chosen.33

  The idea that there are easy answers to questions like “Who am I?,” or “Who should I fall in love with?,” or “What is art?” is an appealing one in many ways. In Chapter 2, I introduced you to Garth Sundem, the statistician who created a remarkably prescient formula designed to predict the breakup rate of celebrity marriages. When I put the question to him of why such formulas engage the general populace as they do, he gave an answer that smacks more of religiosity than it does of Logical Positivism. “I think people like the idea that there are answers,” he says. “I do silly equations, based on questions like whether you should go talk to a girl in a bar. But the thought that there may actually be an answer to things that don’t seem answerable is extremely attractive.” Does he find it frightening that we might seek to quantify everything, reducing it down to its most granular levels—like the company Hunch I discussed in Chapter 1, which claims to be able to answer any consumer-preference question with 80 to 85 percent accuracy, based only on five data points? “Personally, I think the flip side is scarier,” Sundem counters. “I think uncertainty is a lot more terrifying than the potential for mathematical certainty. While I was first coming up with formulas at college, trying to mathematically determine whether we should go to the library to get some work done, deep down in the recesses of our dorky ids I think that what we were saying is that life is uncertain and we were trying to make it more certain. I’m not as disturbed by numbers providing answers as I am by the potential that there might not be answers.”

 
