It can actually get worse.
In 2013, Google researchers trained a system to comb through Google News articles, parsing huge amounts of text and identifying patterns in how words are used within them. The result is Word2vec, a neural network that produced 3 million word embeddings: numerical representations of words that capture their semantic relationships. What Word2vec does is essentially reconstruct the way words work linguistically, in order to improve capabilities for natural language processing: the practice of teaching machines to understand human language as it’s spoken or written day to day—the kind of thing that allows Siri or a search engine to understand what you mean and provide an answer. Word2vec and similar word-embedding systems do this by looking at how frequently pairs of words appear in the same text, and how near each other they appear. Taken together, these patterns allow a system to capture semantic meaning and accurately complete analogies like “man is to woman as king is to _____” or “Paris is to France as Tokyo is to _____.” 26
That’s all well and good, but the system also returns other kinds of relationships—like “man is to woman as computer programmer is to homemaker.” This is because, in the embedding space, the vector offset between “man” and “computer programmer” closely parallels the offset between “woman” and “homemaker.” The same is true for “father is to doctor as mother is to nurse,” “man is to architect as woman is to interior designer,” and even “man is to coward as woman is to whore.” When you think about it, these pairings aren’t very surprising: There are more women who are nurses than doctors. Computer programmers do still tend to be male. So it makes sense that these cultural realities are also depicted in news articles; for example, journalists routinely use job titles to describe a person being quoted.
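If you’re curious what this looks like in code, here’s a minimal sketch using the open-source gensim library and Google’s pretrained news vectors. The file path is an assumption, “computer_programmer” is simply how that vocabulary joins multiword phrases, and exact results will vary with the vectors you load.

```python
import numpy as np
from gensim.models import KeyedVectors

# Pretrained Google News vectors; the local file path is an assumption.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# "man is to woman as king is to ___": add vec(woman), subtract vec(man).
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically returns [('queen', ...)]

# The same arithmetic surfaces the biased pairings described above.
print(vectors.most_similar(
    positive=["computer_programmer", "woman"], negative=["man"], topn=1))

# Equivalently: the man -> programmer offset points in nearly the same
# direction as the woman -> homemaker offset.
def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["computer_programmer"] - vectors["man"],
             vectors["homemaker"] - vectors["woman"]))
```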
In other words, if a system like Word2vec is fed data that reflects historical biases, then those biases will be reflected in the resulting word embeddings. The problem is that very few people have been talking about this—and meanwhile, because Google released Word2vec as an open-source technology, all kinds of companies are using it as the foundation for other products. These products include recommendation engines (the tools behind all those “you might also like . . .” features on websites), document classification, and search engines—all without considering the implications of relying on data that reflects historical biases and outdated norms to make future predictions.
One of the most worrisome developments is this: using word embeddings to automatically review résumés. That’s what a company called Talla, which makes artificial-intelligence software, reported it was doing in 2016. Called CV2vec, this software can “find candidates that are most similar to a reference person or the job ad itself, cluster people together and visualize how CVs align with each other, and even make a prediction as to what someone’s next job will be” 27—all without a human ever looking at their résumés. According to Talla CEO Rob May, the results could make it easier for companies to identify candidates who most closely match their current top-performing staff: “What if you could say ‘I want someone like my best engineer, but with more experience in management?’” 28
Well, if that query were based on word embeddings derived from historical texts—such as the résumés of engineers already on staff—you just might end up discarding applications from women engineers, because your training data connected the term “engineer” more closely to male attributes than to female ones. For example, the system might notice that the current engineers’ names reflected certain patterns, or that those engineers tended to be part of a specific fraternity, and therefore decide that candidates who also have traditionally male names or were in a fraternity are a better match for the position. And because word embeddings encode a dense web of associations between terms, the words “man” and “woman” don’t need to appear anywhere for this to happen.
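To be clear, what follows is a purely hypothetical sketch (not Talla’s actual system): one simple way to build such a matcher is to represent each résumé as the average of its word vectors and rank candidates by how closely they resemble a reference profile. Every term and file path below is invented for illustration.

```python
import numpy as np
from gensim.models import KeyedVectors

# Same pretrained vectors as before; the path is an assumption.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def doc_vector(words):
    # Average the embeddings of the words that exist in the vocabulary.
    known = [vectors[w] for w in words if w in vectors]
    return np.mean(known, axis=0)

def match_score(candidate, reference):
    a, b = doc_vector(candidate), doc_vector(reference)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented "reference profile" distilled from current staff resumes.
reference = ["software", "engineer", "fraternity", "football"]

print(match_score(["software", "engineer", "sorority", "volleyball"], reference))
print(match_score(["software", "engineer", "fraternity", "rugby"], reference))
# Proxy terms like fraternity membership nudge the score, even though the
# words "man" and "woman" never appear in either resume.
```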
That’s what concerns researchers from Boston University and Microsoft Research about artificial intelligence based on word embeddings. In a paper titled “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings,” they argue that because word embeddings frequently underpin a range of other machine-learning systems, they “not only reflect such stereotypes but can also amplify them” 29—effectively bringing the bias of the original data set to new products and new data sets. So much for machines being neutral.
This doesn’t mean we need to throw out technology like Word2vec. It just means that tech companies have to work harder. Either they need to get a lot better at feeding these systems unbiased text or, more feasibly, they need to make it part of their job to scrub the bias from word embeddings before using them. The latter is what the researchers from Boston University and Microsoft propose. They demonstrate a method for algorithmically debiasing word embeddings, ensuring that gender-neutral words, like “nurse,” are not embedded closer to women than to men—without breaking the appropriate gender connection between words like “man” and “father.” They also argue that the same could be done with other types of stereotypes, such as racial bias.
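The core of their fix can be sketched in a few lines: estimate a “gender direction” in the embedding space, then strip that component out of words that shouldn’t carry gender. (The paper derives the direction from many definitional pairs and adds an equalization step; the single-pair version below, reusing the same pretrained vectors, is only a simplification.)

```python
import numpy as np
from gensim.models import KeyedVectors

# Same pretrained vectors as in the earlier sketches; path is an assumption.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def neutralize(word_vec, direction):
    # Remove the component of word_vec that lies along `direction`.
    d = direction / np.linalg.norm(direction)
    return word_vec - np.dot(word_vec, d) * d

# Crude single-pair estimate; the paper builds the direction from many
# definitional pairs (she/he, woman/man, her/his, ...) instead.
gender_direction = vectors["she"] - vectors["he"]

nurse_neutral = neutralize(vectors["nurse"], gender_direction)
# "nurse" now has no component along the estimated gender direction, while
# deliberately gendered words like "father" are simply left alone.
```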
Once again, the problem isn’t with the technology. It’s with the assumptions that technologists so often make: that the data they have is neutral, and that anything at the edges can be written off. And once those assumptions are made, they wrap them up in a pretty, polished software package, making it even harder for everyone else to understand what’s actually happening under the surface.
PAY NO ATTENTION TO THE MATH BEHIND THE CURTAIN
In 2012, a seventeen-year-old in South Carolina named Dylann Roof kept hearing about a black teenager who had been shot to death by a neighborhood watch volunteer in Florida. He googled the name: Trayvon Martin. Roof then read Martin’s Wikipedia page, where a phrase caught his eye: “black on white crime.” He googled that phrase, and soon he had found his way to the website of the Council of Conservative Citizens, the largest white-supremacist group in the United States.30
“I have never been the same since that day,” Roof wrote in an online manifesto in 2015. “There were pages upon pages of these brutal black on White murders. I was in disbelief.” 31 He dug further, his searches leading him deeper and deeper into the world of online hate groups and propaganda. He discovered the “Jewish problem.” And then he decided to do something with his newfound “racial awareness.” At 4:44 on the afternoon of June 17, 2015, he finished the manifesto.
That evening, he entered Charleston’s historic Emanuel African Methodist Episcopal Church, and sat with members of a Bible study group for an hour. Then he stood up, opened fire, and killed nine black people.
Roof’s case is extreme, tragic, and terrifying. But it highlights just how much influence technology—and the way technology is designed—has over what people believe is true. That’s not by accident, writes sociologist Miriam E. Sweeney. She claims it’s an explicit part of Google’s design, which “frames the search process as informational, unbiased, and scientific.” 32 She writes:
The simple, sparse design works to obscure the complexity of the interface, making the result appear purely scientific and data-driven. This is an image Google actively cultivates. The company has explicitly claimed its results to be neutral, standing behind the authority of its “objective” algorithmic ranking system. . . . The insistence of the scientific “truth” of algorithmic search has encouraged users to view search as an objective and neutral experience.
By and large, people do trust Google: back in 2012, when Roof began his search for race-related information, the Pew Research Center reported that nearly three in four search engine users in the United States said that “most or all the information they find as they use search engines is accurate and trustworthy,” and two out of three believed “search engines are a fair and unbiased source of information.” 33 According to a 2017 report by the communications marketing firm Edelman, that’s still largely the case: in a survey of 33,000 people conducted worldwide during the fall of 2016, 64 percent said they trusted search engines for news and information—seven points more than the 57 percent of respondents who said they trusted traditional media.34
This isn’t by accident. In his book In the Plex: How Google Thinks, Works, and Shapes Our Lives, author Steven Levy describes how Marissa Mayer once rejected a design at Google: “It looks like a human was involved in choosing what went where. It looks too editorialized,” she told the team of designers. “Google products are machine-driven. They’re created by machines. And that is what makes us powerful. That’s what makes our products great.” 35
It’s not just search that relies on a clean, simple aesthetic. We can see this same design technique across the platforms and products that rely on algorithms—even in an institutional, niche piece of software like COMPAS. Northpointe bills COMPAS as “simple to use and interpret,” saying that it’s “designed to be user-friendly, even for those with limited computer experience and education.” 36 And it shows in the interface itself, which transforms a person’s specific story into a series of color-coded bar charts.
In consumer software, the design aesthetic is even stronger: just try not to be mesmerized while watching your sleek little car arrive on Uber, or seeing your photos instantly sort into tidy little categories in Google Photos. Again, Uber stops being a company with thousands upon thousands of contingent workers who sleep in their cars and work sixteen-hour days to make ends meet, and starts being nothing but a slick app that magically transports you from A to B. Google Photos stops being a complex, algorithmically driven system, and starts feeling like an objective truth.
Now, I spend most days running a consulting firm that helps companies simplify their content, strip interfaces of extra steps, and generally produce technology that makes more sense for the people who have to use it. I believe that making interfaces easier to use is vital work. But when designers use clean aesthetics to cover over a complex reality—to take something human, nuanced, and rife with potential for bias, and flatten it behind a seamless interface—they’re not really making it easier for you. They’re just hiding the flaws in their model, and hoping you won’t ask too many difficult questions.
ARTIFICIAL INTELLIGENCE, REAL IMPACT
This stuff might sound advanced, but odds are good that it’s already affecting your everyday life. Whenever you go online, you’re almost certain to encounter algorithmically generated results. They decide what you’ll be prompted to read next when you reach the bottom of an article. They tell you which products “people like you” tend to purchase. They control your Google results and your Netflix recommendations. They’re why you see only a portion of your friends’ posts on your Facebook News Feed.
Algorithmic models are also used behind the scenes in all kinds of industries: to evaluate teachers’ performance, to find fraudulent activity on an account, to determine how much your insurance should cost, to decide whether you should be approved for a loan. The list could go on and on.
In all of these places, algorithms are making choices that affect your life, from whether you can find or keep a job to how much you pay for a product to what information you can access. And every single one of them is subject to the kinds of biases mentioned here. Add in the fact that all of these algorithms rely on personal data—your data—and the disconnect between just how much power these systems have, compared with how little the general public knows about them, is downright scary.
Even more worrisome, most of the people who create these products aren’t considering the harm that their work could do to people who aren’t like them. It’s not because they’re consciously biased, though, according to University of Utah computer science professor Suresh Venkatasubramanian. They’re just not thinking about it—because it has never occurred to them that it’s something to think about. “No one really spends a lot of time thinking about privilege and status,” he told Motherboard. “If you are the defaults you just assume you just are.” 37
According to Sorelle Friedler, one of the biggest concerns is the training data—the data used to build these models in the first place. Training data includes things like the images that Google Photos was fed before it launched, or the text corpus that Word2vec consumed to build its embeddings, or the historical crime data that COMPAS crunched to determine its model. “There’s not enough focus on what specific data is being used to create the tools,” Friedler told me. “If someone is creating a recidivism prediction algorithm that uses rearrest data as the outcome variable, then what they are actually doing is creating a prediction tool to determine who will be rearrested—which potentially bears little relationship to who actually recommitted a crime.” 38
To demonstrate, she related a story she had heard from a defense lawyer, who worries about algorithms because they’re often trained using data about parole violations as a primary factor; that is, if you violate parole, you’re much more likely to get a high-risk score. The lawyer had a client who was told he couldn’t own guns while on parole. The man went home, got his guns, and took them to a pawnshop. Police were staking out the shop. They arrested him and called the incident a violation—even though he was complying with the conditions of his parole. “As soon as you start training the algorithm to that type of thing, then you have to ask, ‘Who is more likely to be caught?’” said Friedler. “People need to understand that data is not truth. It is not going to magically solve these hard societal problems for us.” 39
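To see why the choice of outcome variable matters so much, consider this deliberately oversimplified sketch: the model’s target column is “was rearrested,” not “reoffended,” so whatever it learns is entangled with who gets policed and caught. The file and column names here are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical data; the file and column names are invented.
cases = pd.read_csv("historical_cases.csv")

features = cases[["age", "prior_arrests", "parole_violations"]]
label = cases["rearrested_within_2_years"]  # a proxy for reoffending, not the thing itself

model = LogisticRegression(max_iter=1000).fit(features, label)

# Whatever this model predicts, it predicts rearrest, an outcome shaped by
# where police patrol and whom they stop, not who actually reoffends.
```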
At the same time, Friedler cautions that the answer isn’t to throw up our hands and say, “We tried those data-driven things, and it turns out that they’re bad, so we’ll go back to not thinking about data.” Data matters.
Or at least, good data does. If we want to build a society that’s fairer, more just, and more inclusive than in the past, then blindly accepting past data as neutral—as an accurate, or desirable, model upon which to build the future—won’t cut it. We need to demand instead that the tech industry take responsibility for the data it collects. We need it to be transparent about where that data comes from, which assumptions might be encoded in it, and whether it represents users equally. Otherwise, we’ll only encounter more examples of products built on biased machine learning in the future. And as we’re about to see, when problematic algorithms mix with weaknesses built into digital platforms themselves, their ramifications can be profound.
Chapter 8
Built to Break
Lindy West is used to online hate. A comedian and author best known for writing about feminism and being fat, she’s a pro at ignoring taunts about her weight, her career, and her politics. But when a picture of a Thomas the Tank Engine character with the words “CHOO CHOO MOTHERFUCKER THE RAPE TRAIN’S ON ITS WAY. NEXT STOP YOU” pasted on top showed up in her Twitter mentions in 2014, she felt creeped out and menaced. So, West took the only action available to her: she reported it as abuse.
At the time, Twitter’s terms of service specifically stated that “users may not make direct, specific threats of violence against others”—yet West was told her experience didn’t qualify. “We’ve investigated the account and the Tweets reported as abusive behavior, and have found that it’s currently not violating the Twitter Rules,” said the response from Twitter’s support team.1
It wasn’t just West. As soon as she posted a screenshot of the tweet and Twitter’s response, she heard from countless other women who’d had their reports of abusive tweets rejected as well. The tweets deemed OK by Twitter? They included everything from rape and death threats to suggestions of suicide to a troll telling a woman, “I would love to knock you the fuck out.”
By early 2015, West’s article had made the rounds at Twitter, and even then-CEO Dick Costolo took notice. “We suck at dealing with abuse and trolls on the platform and we’ve sucked at it for years,” he wrote in a memo to staff. “We lose core user after core user by not addressing simple trolling issues that they face every day.” 2
Costolo was right: in a 2014 Pew Research Center study, 13 percent of people who’d been harassed online said they had deleted a profile or changed their username because of harassment, and 10 percent said they had left an online forum because of it.3
After two more years of sustained harassment, that’s precisely what West did. In January 2017—as then-president-elect Donald Trump was taunting North Korea and strangers were harassing her for her views on the death of Carrie Fisher—West realized she was done: She was tired of neo-Nazis digging into her personal life. She was tired of men telling her they’d like to rape her, “if [she] weren’t so fat.” More than anything, she was tired of feeling like all Twitter’s talk about taking harassment seriously hadn’t gotten her anywhere:
I talk back and I am “feeding the trolls.” I say nothing and the harassment escalates. I report threats and I am a “censor.” I use mass-blocking tools to curb abuse and I am abused further for blocking “unfairly.” 4
Twitter had helped West build a national audience for her writing. It had helped her turn her memoir, Shrill, into a best-selling book. In many ways, it had been the most visible part of her professional profile: she had almost 100,000 followers. But it just wasn’t worth it anymore. She deactivated her account.
The first piece of advice anyone gets about online harassment is the same line that West finally got sick of: Don’t feed the trolls. Don’t read the comments. Online spaces are simply filled with shitheads and pot stirrers, the story goes, so there’s no point in trying to do anything about it. Just ignore them and they’ll move on. This is untrue: harassment campaigns can last months, even when victims do ignore the perpetrators. But this advice is also a subtle form of misdirection: by focusing attention on what the victim does or doesn’t do, it diverts attention away from why the abuse happens in the first place—and how digital platforms themselves enable that abuse.
It’s not just harassment either. The digital platforms we rely on to connect with friends, stay informed, and build our careers are routinely being manipulated in ways that harm us—from the abuse that women like West routinely receive, to Facebook’s Trending algorithm being inundated with fake news during the 2016 election, to the way Reddit’s system of subreddits puts the burden of oversight on unpaid moderators and makes it impossible to keep harassing and creepy content out.