There’s also something deeply worrisome about Facebook assigning users an identity on the back end, while not allowing those same users to select their own identity in the front end of the system, says Safiya Noble, the information studies scholar we heard from back in Chapter 1. “We are being racially profiled by a platform that doesn’t allow us to even declare our own race and ethnicity,” she told me. “What does that mean to not allow culture and ethnicity to be visible in the platform?” 14
What it means is that Facebook, once again, controls how its users represent themselves online—preventing people from choosing to identify themselves the way they’d like, while enabling advertisers to make assumptions. And because all this is happening via proxy data, it’s obscured from view—so most of us never even realize it’s happening.
PRODUCT KNOWS BEST
In the previous chapter I talked about how the endless barrage of cutesy copy and playful design features creates a false intimacy between us and our digital products—a fake friendship built by tech companies to keep us happily tapping out messages and hitting “like,” no matter what they’re doing behind the curtain. Writer Jesse Barron calls this “cuteness applied in the service of power-concealment”:15 an effort, on the part of tech companies, to make you feel safe and comfortable using their products, while they quietly hold the upper hand.
According to Barron, tech products do this by employing “caretaker speech”—the linguistics term used to describe the way we talk to children. For example, when Seamless, a popular food delivery app, sends cutesy emails about the status of his order, he writes, “I picture a cool babysitter, Skylar, with his jean vest, telling me as he microwaves a pop-tart that ‘deliciousness is in the works,’ his tone just grazing the surface of mockery.” 16
But no matter how cool the babysitter—no matter how far past bedtime Skylar lets us stay up—at the end of the evening we’re still kids under someone else’s control. The result is an environment where we start to accept that the tech products we use, and the companies behind them, know best—and we’re just along for the ride.
In this way, cuteness becomes another cloak for tech companies: a shiny object that deflects criticism. Because the more we focus on the balloons and streamers, the clever videos and animated icons and punny humor, the less we’ll question why companies are asking us for so much information. The less we’ll notice just how much of our identities Facebook is making assumptions about, and who’s using those assumptions for their own purposes. The less we’ll care that almost every app we use is investing thousands of hours into figuring out new ways to squeeze information out of our usage patterns.
All this paternalistic playfulness, in other words, makes us childlike: just a bunch of kids refreshing screens and tapping heart icons, while, as Barron concludes, “adults wait in the woods to take their profits.” 17
DESIGNED TO DISCRIMINATE
Digital products designed to gather as much information about you as they can, even if that data collection does little to improve your experience. Screens built to be tapped through as quickly as possible, so you won’t notice what you’re agreeing to. Services that collect information based on proxy, and use it to make (often incorrect) assumptions about you. “Delightful” features designed to hide what’s actually under the hood. Patronizing language that treats you like a child—and tries to make you believe that tech companies know best. And that’s just scratching the surface. Once you start trying to understand how digital products track your habits and collect your data, it’s hard to stop—because there’s always some new technology emerging, some new product taking surveillance to an even creepier level.
I’m not a privacy expert either. I can’t tell you how to fully detox from data collection; in fact, I think the only way to do it is to stop using digital products altogether. Anything else is just a method for limiting your exposure, for fending off or outsmarting the beast (until some new technology thwarts you again). That doesn’t mean it’s not worthwhile to keep track of your digital footprint and try to limit your data trail. It just means that we also need to think long and hard about how we got here in the first place.
One reason data collection has become so commonplace, and so intense, is that too many of us have spent the past two decades enamored of the brilliance of technology, and blind to how that technology comes to be. Meanwhile, back at the companies behind those technologies, a world of mostly white men has wholeheartedly embraced the idea that they’re truly smarter than the rest of us. That they do know best. That they deserve to make choices on our behalf. That there’s nothing wrong with watching us in “god view,” because they are, in fact, gods. And there’s no one around to tell them otherwise, because anyone who’s different either shuts up or gets pushed out.
These companies take advantage of our obliviousness at every turn, making design and business decisions that affect us all, without ever actually keeping all of us in mind—without understanding what real people are like, considering our needs, or planning for when things go wrong. Why would they? They’re too busy clamoring for endless, unfettered growth to realize they’ve been standing on others’ necks to get it.
This kind of surveillance isn’t good for any of us, but it’s particularly bad for those who are already marginalized. For example, if you’re poor, you’re more likely to live in a higher-crime neighborhood, where many residents have bad credit. You’re also more likely to rely on a mobile device: according to the Pew Research Center, in 2016, one in five Americans who made less than $30,000 a year accessed the internet only from their phone, versus one in twenty of those with incomes over $75,000.18 Mobile usage leads to a trove of location-based data being stored about you, and that location data tells businesses where you live and where you spend time. Once that data is digested, you’re likely to be showered with “predatory ads for subprime loans or for-profit schools.” 19 Plus, fending off all this takes resources that marginalized people are the least likely to have to spare: a huge amount of time to learn about online tracking and data collection, more time to implement and maintain privacy practices, and money to invest in additional security tools like password managers.
The only way the technology industry will set reasonable, humane standards for what type of information can be collected and how it can be used is if we stop allowing it to see itself as special—stop allowing it to skirt the law, change the rules, and obfuscate the truth. It’s a hard problem, to be sure: much of the modern internet’s business model is built on buying, selling, parsing, and mining personal data. But pushback is the only way forward. Because once our data is collected—as messy and incorrect as it often is—it gets fed to a whole host of models and algorithms, each of them spitting out results that serve to make marginalized groups even more vulnerable, and tech titans even more powerful.
Chapter 7
Algorithmic Inequity
On the morning of January 20, 2013, Bernard Parker was pulled over in Broward County, Florida. His tags were expired. During the stop, police found an ounce of marijuana in the twenty-three-year-old’s car—enough to charge him with felony drug possession with intent to sell. By 9:30 a.m., he had been booked into jail, where he spent the next twenty-four hours.
One month later, in the predawn hours of February 22, Dylan Fugett was arrested in the same county. He was twenty-four years old. His crime: possession of cocaine, marijuana, and drug paraphernalia. He spent the rest of the night in jail too.
Parker had a prior record. In 2011 he had been charged with resisting arrest without violence—a first-degree misdemeanor in Florida. Police say he ran from them, along the way throwing away a baggie that they suspected contained cocaine. Fugett had a record too: in 2010 he had been charged with felony attempted burglary.1
You might think these men have similar criminal profiles: they’re from the same place, born less than a year apart, charged with similar crimes. But according to software called Correctional Offender Management Profiling for Alternative Sanctions, or COMPAS, these men aren’t the same at all. COMPAS rated Parker a 10, the highest risk there is for recidivism. It rated Fugett only a 3.
Fugett has since been arrested three more times: twice in 2013, for possessing marijuana and drug paraphernalia, and once in 2015, during a traffic stop, when he was arrested on a bench warrant and admitted he was hiding eight baggies of marijuana in his boxers. Parker hasn’t been arrested again at all.
Parker is black. Fugett is white. And according to a 2016 investigation by ProPublica, their results are typical: only about six out of ten defendants who COMPAS predicts will commit a future crime actually go on to do so. That figure is roughly the same across races. But the way the software is wrong is telling: Black defendants are twice as likely to be falsely flagged as high-risk. And white defendants are nearly twice as likely to be falsely flagged as low-risk. ProPublica concluded that COMPAS, which is used in hundreds of state and local jurisdictions across the United States, is biased against black people.2
It’s also secret.
COMPAS is made by a private company called Northpointe, and Northpointe sees the algorithms behind the software as proprietary—secret recipes it doesn’t want competitors to steal. That’s pretty typical. Algorithms now control a huge number of systems that we interact with every day—from which posts bubble to the top of your Facebook feed, to whether image recognition software can correctly identify a person, to what kinds of job ads you see online. And most of those algorithms, like COMPAS, are considered proprietary, so we can’t see how they’ve been designed.
COMPAS might be a particularly problematic example—it can directly affect how long a convicted person spends in jail, after all. But it’s far from alone. Because, no matter how much tech companies talk about algorithms like they’re nothing but advanced math, they always reflect the values of their creators: the programmers and product teams working in tech. And as we’ve seen time and again, the values that tech culture holds aren’t neutral. After all, the same biases that lead teams to launch a product that assumes all its users are straight, or a sign-up form that assumes people aren’t multiracial, are what lead them to launch machine-learning products that are just as exclusive and alienating—and, even worse, locked in a black box, where they’re all but invisible.
WHAT IS AN ALGORITHM?
If you’re not exactly sure what “algorithm” means, don’t despair. Many of us can use the word in a sentence but can’t describe how one works, or precisely why so many of today’s digital services rely on algorithms. But don’t be fooled: while they can be used to solve complex problems, algorithms themselves can be extremely simple. An algorithm is just the specific set of steps needed to perform some type of computation—any type of computation. For example, let’s say you want to add two numbers together:
57
+ 34
You could do what’s called the partial-sum method: First you add the numbers in the tens column together, 50 plus 30, for a total of 80. Then you add the numbers in the ones column together, 7 and 4, for a total of 11. Finally, you add those partial sums together, 80 plus 11, for a total of 91. This happens to be more or less how I add sums in my head, though the steps are so ingrained I hardly know I’m doing them. But that’s not the only method for solving this problem.
You could also use the column method: first you add the numbers in the ones column, 7 plus 4, for a total of 11. You put the 11 in the ones column of your answer. Then you add the tens column, 5 plus 3, for a total of 8. You put that in the tens column of your answer. But each column can contain only one digit, so you trade ten of your 1’s for one 10. This gives you 9 in the tens column, and leaves you with 1 in the ones column. Once again, your total is 91. These are both examples of algorithms: the steps you take to figure something out.
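If it helps to see those steps spelled out explicitly, here’s a minimal sketch of both methods in Python. The function names and the way I’ve broken up the steps are my own, purely for illustration; the point is just that each method is a different, equally valid sequence of steps that arrives at the same answer.

```python
def partial_sum_add(a, b):
    """The partial-sum method: add the tens, add the ones, then combine."""
    tens = (a // 10) * 10 + (b // 10) * 10   # 50 + 30 = 80
    ones = (a % 10) + (b % 10)               # 7 + 4 = 11
    return tens + ones                       # 80 + 11 = 91

def column_add(a, b):
    """The column method: add each column, then trade ten 1's for one 10."""
    ones = (a % 10) + (b % 10)               # 7 + 4 = 11
    tens = (a // 10) + (b // 10)             # 5 + 3 = 8
    carry, ones = divmod(ones, 10)           # trade ten 1's for one 10
    return (tens + carry) * 10 + ones        # 9 tens and 1 one: 91

print(partial_sum_add(57, 34), column_add(57, 34))  # 91 91
```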
We do algorithms like these all day long. It’s just that we don’t necessarily think about our day-to-day tasks that way, because we do them automatically. In fact, if you grew up with the kind of public-school education I got in the 1980s and 1990s, you probably didn’t learn to add using either of these methods. In my school, we stood up in front of the chalkboard and added up the ones column, and when it came to 10 or more, we “carried the 1” over to the next column, marking a tiny 1 on top of the tens column as we went. That approach is fundamentally the same as the column method, except that it speeds through some of the steps, rather than making each one explicit.
Maybe that’s why we tend to glaze over when people talk about algorithms. They sound complex, because most of us are not used to breaking down discrete steps in such painstaking detail. But that’s what the algorithms behind tech products are doing: they run through a long list of incredibly tedious steps in order to perform a computation. The difference is just that they’re doing it at a scale that we humans can’t compete with; after all, if you got bored with my breaking down all the tasks required to do column addition, you would never sit down with a pencil and paper and sort out equations with hundreds or thousands of factors. That’s what it would take to determine, say, which site comes up first in a Google search, or which of the nearby restaurants bubbles to the top of Yelp’s review list when you search on the word “dinner.”
That’s why computers are so great: data sets that would take individual humans whole lifetimes to make sense of can be sorted through in an instant. But that power isn’t without its problems. Because algorithms crunch through so much information at once, it’s easy never to think about how they do it, or ask whether the answer they spit out at the end is actually accurate. A computer can’t get it wrong—right?
Sadly, they can. Computers are extremely good at performing tasks, true. But if a computer is fed a faulty algorithm—given the wrong set of tasks to perform—it won’t know that it didn’t end up at the right conclusion, unless it gets feedback letting it know it was wrong. Imagine a math teacher accidentally leaving a step out of an addition lesson. The student then has a faulty algorithm—one that results in a wrong answer every time, no matter how perfectly the student performs the steps. That’s what happens when an algorithm is biased: the computer isn’t failing. The model is.
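To make the math-teacher example concrete, here’s a toy version in Python of an addition routine with the carry step accidentally left out. It performs its steps flawlessly every single time, and it’s still wrong whenever a column overflows. (This is my own illustration, not code from any real system.)

```python
def column_add_no_carry(a, b):
    """Column addition with the carry step accidentally left out."""
    ones = (a % 10 + b % 10) % 10    # keep one digit per column...
    tens = (a // 10 + b // 10)       # ...but never carry the overflow
    return tens * 10 + ones

print(column_add_no_carry(57, 34))   # prints 81, not 91: the model is failing, not the computer
```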
The algorithm that Yelp uses when you search for “dinner,” for example, is designed to go through all of the listings in Yelp’s database and pick out the ones that are best suited to your search. But Yelp can’t actually know which restaurant is the best option for you at a given moment, so instead its algorithm uses what it does know, both about you and about the restaurants in its database, to make an educated guess. It does this by taking all the factors that Yelp’s product team decided were relevant—restaurants you’ve reviewed in the past, restaurants you’ve viewed in the past, how close a restaurant is to you, whether other people tended to say it’s good for dinner, whether they used the term “dinner” in their reviews, the total number of reviews the restaurant has, its star rating, and many, many more items—and running them through a proprietary system of weighting and ranking those variables. The result is a prioritized list that helps you, the user, choose a restaurant.
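Yelp’s real ranking system is proprietary and far more sophisticated, so everything in the sketch below is invented: the factor names, the weights, and the scoring are just one plausible shape a weight-and-rank approach like the one described above could take.

```python
# Hypothetical factors and weights; Yelp's actual model is proprietary.
WEIGHTS = {
    "distance": 0.20,          # closer restaurants score higher (values pre-normalized to 0-1)
    "dinner_mentions": 0.25,   # share of reviews that mention "dinner"
    "star_rating": 0.30,
    "review_count": 0.15,
    "personal_history": 0.10,  # places you've viewed or reviewed before
}

def rank_restaurants(restaurants):
    """Score each restaurant and return the list sorted best-first."""
    def score(r):
        return sum(weight * r.get(factor, 0.0) for factor, weight in WEIGHTS.items())
    return sorted(restaurants, key=score, reverse=True)

results = rank_restaurants([
    {"name": "Ramen Spot", "distance": 0.9, "dinner_mentions": 0.8,
     "star_rating": 0.85, "review_count": 0.6, "personal_history": 0.0},
    {"name": "Taco Truck", "distance": 0.7, "dinner_mentions": 0.5,
     "star_rating": 0.9, "review_count": 0.9, "personal_history": 1.0},
])
print([r["name"] for r in results])  # the prioritized list a user would see
```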
A perfect algorithm would be one in which every single user query produced precisely the ten restaurants that are most likely to make that user happy. That’s probably not possible, of course; humans are too diverse, with too many different whims, to satisfy all people, all the time. But it’s the ideal. Meanwhile, a faulty algorithm would turn up inappropriate or bad options for users—like consistently recommending businesses that don’t serve food, or that other users said were terrible. Mostly, the algorithm is somewhere in the middle: it finds just what you want a lot of the time, but sends you somewhere mediocre some of the time too. Yelp is also able to tune its model and improve results over time, by looking at things like how often users search, don’t like the results, and then search again. The algorithm is the core of Yelp’s product—it’s what connects users to businesses—so you can bet that data scientists are tweaking and refining this model all the time.
A product like COMPAS, the criminal recidivism software, doesn’t just affect whether you opt for tacos or try a new ramen place tonight, though. It affects people’s lives: whether they can get bail, how long they will spend in prison, whether they’ll be eligible for parole. But just like Yelp, COMPAS can’t know whether an individual is going to commit a future crime. All it has is data that it believes indicates they are more or less likely to.
Northpointe calls those indicators “risk and needs factors,” and it uses 137 of them in the COMPAS algorithm. Many of those questions focus on the defendant’s criminal history: how many times they’ve been arrested, how many convictions they’ve had, how long they’ve spent in jail in the past. But dozens of the questions that COMPAS asks aren’t related to crime at all—questions like, “How many times did you move in the last twelve months?” or “How often did you feel bored?” Those 137 answers are fed into the system, and—like magic—a COMPAS risk score ranging from 1 to 10 comes out the other end.
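Northpointe has never published how those 137 answers become a score, so the sketch below is strictly hypothetical: a weighted sum of normalized answers mapped onto a 1-to-10 scale, which is one common way risk scores of this general kind are built. The factor names, weights, and scoring are invented.

```python
# Strictly hypothetical: invented factors, weights, and scoring.
# Northpointe's actual COMPAS model is proprietary.
FACTOR_WEIGHTS = {
    "prior_arrests": 0.30,
    "prior_convictions": 0.25,
    "months_incarcerated": 0.15,
    "family_or_friends_arrested": 0.15,
    "recent_moves": 0.10,        # "How many times did you move in the last twelve months?"
    "boredom_frequency": 0.05,   # "How often did you feel bored?"
}

def risk_score(answers):
    """Map answers (each normalized to 0-1) onto a 1-10 risk score."""
    raw = sum(w * answers.get(f, 0.0) for f, w in FACTOR_WEIGHTS.items())  # 0.0 to 1.0
    return min(10, max(1, round(raw * 10)))

print(risk_score({"prior_arrests": 0.2, "family_or_friends_arrested": 1.0,
                  "recent_moves": 0.6, "boredom_frequency": 0.5}))  # e.g., 3
```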
Except, it’s not magic. It’s a system that’s rife with problems. For one, many of these questions focus on whether people in your family or social circle have ever been arrested. According to Northpointe, these factors correlate to a person’s risk level. But in the United States, black people are incarcerated at six times the rate of white people—often because of historical biases in policing, from racial profiling to the dramatically more severe penalties for possession of crack compared with possession of cocaine (the same drug) throughout the 1980s and 1990s.3 So if you’re black—no matter how lawfully you act and how careful you are—you’re simply a lot more likely to know people who’ve been arrested. They’re your neighbor, your classmate, your dad.