As with so many WMDs, the problem began at the get-go, when the administrators established the model’s twin objectives. The first was to boost efficiency, letting the machine handle much of the grunt work. It would automatically cull the two thousand applications down to five hundred, at which point humans would take over with a lengthy interviewing process. The second objective was fairness. The computer would remain unswayed by administrators’ moods or prejudices, or by urgent entreaties from lords or cabinet ministers. In this first automatic screening, each applicant would be judged by the same criteria.
And what would those criteria be? That looked like the easy part. St. George’s already had voluminous records of screenings from the previous years. The job was to teach the computerized system how to replicate the same procedures that human beings had been following. As I’m sure you can guess, these inputs were the problem. The computer learned from the humans how to discriminate, and it carried out this work with breathtaking efficiency.
In fairness to the administrators at St. George’s, not all of the discrimination in the training data was overtly racist. A good number of the applications with foreign names, or from foreign addresses, came from people who clearly had not mastered the English language. Instead of considering the possibility that great doctors could learn English, which is obvious today, the tendency was simply to reject them. (After all, the school had to discard three-quarters of the applications, and that seemed like an easy place to start.)
Now, while the human beings at St. George’s had long tossed out applications littered with grammatical mistakes and misspellings, the computer—illiterate itself—could hardly follow suit. But it could correlate the rejected applications of the past with birthplaces and, to a lesser degree, surnames. So people from certain places, like Africa, Pakistan, and immigrant neighborhoods of the United Kingdom, received lower overall scores and were not invited to interviews. An outsized proportion of these people were nonwhite. The human beings had also rejected female applicants, with the all-too-common justification that their careers would likely be interrupted by the duties of motherhood. The machine, naturally, did the same.
In 1988, the British government’s Commission for Racial Equality found the medical school guilty of racial and gender discrimination in its admissions policy. As many as sixty of the two thousand applicants every year, according to the commission, may have been refused an interview purely because of their race, ethnicity, or gender.
The solution for the statisticians at St. George’s—and for those in other industries—would be to build a digital version of a blind audition, one that eliminates proxies such as geography, gender, race, and name and focuses only on data relevant to medical education. The key is to analyze the skills each candidate brings to the school, not to judge him or her by comparison with people who seem similar. What’s more, a bit of creative thinking at St. George’s could have addressed the challenges facing women and foreigners. The British Medical Journal report accompanying the commission’s judgment said as much. If language and child care issues posed problems for otherwise solid candidates, the solution was not to reject those candidates but instead to provide them with help—whether English classes or onsite day care—to pull them through.
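To make the idea concrete, here is a rough sketch, in Python, of what a “blind” first screening might look like. The field names, the weights, and the scoring function are all invented for illustration; St. George’s actual criteria would of course have differed.

```python
# A minimal, hypothetical sketch of "blind" screening: proxy fields such as
# name, birthplace, address, and gender are dropped before any scoring happens,
# so the score can only depend on information relevant to medical education.

PROXY_FIELDS = {"name", "surname", "gender", "birthplace", "address", "nationality"}

def blind(application: dict) -> dict:
    """Return a copy of the application with proxy fields removed."""
    return {k: v for k, v in application.items() if k not in PROXY_FIELDS}

def score_application(app: dict) -> float:
    """Placeholder scoring function: the weights are invented for illustration."""
    return (
        2.0 * app.get("science_grades", 0)
        + 1.5 * app.get("clinical_experience_years", 0)
        + 1.0 * app.get("entrance_exam", 0)
    )

applicant = {
    "name": "A. Candidate",
    "birthplace": "Lagos",
    "gender": "F",
    "science_grades": 4.0,
    "clinical_experience_years": 2,
    "entrance_exam": 87,
}

# The score never sees the proxy fields.
print(score_application(blind(applicant)))
```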
This is a point I’ll be returning to in future chapters: we’ve seen time and again that mathematical models can sift through data to locate people who are likely to face great challenges, whether from crime, poverty, or education. It’s up to society whether to use that intelligence to reject and punish them—or to reach out to them with the resources they need. We can use the scale and efficiency that make WMDs so pernicious in order to help people. It all depends on the objective we choose.
So far in this chapter, we’ve been looking at models that filter out job candidates. For most companies, those WMDs are designed to cut administrative costs and to reduce the risk of bad hires (or ones that might require more training). The objective of the filters, in short, is to save money.
HR departments, of course, are also eager to save money through the hiring choices they make. One of the biggest expenses for a company is workforce turnover, commonly called churn. Replacing a worker earning $50,000 a year costs a company about $10,000, or 20 percent of that worker’s yearly pay, according to the Center for American Progress. Replacing a high-level employee can cost multiples of that—as much as two years of salary.
Naturally, many hiring models attempt to calculate the likelihood that each job candidate will stick around. Evolv, Inc., now a part of Cornerstone OnDemand, helped Xerox scout out prospects for its call centers, which employ more than forty thousand people. The churn model took into account some of the metrics you might expect, including the average time people stuck around on previous jobs. But the analysts also found some intriguing correlations. People the system classified as “creative types” tended to stay longer at the job, while those who scored high on “inquisitiveness” were more likely to set their questioning minds toward other opportunities.
But the most problematic correlation had to do with geography. Job applicants who lived farther from the job were more likely to churn. This makes sense: long commutes are a pain. But Xerox managers noticed another correlation. Many of the people suffering those long commutes were coming from poor neighborhoods. So Xerox, to its credit, removed that highly correlated churn data from its model. The company sacrificed a bit of efficiency for fairness.
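Here is a hedged sketch of what a churn model of this sort, and the fix Xerox made, might look like in code. The features, the synthetic data, and the use of an off-the-shelf logistic regression are my own stand-ins, not Evolv’s actual system.

```python
# A hypothetical churn model in the spirit described above. Feature names and
# data are invented; the point is the shape of the fix: dropping the
# commute-distance column (a proxy for neighborhood) before training.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

feature_names = ["avg_previous_tenure_years", "creativity_score",
                 "inquisitiveness_score", "commute_distance_miles"]

# Synthetic stand-in data for those four features.
X = np.column_stack([
    rng.exponential(2.0, n),      # avg_previous_tenure_years
    rng.uniform(0, 1, n),         # creativity_score
    rng.uniform(0, 1, n),         # inquisitiveness_score
    rng.exponential(10.0, n),     # commute_distance_miles
])
y = rng.integers(0, 2, n)         # 1 = left within a year (synthetic labels)

# The fairness fix: exclude the commute-distance column, which correlates with
# living in poorer neighborhoods, even at some cost in predictive power.
keep = [i for i, name in enumerate(feature_names) if name != "commute_distance_miles"]
model = LogisticRegression().fit(X[:, keep], y)

# Predicted churn probability for the first few candidates.
print(model.predict_proba(X[:5, keep])[:, 1])
```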
While churn analysis focuses on the candidates most likely to fail, the more strategically vital job for HR departments is to locate future stars, the people whose intelligence, inventiveness, and drive can change the course of an entire enterprise. In the higher echelons of the economy, companies are on the hunt for employees who think creatively and work well in teams. So the modelers’ challenge is to pinpoint, in the vast world of Big Data, the bits of information that correlate with originality and social skills.
Résumés alone certainly don’t cut it. Most of the items listed there—the prestigious university, the awards, even the skills—are crude proxies for high-quality work. While there’s no doubt some correlation between tech prowess and a degree from a top school, it’s far from perfect. Plenty of software talent comes from elsewhere—consider the high school hackers. What’s more, résumés are full of puffery and sometimes even lies. With a quick search through LinkedIn or Facebook, a system can look further afield, identifying some of a candidate’s friends and colleagues. But it’s still hard to turn that data into a prediction that a certain engineer might be a perfect fit for a twelve-member consultancy in Palo Alto or Fort Worth. Finding the person to fill a role like that requires a far broader sweep of data and a more ambitious model.
A pioneer in this field is Gild, a San Francisco–based start-up. Extending far beyond a prospect’s alma mater or résumé, Gild sorts through millions of job sites, analyzing what it calls each person’s “social data.” The company develops profiles of job candidates for its customers, mostly tech companies, keeping them up to date as the candidates add new skills. Gild claims that it can even predict when a star employee is likely to change jobs and can alert its customer companies when it’s the right time to make an offer. But Gild’s model attempts to quantify and also qualify each worker’s “social capital.” How integral is this person to the community of fellow programmers? Do they share and contribute code? Say a Brazilian coder—Pedro, let’s call him—lives in São Paulo and spends every evening from dinner to one in the morning in communion with fellow coders the world over, solving cloud-computing problems or brainstorming gaming algorithms on sites like GitHub or Stack Overflow. The model could attempt to gauge Pedro’s passion (which probably gets a high score) and his level of engagement with others. It would also evaluate the skill and social importance of his contacts. Those with larger followings would count for more. If his principal online contact happened to be Google’s Sergey Brin, or Palmer Luckey, founder of the virtual reality maker Oculus VR, Pedro’s social score would no doubt shoot through the roof.
But models like Gild’s rarely receive such explicit signals from the data. So they cast a wider net, in search of correlations to workplace stardom wherever they can find them. And with more than six million coders in its database, the company can find all kinds of patterns. Vivienne Ming, Gild’s chief scientist, said in an interview with Atlantic Monthly that Gild had found a bevy of talent frequenting a certain Japanese manga site. If Pedro spends time at that comic-book site, of course, it doesn’t predict superstardom. But it does nudge up his score.
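The scoring itself is proprietary, but a toy version of the logic might look something like the sketch below. Every weight and field name is invented; the point is only to show how code contributions, the reach of a person’s contacts, and incidental signals like the manga-site visits could be folded into a single number.

```python
# A toy, invented version of a "social capital" score along the lines described
# above. Weights and field names are hypothetical, not Gild's actual model.

def social_score(profile: dict) -> float:
    score = 0.0
    # Shared code and accepted answers count as engagement with the community.
    score += 0.5 * profile.get("repos_contributed", 0)
    score += 0.3 * profile.get("answers_accepted", 0)
    # Contacts are weighted by their own following: a well-connected contact
    # raises the score more than an obscure one.
    score += sum(0.001 * followers for followers in profile.get("contact_followers", []))
    # Incidental correlations nudge the score rather than determine it.
    if profile.get("visits_manga_site", False):
        score += 1.0
    return score

pedro = {
    "repos_contributed": 40,
    "answers_accepted": 120,
    "contact_followers": [250, 1200, 980000],   # one very prominent contact
    "visits_manga_site": True,
}
print(social_score(pedro))
```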
That makes sense for Pedro. But certain workers might be doing something else offline, which even the most sophisticated algorithm couldn’t infer—at least not today. They might be taking care of children, for example, or perhaps attending a book group. The fact that prospects don’t spend six hours discussing manga every evening shouldn’t be counted against them. And if that manga site, like most of techdom, is dominated by males and has a sexual tone, a good number of the women in the industry will probably avoid it.
Despite these issues, Gild is just one player. It doesn’t have the clout of a global giant and is not positioned to set a single industry standard. Compared to some of the horrors we’ve seen—the predatory ads burying families in debt and the personality tests excluding people from opportunities—Gild is tame. Its category of predictive model has more to do with rewarding people than punishing them. No doubt the analysis is uneven: some potential stars are undoubtedly overlooked. But I don’t think the talent miners yet rise to the level of a WMD.
Still, it’s important to note that these hiring and “onboarding” models are ever-evolving. The world of data continues to expand, with each of us producing ever-growing streams of updates about our lives. All of this data will feed our potential employers, giving them insights into us.
Will those insights be tested, or simply used to justify the status quo and reinforce prejudices? When I consider the sloppy and self-serving ways that companies use data, I’m often reminded of phrenology, a pseudoscience that was briefly the rage in the nineteenth century. Phrenologists would run their fingers over the patient’s skull, probing for bumps and indentations. Each one, they thought, was linked to personality traits that existed in twenty-seven regions of the brain. Usually, the conclusion of the phrenologist jibed with the observations he made. If a patient was morbidly anxious or suffering from alcoholism, the skull probe would usually find bumps and dips that correlated with that observation—which, in turn, bolstered faith in the science of phrenology.
Phrenology was a model that relied on pseudoscientific nonsense to make authoritative pronouncements, and for decades it went untested. Big Data can fall into the same trap. Models like the ones that red-lighted Kyle Behm and blackballed foreign medical students at St. George’s can lock people out, even when the “science” inside them is little more than a bundle of untested assumptions.
* * *
* Yes, it’s true that many college-bound students labor for a summer or two in minimum-wage jobs. But if they have a miserable experience there, or are misjudged by an arbitrary WMD, it only reinforces the message that they should apply themselves at school and leave such hellish jobs behind.
Workers at major corporations in America recently came up with a new verb: clopening. That’s when an employee works late one night to close the store or café and then returns a few hours later, before dawn, to open it. Having the same employee closing and opening, or clopening, often makes logistical sense for a company. But it leads to sleep-deprived workers and crazy schedules.
Wildly irregular schedules are becoming increasingly common, and they especially affect low-wage workers at companies like Starbucks, McDonald’s, and Walmart. A lack of notice compounds the problem. Many employees find out only a day or two in advance that they’ll have to work a Wednesday-night shift or handle rush hour on Friday. It throws their lives into chaos and wreaks havoc on child care plans. Meals are catch as catch can, as is sleep.
These irregular schedules are a product of the data economy. In the last chapter, we saw how WMDs sift through job candidates, blackballing some and ignoring many more. We saw how the software often encodes poisonous prejudices, learning from past records just how to be unfair. Here we continue the journey on to the job, where efficiency-focused WMDs treat workers as cogs in a machine. Clopening is just one product of this trend, which is likely to grow as surveillance extends into the workplace, providing more grist for the data economy.
For decades, before companies were swimming in data, scheduling was anything but a science. Imagine a family-owned hardware store whose clerks work from 9 to 5, six days a week. One year, the owners’ daughter goes to college. And when she comes back for the summer she sees the business with fresh eyes. She notices that practically no one comes to the store on Tuesday mornings. The clerk web-surfs on her phone, uninterrupted. That’s a revenue drain. Meanwhile, on Saturdays, muttering customers wait in long lines.
These observations provide valuable data, and she helps her parents adapt the business to it. They start by closing the store on Tuesday mornings, and they hire a part-timer to help with the Saturday crush. These changes add a bit of intelligence to the dumb and inflexible status quo.
With Big Data, that college freshman is replaced by legions of PhDs with powerful computers in tow. Businesses can now analyze customer traffic to calculate exactly how many employees they will need each hour of the day. The goal, of course, is to spend as little money as possible, which means keeping staffing at the bare minimum while making sure that reinforcements are on hand for the busy times.
You might think that these patterns would repeat week after week, and that companies could simply make adjustments to their fixed schedules, just like the owners of our hypothetical hardware store. But new software scheduling programs offer far more sophisticated options. They process new streams of ever-changing data, from the weather to pedestrian patterns. A rainy afternoon, for example, will likely drive people from the park into cafés. So they’ll need more staffing, at least for an hour or two. High school football on Friday night might mean more foot traffic on Main Street, but only before and after the game, not during it. Twitter volume suggests that 26 percent more shoppers will rush out to tomorrow’s Black Friday sales than did last year. Conditions change, hour by hour, and the workforce must be deployed to match the fluctuating demand. Otherwise the company is wasting money.
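Stripped to its essentials, the scheduling logic converts an hourly demand forecast into the smallest headcount that covers it. The sketch below is hypothetical: the forecast numbers, the service-rate constant, and the minimum-staffing floor are all stand-ins, not any company’s actual parameters.

```python
# A hypothetical sketch of demand-driven staffing: turn an hourly forecast of
# customer traffic into the minimum number of workers needed to cover it.

import math

CUSTOMERS_PER_WORKER_PER_HOUR = 20   # invented service-rate assumption
MIN_STAFF = 2                        # invented floor: never run the store alone

def staff_needed(forecast_by_hour: list[float]) -> list[int]:
    """Smallest headcount per hour that covers the forecast demand."""
    return [max(MIN_STAFF, math.ceil(customers / CUSTOMERS_PER_WORKER_PER_HOUR))
            for customers in forecast_by_hour]

# The forecast might blend historical traffic with signals like weather or
# local events; these numbers are made up for illustration.
friday_forecast = [15, 30, 80, 120, 60, 40, 95, 140]   # customers per hour
print(staff_needed(friday_forecast))   # prints [2, 2, 4, 6, 3, 2, 5, 7]
```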
The money saved, naturally, comes straight from employees’ pockets. Under the inefficient status quo, workers had not only predictable hours but also a certain amount of downtime. You could argue that they benefited from inefficiency: some were able to read on the job, even study. Now, with software choreographing the work, every minute should be busy. And these minutes will come whenever the program demands it, even if it means clopening from Friday to Saturday.
In 2014, the New York Times ran a story about a harried single mother named Jannette Navarro, who was trying to work her way through college as a barista at Starbucks while caring for her four-year-old. The ever-changing schedule, including the occasional clopening, made her life almost impossible and put regular day care beyond reach. She had to put school on hold. The only thing she could schedule was work. And her story was typical. According to US government data, two-thirds of food service workers and more than half of retail workers find out about scheduling changes with notice of a week or less—often just a day or two, which can leave them scrambling to arrange transportation or child care.
Within weeks of the article’s publication, the major corporations it mentioned announced that they would adjust their scheduling practices. Embarrassed by the story, the employers promised to add a single constraint to their model. They would eliminate clopenings and learn to live with slightly less robust optimization. Starbucks, whose brand hinges more than most on fair treatment of workers, went further, saying that the company would adjust the software to reduce the scheduling nightmares for its 130,000 baristas. All work hours would be posted at least one week in advance.
A year later, however, Starbucks was failing to meet these targets, or even to eliminate the clopenings, according to a follow-up report in the Times. The trouble was that minimal staffing was baked into the culture. In many companies, managers’ pay is contingent upon the efficiency of their staff as measured by revenue per employee hour. Scheduling software helps them boost these numbers and their own compensation. Even when executives tell managers to loosen up, they often resist. It goes against everything they’ve been taught. What’s more, at Starbucks, if a manager exceeds his or her “labor budget,” a district manager is alerted, said one employee. And that could lead to a write-up. It’s usually easier just to change someone’s schedule, even if it means violating the corporate pledge to provide one week’s notice.
In the end, the business models of publicly traded companies like Starbucks are built to feed the bottom line. That’s reflected in their corporate cultures and their incentives, and, increasingly, in their operational software. (And if that software allows for tweaks, as Starbucks’s does, the ones that get made are likely to be ones that boost profits.)
Much of the scheduling technology has its roots in a powerful discipline of applied mathematics called “operations research,” or OR. For centuries, mathematicians used the rudiments of OR to help farmers plan crop plantings and help civil engineers map highways to move people and goods efficiently. But the discipline didn’t really take off until World War II, when the US and British military enlisted teams of mathematicians to optimize their use of resources. The Allies kept track of various forms of an “exchange ratio,” which compared Allied resources spent versus enemy resources destroyed. During Operation Starvation, which took place between March and August 1945, the Twenty-first Bomber Command was tasked with destroying Japanese merchant ships in order to prevent food and other goods from arriving safely on Japanese shores. OR teams worked to minimize the number of mine-laying aircraft for each Japanese merchant ship that was sunk. They managed an “exchange ratio” of over 40 to 1—only 15 aircraft were lost in sinking 606 Japanese ships. This was considered highly efficient, and was due, in part, to the work of the OR team.