Dataclysm: Who We Are (When We Think No One's Looking)

by Christian Rudder

From these intrepid few, the app bequeathed the world a rare data set. Crazy Blind Date recorded not only the fact that dater A and dater B met in person but also their opinions of each other. After each completed date, like a nosy roommate, the app asked how it went. Because most of the users also had OkCupid accounts, we were able to cross-reference this data with all kinds of demographic details. We suddenly had in-person records to combine with our massive collection of digital interactions. When you merge the two sources you find something remarkable: the two people’s looks had almost no effect on whether they had a good time. No matter which person was better-looking or by how much—even in cases where one blind-dater was a knockout and the other rather homely—the percent of people giving the dates a positive rating was constant. Attractiveness didn’t matter. This data, from real dates, turned everything I’d seen in ten years of running a dating site on its head.

Here are the numbers for men. I’ve expressed attractiveness below as the relative difference in a couple’s individual ratings, rather than as absolutes. I did this to capture the fact that a person’s happiness at finding himself across the table from, say, a “6” is highly dependent on his own looks. If he’s a “1,” he might be thrilled with that arrangement—it means he’s dating up. A “10” would feel differently. I’ve included the counts of dates as the bars to show that the balance in attractiveness between the men and women going on the dates was about what you’d expect if they were randomly paired. There was no evidence of people gaming the system by, say, somehow unscrambling the pictures beforehand or showing up to the date venue and then leaving on the sly when their blind date arrived and didn’t pass muster. The satisfaction numbers (for males) are the percentages in red:

And following is the same data for women:

Through both Crazy Blind Date data sets, people just didn’t seem to care that much about the other person’s physical appearance. Women had a good time 75 percent of the time, men 85 percent. The rest of the variation is basically noise. That indifference to looks is just about the opposite of what you see in the OkCupid data. For example, I’ve plotted the in-person satisfaction data above (the numbers in red) alongside those same women’s reply rates to messages online. To make it easier to compare them, the lines show change against the average of their respective quantities:

The male comparison chart is very similar to this one, and, to be clear, the data underpinning the two lines above is from the same set of people. The black line is their OkCupid experience, the red from Crazy Blind Date. In short, people appear to be heavily preselecting online for something that, once they sit down in person, doesn’t seem important to them.

That kind of superficial preselection is everywhere. In fact, there’s a lot of money to be made off it. You know what the difference between Tylenol and Kroger’s store-brand acetaminophen is? The box. Unless you take medicine like a king snake and plan to just swallow the package whole, there’s really no reason to pay twice as much for the “name” molecules, whose properties are determined by immutable chemical law. And yet, I have a big red Tylenol bottle on my dresser.

We of course pay the most attention to labels when they’re attached to people. In terms of superficial compatibility, self-described Democrats and Republicans get along the least of all major groups on OkCupid—worse even than Protestants and Atheists. I know this through the many match questions the site asks: they cover pretty much everything, and the average user answers about three hundred of them. The site lets you decide the importance of each question you answer, and you can pinpoint the answers that you would (and would not) accept from a potential match. Despite all this control, in the political case, the system breaks down. When you look beyond the labels, at who actually messages whom, and who replies (and therefore who ends up going on actual dates), it’s caring about politics, one way or the other, that is actually more important to mutual compatibility than the details of any particular belief. We confirmed this in a summer-long experiment in 2011.

People tend to run wild with those match questions, marking all kinds of stuff as “mandatory,” in essence putting a checklist to the world: I’m looking for a dog-loving, agnostic, nonsmoking liberal who’s never had kids—and who’s good in bed, of course. But very humble questions like Do you like scary movies? and Have you ever traveled alone to another country? have amazing predictive power. If you’re ever stumped on what to ask someone on a first date, try those. In about three-quarters of the long-term couples OkCupid has ever brought together, both people have answered them the same way, either both “yes” or both “no.” People tend to overemphasize the big, splashy things: faith, politics, and certainly looks, but they don’t matter nearly as much as everyone thinks. Sometimes they don’t matter at all.

Fiasco though it was, Love Is Blind Day gave us a visceral example of what people do in the absence of information. In hiding pictures but changing nothing else, we created a real-time experiment to set against the site’s usual activity. For seven hours our users acted without the very thing our previous data had indicated was the single most important piece of knowledge OkCupid could offer: what everyone else looked like.

Some of the upshot was predictable. People sent messages without the typical biases, or racial and attractiveness skews. What a user couldn’t see, he couldn’t judge. But of the 30,333 messages sent blindly, eventually 8,912 got replies, a rate about 40 percent higher than usual. And in the dark, for those who were there, something astounding happened. Twenty-four percent of the pairs of people talking when the photos were hidden had exchanged contact info before pictures were turned back on. That was in only the seven-hour window of Love Is Blind Day. The expected number in that amount of time is barely half that. So not only were people writing messages that were far more likely to get replies, they were giving out phone numbers and e-mail addresses at a higher rate—to people they’d never even seen.
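The reply-rate arithmetic in that paragraph can be checked in a few lines. This is only a sketch built on the figures quoted above: the usual reply rate is not given in the text, so the baseline below is inferred by treating "about 40 percent higher" as exactly 1.4 times, which is an assumption. On those terms the blind reply rate comes to roughly 29 percent, against an implied usual rate of about 21 percent.

```python
# Figures quoted in the text for Love Is Blind Day.
messages_sent = 30_333
replies = 8_912

# Assumption: "about 40 percent higher than usual" is treated as exactly +40%.
lift = 0.40

blind_reply_rate = replies / messages_sent
# The baseline is inferred from the lift; it is not stated in the text.
implied_usual_rate = blind_reply_rate / (1 + lift)

print(f"Blind reply rate: {blind_reply_rate:.1%}")
print(f"Implied usual reply rate: {implied_usual_rate:.1%}")
```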

For the couples who began talking and were still getting to know each other when we restored photos at four p.m., however, the day had a reverse effect. The two people had been in the dark, then suddenly the lights came on, and, in the data, you can actually see them spook. Threads straddling the moment we flipped the switch lasted an average of 4.4 more messages. When you compare them against a control data set, they should’ve lasted 5.6. Eventual contact-info exchanges in those “lights on” threads were down by a similar amount.

Dating sites are designed to give people the tools and the information to get whatever they want out of being single—casual sex, a few fun dates, a partner, a marriage … anything. Stuff like height, political views, photos, essays, all of it is right there, easily sortable, easily searchable. It’s there to help people make judgments and fulfill their desires, and as fascinating as those judgments and desires may be to pick apart, there’s a side of it that I think does love a disservice. People make choices from the information we provide because they can, not because they necessarily should.

I can’t help but think of the many people getting turned down because of some perceived “deal-breaker” that actually no one cares about and wonder if the Internet has changed romance in the way it’s changed so much else—and for the same reason. If I may channel my inner anti-Jagger: Online, you can always get what you want. But what you need, that’s a much harder thing to find.

6. The Confounding Factor

7. The Beauty Myth in Apotheosis

8. It’s What’s Inside That Counts

9. Days of Rage

6.

The Confounding Factor

If you stand on the southwest corner of Fifty-Eighth and Fifth with a clipboard and do a little people-watching, you can very quickly conclude that most New Yorkers are beautiful, thin, and above all, rich. Every thread, every grommet, every crease shines with money. Of course, many New Yorkers are rich, but that’s not the whole story here. You’re standing outside Bergdorf Goodman, and that’s a confounding factor.

This is a technical term for something you haven’t accounted for in your analysis but that nonetheless affects its results. Making sure you’re not perched in some bitwise version of the Upper East Side is one of the most time- and thought-consuming parts of working with digital data. When you have seemingly every variable and every possibility available for analysis and speculation, your research is free to travel wherever your curiosity leads. But true to the cliché, that freedom requires eternal vigilance.

And here’s where I have an admission to make. So far in these pages, wherever you’ve seen the data of a person-to-person opinion, in the votes, in the date results from Crazy Blind Date, the charts, the tables—in every ratio, in every total—whenever one user was judging another, both people involved were white. I had to make it that way, because when you’re looking at how two American strangers behave in a romantic context, race is the ultimate confounding factor. And to make sure whatever I wanted to say about attraction or sex spoke to those ideas alone, I needed to cut it from the discussion.

As an American, the reflex to sweep race under the rug is inborn, so in a way, though the numbers forced my hand, I was just doing what came naturally. And even apart from our nation’s peculiar relationship with the topic, a long history of tokenism and sorry pseudoscience makes any quantitative analysis of race especially fraught. That’s not to say we don’t have good numbers. There are plenty of them, of a certain type—if my preferred data is person-to-person, then I think of this other as person-to-thing: one group or another versus unemployment rates, the SAT, the criminal justice system, cancer … As much as research like this has helped us pinpoint and (occasionally) address inequality, there’s something incomplete about it. You lose the human who is doing (or not doing) the hiring, the teaching, the police work, the preventative care; you lose the people who created the outcomes that all these studies purport to measure. So what you end up with is conclusions like this: Black Defendants Are at Least 30 Percent More Likely to Be Imprisoned Than White Defendants for the Same Crime. The headline’s passive voice says it all. Who’s miscarrying the justice here? Syntactically, no one. Practically, I have a good guess. But it is a rare study indeed that looks beyond the institutions, to the fundamental “us versus them” binary of race relations.

Behind every bit in my data, there are two people, the actor and the acted upon, and the fact that we can see each as equals in the process is new. If there is a “-clysm” part of the whole data thing, if this book’s title is more than just a semi-clever pun or accident of the alphabet—then this is it. It allows us to see the full human experience at once, not just whatever side we happen to be paying attention to at a given time.

Before the advent of data like ours, one of the most quantified arenas in public life was sports. There you have real-time numbers on every conceivable interaction, and you have the data on an individual level, to be sliced and recombined at will. Perhaps it’s surprising, then, that sports is where the discussion of race is least analytic. The “black quarterback” controversy that stretched for the first ten years or so of this millennium is the perfect example. For years there was a regular news cycle: an African American quarterback would go early in the draft or start a high-profile game, and someone would inevitably imply that blacks can’t succeed at the position in the NFL. The usual reason given was that they lacked the intelligence. There would be backlash, discussion, and plenty of argument that this was nothing more than mean-spirited stereotyping. But amidst all the commentary and outcry, and outcry against the outcry, in the 97,000 results that Google returns for “black quarterback,” I found only one article that actually calculates the quarterback ratings of blacks and whites, which turn out to be the same down to the second decimal: 81.55. In a genre so stats-obsessed, where platoons of number crunchers calculate Johnny Placekicker’s 54 percent success rate on field goal attempts over 50 yards in road games decided by 7 points or less against AFC opponents, you’d think that statistically comparing black and white quarterbacks would’ve been everyone’s first instinct. Instead, there was, and generally is around race, an eerie numerical silence. You find in its place rhetoric and appeals to anecdote. But a “debate” done in this style just leaves everyone believing they’re right, when, in fact, for all the words expended, a single number—81.55—can clearly show that one side is wrong. 
The article that did the rating calculation had 0 tweets and 0 Facebook likes, by the way, and it wasn’t posted on some obscure blog; it appeared on The Big Lead, which is owned by USA Today. You often get the feeling that people just don’t want to know.

Where in situations like this we might seem to lack the will to examine race through a statistical lens, in many other arenas we have simply lacked the data. Most aspects of life haven’t been as obsessively quantified as football. That is changing rapidly.

On OkCupid, one of the easiest ways to compare a black person and a white person (or any two people of any race) is to look at their “match percentage.” That’s the site’s term for compatibility. It asks users a bunch of questions, they give answers, and an algorithm predicts how well any two of them would get along over, say, a beer or dinner. Unlike other features on OkCupid, there is no visual component to match percentage. The number between two people only reflects what you might call their inner selves—everything about what they believe, need, and want, even what they think is funny, but nothing about what they look like. Judging by just this compatibility measure, the four largest racial groups on OkCupid—Asian, black, Latino, and white—all get along about the same.1 In fact, race has less effect on match percentage than religion, politics, or education. Among the details that users believe are important, the closest comparison to race is Zodiac sign, which has no effect at all. To a computer not acculturated to the categories, “Asian” and “black” and “white” could just as easily be “Aries” and “Virgo” and “Capricorn.”

But this racial neutrality is only in theory; things change once the users’ own opinions, and not just the color-blind workings of an algorithm, come into play. Given the full profile, with the photo dominating the page, this is how OkCupid’s users rate each other by race:

I’ve given the raw data above, unadorned, because by now you’re at least a little familiar with OkCupid’s 1- to 5-star system. But to make the trends easier to see, I’m going to take that same matrix and “normalize” each row. In the table below, each entry is the percentage difference (+/-) from the average (the “normal”) in the row. It’s the same information, just phrased a bit differently. Think of the normalized number as the men’s relative preference for women. For example, as you can see, Asian men think Asian women are 18 percent better-looking than the average, while black men think they’re just 2 percent better. And so on:

I’ll soon move beyond OkCupid, and when I present similar matrices later, I’ll go directly to the normalized scores. But for now, the two essential patterns of male-to-female attraction are plain: men tend to like women of their own race. Far more than that, though, they don’t like black women. Message data is highly correlated with these ratings, so they follow the pattern as well.2
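The row normalization described above can be sketched as follows. The group labels and star ratings here are invented placeholders, not OkCupid's numbers, and the row average is assumed to be a simple unweighted mean, which the text does not specify. The helper name `normalize_rows` is likewise only illustrative.

```python
def normalize_rows(matrix):
    """Express each rating as its percentage difference from its row's mean."""
    normalized = {}
    for rater, ratings in matrix.items():
        # Assumption: the "normal" for a row is the unweighted mean of its entries.
        row_mean = sum(ratings.values()) / len(ratings)
        normalized[rater] = {
            rated: round(100 * (score - row_mean) / row_mean, 1)
            for rated, score in ratings.items()
        }
    return normalized


# Hypothetical average star ratings: rows are the raters, columns the rated.
example = {
    "group_a": {"group_a": 3.0, "group_b": 2.0, "group_c": 2.5, "group_d": 2.5},
    "group_b": {"group_a": 2.4, "group_b": 3.6, "group_c": 3.0, "group_d": 3.0},
}

for rater, row in normalize_rows(example).items():
    print(rater, row)
```

Each row is scaled only against itself, so the result shows a group's relative preferences and says nothing about whether one group of raters is harsher overall than another.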

Just to show that these voting trends aren’t being thrown off by some obscure statistical artifact, I’ve put the raw per capita vote numbers in what’s called a box plot—it tells you where the bulk of a data set lies. You see below that the central mass of black women is rated almost entirely below the other three ethnicities’, and the black women’s upper extreme is about at the midline of the other three:

Mathematically, this is a complete discount—being black basically costs you about three-quarters of a star in your rating, even if you’re at the top. Further, when you do this analysis in reverse, and look at the people actually casting the votes, you see a similar wholesale pattern. The broad majority of non-black men apply that three-quarters reduction to black women. There is no cadre of racists single-handedly bringing everything down.

However startling this may be, it only reflects one data set, the thoughts of one group of people. So here’s a good place to pause for a second and answer a question you might have been asking earlier, given how much I’ve relied on OkCupid’s data so far in this book: Who are these people?

In the most superficial way, OkCupid’s members reflect the general composition of Internet users, with of course the caveat that (almost) everyone on the site is single. The site’s users are younger than the national average (OkCupid’s median age is twenty-nine), and they tend to be less religious. The racial composition is about what you’d expect. Here are our numbers compared with the generic “American Internet User” breakout from Quantcast, the major online audience measurement firm—it’s like Nielsen for the net.

Going one demographic level deeper, OkCupid users are, if anything, more urban, more educated, and more progressive than the nation at large. The site’s biggest markets by far are places like New York, San Francisco, Los Angeles, Boston, and Seattle. Eighty-five percent of the users have gone to college. Self-described liberals outnumber self-described conservatives more than two to one. There is a broad, site-wide ethos of open-mindedness. And an unintentionally hilarious 84 percent of users answer this match question …

Would you consider dating someone who has vocalized a strong negative bias toward a certain race of people?

in the absolute negative (choosing “No” over “Yes” and “It Depends”). In light of the previous data, that means 84 percent of people on OkCupid would not consider dating someone on OkCupid.
