Dataclysm: Who We Are (When We Think No One's Looking)


by Christian Rudder


  You can see this in the profile ratings on OkCupid. Because the site’s rating system is 5 stars, the votes have more depth than just a yes or a no. People give degrees of opinion, and that gives us room to explore. To show this finding, we’ll have to go on a short mathematical journey. These kinds of exercises are what make data science work. To put together puzzles, you have to lay out all the pieces and then just start trying things. In the absence of careful sifting, reduction, and parsimony, very little just “jumps out at you” from terabytes of raw data.

  Consider a group of women with approximately the same attractiveness, let’s just say the ones rated in the middle:

  Now imagine a woman in that group and think of the many different votes men could’ve given her—basically think about how she ended up in the middle. There are thousands of possibilities; here are just a few I made up, combinations of 1s, 2s, 3s, 4s, and 5s, which all come to an average of 3:

  As you might’ve noticed, the vote patterns I’ve chosen get more polarized as they go from Pattern A to Pattern E. Each row still averages out to that same central “3,” but they express that average in different ways. Pattern A is the embodiment of consensus. There, the men who cast the votes have spoken in perfect unison: this woman is exactly in the middle. But by the time we get to the bottom of the table, the overall average is still centered, yet no single individual actually holds that central opinion. Pattern E shows the most extreme possible path to a middling average: for every man awarding our theoretical woman a “1,” someone else gives her a “5,” and the total result comes out to a “3” almost in spite of itself. That’s the John Waters way.

  These patterns exemplify a mathematical concept called variance. It’s a measure of how widely data is scattered around a central value. Variance goes up the further the data points fall from the average; in the table above, it is highest in Pattern E. One of the most common applications of variance is to weigh volatility (and therefore risk) in financial markets. Consider these two companies:

  Both returned 10 percent for the year, but they are very different investments. Associated Widgets experienced large swings in value throughout the year, while Widgets Inc. grew little by little, showing consistent gains each month. Computing the variance allows analysts to capture this distinction in one simple number, and all other things being equal, investors much prefer the low score of Widgets Inc.’s steady climb. Same return, fewer heart palpitations. Of course, when it comes to romance, heart palpitations are the return, and that gets to the crux of it. It turns out that variance has almost as much to do with the sexual attention a woman gets as her overall attractiveness.
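Here is a minimal Python sketch of the idea, using population variance from the standard library’s `statistics` module. The star patterns and the monthly share prices are invented for illustration, since the book’s own tables aren’t reproduced in this text. All the code shows is that identical averages can hide very different variances.

```python
from statistics import mean, pvariance

# Hypothetical 1-to-5 star vote patterns; each one averages exactly 3.
patterns = {
    "A (consensus)": [3, 3, 3, 3, 3, 3],
    "C (mild spread)": [2, 4, 3, 3, 2, 4],
    "E (polarized)": [1, 5, 1, 5, 1, 5],
}

for name, votes in patterns.items():
    # Same mean every time; only the spread around it changes.
    print(f"{name}: mean={mean(votes)}, variance={pvariance(votes):.2f}")


def monthly_returns(values):
    """Fractional change from each month-end value to the next."""
    return [later / earlier - 1 for earlier, later in zip(values, values[1:])]


# Hypothetical month-end prices: both start at 100 and end at 110 (a 10 percent year).
widgets_inc = [100 + 10 * month / 12 for month in range(13)]  # steady climb
associated_widgets = [100, 120, 90, 115, 85, 118, 95, 125, 92, 112, 88, 105, 110]

for name, prices in [("Widgets Inc.", widgets_inc), ("Associated Widgets", associated_widgets)]:
    annual = prices[-1] / prices[0] - 1
    risk = pvariance(monthly_returns(prices))  # volatility as variance of monthly returns
    print(f"{name}: annual return={annual:.0%}, variance of monthly returns={risk:.5f}")
```

Pattern A comes out at a variance of 0 and Pattern E at 4, the largest spread possible on a 1-to-5 scale with a mean of 3. The two companies tie on annual return, but the swinging one scores far higher on variance.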

  In any group of women who are all equally good-looking, the number of messages they get is highly correlated with the variance: from the pageant queens to the most homely women to the people right in between, the individuals who get the most affection will be the polarizing ones. And the effect isn’t small; being highly polarizing will in fact get you about 70 percent more messages. That means variance allows you to effectively jump several “leagues” up in the dating pecking order. For example, a very low-rated woman (20th percentile) with high variance in her votes gets hit on about as much as a typical woman in the 70th percentile.
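To make “highly correlated” concrete, here is a sketch of a Pearson correlation between vote variance and message count within one band of equally rated women. The five profiles and their message counts are made up solely to exercise the function. They are not OkCupid data, and `pearson` is a hypothetical helper written out by hand rather than a library call.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical profiles, all averaging 3 stars: (votes received, messages received).
profiles = [
    ([3, 3, 3, 3], 10),
    ([2, 4, 3, 3], 12),
    ([2, 4, 2, 4], 14),
    ([1, 5, 3, 3], 15),
    ([1, 5, 1, 5], 17),
]

variances = [pvariance(votes) for votes, _ in profiles]
messages = [count for _, count in profiles]
print(f"correlation(variance, messages) = {pearson(variances, messages):.2f}")
```

Holding the average fixed before correlating is the important design choice: it mirrors the text’s “group of women who are all equally good-looking,” so variance isn’t just standing in for overall rating.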

  Part of that is because variance means, by definition, that more people like you a lot (as well as dislike you a lot). And those enthusiastic guys—let’s just call them the fanboys—are the ones who do most of the messaging. So by pushing people toward the high end (the 5s), you get more action.

  But the negative votes themselves are part of the story, too. They drive some of the attention on their own. For example, the real patterns exemplified by C and D below get about 10 percent more messages than the ones shown in A and B, even though the top two women are rated far better overall:

  I’ve been talking about messages as if they’re an end unto themselves, but on a dating site, messages are the precursor to outcomes like in-depth conversations, the exchange of contact information, and eventually in-person meetings. People with higher variance get more of all these things, too. So, for example, woman D above would have about 10 percent more conversations, 10 percent more dates, and, likely, 10 percent more sex than woman A, even though in terms of her absolute rating she’s much less attractive.

  Moreover, the men giving out those 1s and 2s are not themselves hitting on the women—people practically never contact someone they’ve rated poorly.3 It’s that having haters somehow induces everyone else to want you more. People not liking you somehow brings you more attention entirely on its own. And, yes, in his underground castle, Karl Rove smiles knowingly, petting an enormous toad.
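The footnoted 0.2 percent figure amounts to joining two kinds of records: who rated whom, and who messaged whom. Here is a minimal sketch, assuming ratings keyed by (rater, ratee) pairs and messages stored as (sender, recipient) pairs. The names, the data layout, and the helper `share_sent_to_low_rated` are all hypothetical.

```python
# Hypothetical records: ratings[(rater, ratee)] = stars given.
ratings = {
    ("al", "bea"): 5, ("al", "cat"): 1,
    ("dan", "bea"): 2, ("dan", "cat"): 4,
    ("eli", "bea"): 4, ("eli", "cat"): 5,
}
# Hypothetical messages as (sender, recipient) pairs.
messages = [("al", "bea"), ("dan", "cat"), ("eli", "bea"), ("eli", "cat"), ("dan", "bea")]


def share_sent_to_low_rated(messages, ratings, threshold=3):
    """Fraction of messages whose sender had rated the recipient below `threshold` stars.

    Messages between pairs with no recorded rating are skipped.
    """
    rated = [ratings[pair] for pair in messages if pair in ratings]
    low = sum(1 for stars in rated if stars < threshold)
    return low / len(rated) if rated else 0.0


print(f"{share_sent_to_low_rated(messages, ratings):.1%}")  # prints 20.0% (1 of 5 messages)
```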

  It only adds to the mystery of the phenomenon that OkCupid doesn’t publish raw attractiveness scores (or a variance number, of course) for anyone on the site. Nobody is consciously making decisions based on this data. But people have a way of feeling the math behind things, whether they’re aware of it or not, and here’s what I think is going on. Suppose a guy is attracted to a woman he knows is unconventional-looking. Her very unconventionality implies that some other men are likely turned off; it means less competition. Having fewer rivals increases his chances of success. I can imagine our man browsing her profile, circling his cursor, thinking to himself: I bet she doesn’t meet many guys who think she’s awesome. In fact, I’m actually into her for her quirks, not in spite of them. This is my diamond in the rough, and so on. To some degree, her very unpopularity is what makes her attractive to him. And if our browsing guy was at all on the fence about whether to actually introduce himself, this might make the difference.

  Looking at the phenomenon from the opposite angle—the low-variance side—a relatively attractive woman with consistent scores is someone any guy would consider conventionally pretty. And she therefore might seem to be more popular than she really is. Broad appeal gives the impression that other guys are after her, too, and that makes her incrementally less appealing. Our interested but on-the-fence guy moves on.

  This is my theory, at least. But the idea that variance is a positive thing is fairly well established in other arenas. Social psychologists call it the “pratfall effect”: as long as you’re generally competent, making a small, occasional mistake makes people like you more. Flaws call out the good stuff all the more. This need for imperfection might just be how our brains are put together. Our sense of smell, which is the sense most closely connected to the brain’s emotional center, prefers discord to unison. Scientists have shown this in labs by mixing foul odors with pleasant ones, but nature, in the wisdom of evolutionary time, realized it long before. The pleasant scent given off by many flowers, like orange blossoms and jasmine, contains a significant fraction (about 3 percent) of an organic compound called indole. It’s common in the large intestine, and on its own, it smells accordingly. But the flowers don’t smell as good without it. A little bit of shit brings the bees. Indole is also an ingredient in synthetic human perfumes.

  You can see a public implementation, as it were, of the OkCupid data in the rarefied world of modeling. The women are all professionally gorgeous—5 stars out of 5, of course. But even at that high level it’s still about distinguishing yourself through imperfection. Cindy Crawford’s career took off after she stopped covering her mole. Linda Evangelista had the severe hair—you can’t say it made her prettier, but it did make her far more interesting. Kate Upton, at least according to the industry standard, has a few extra pounds. Pulling a few examples from the data set, perhaps ones that are more relatable than swimsuit models, will help you see how it works for a normal person. Here are six women, all with middle-of-the-road overall scores, but who tend to get extreme reactions either way: lots of Yes, lots of No, but very little Meh:

  Thanks to each of them for having the confidence to agree to be displayed and discussed here. What you see in the array is what you get throughout the corpus. These are people who’ve purposefully abandoned the middle road: with body art, a snarky expression, or by eating a grilled cheese like a badass. And you find many relatively normal women with an unusual trait, like the center woman in the bottom row, whose blue hair you can’t see in black and white. And you especially see women who’ve chosen to play up their particular asset/liability. If you can pull off, say, a 3.3 rating despite the extra pounds or the people who hate tattoos or whatever, then, literally, more power to you.

  So at the end of it, given that everyone on Earth has some kind of flaw, the real moral here is: be yourself and be brave about it. Certainly trying to fit in, just for its own sake, is counterproductive. I know this is dangerously close to the kind of thing that gets put on a quilt, and quilts, being the PowerPoint presentations of an earlier time, are the opposite of science. It also sounds a lot like the advice a mother gives, along with a pat on the head, to her big-nosed and brace-faced son when he’s fourteen and can’t figure out why he isn’t more popular. But either way, there it is, in the numbers. Like I said, people can feel the math behind things, especially, thankfully, moms. I just wish she’d told me that by ninth grade bears aren’t cool.

  1 Waters on film: “To me, bad taste is what entertainment is all about. If someone vomits while watching one of my films, it’s like getting a standing ovation.”

  2 These pages on Reddit are called subreddits. I’ll explain the site and its nuances in more detail later.

  3 Only 0.2 percent of the messages on the site are sent by users to a person to whom they awarded fewer than 3 stars.

  3.

  Writing on the Wall

  Nostalgia used to be called mal du Suisse—the Swiss sickness. Their mercenaries were all over Europe and were apparently notorious for wanting to go home. They would get misty and sing shepherd ballads instead of fighting, and when you’re the king of France with Huguenots to burn, songs won’t do. The ballads were banned. In the American Civil War nostalgia was such a problem it put some 5,000 troops out of action, and 74 men died of it—at least according to army medical records. Given the circumstances, being sad to death is actually kind of understandable, but then again, this was also the time of leeches and the bonesaw, so who knows what was really going on. It’s interesting to think that in those days, many of the people who left home did so to go to war—much of the early literature on nostalgia, which was seen then as a bona fide disease, mentions soldiers. In that sepia-toned way I can’t help but think about the past, I like to imagine scientists in 1863, on either side of the Potomac, working furiously against the clock to develop the ultimate war-ending superweapon: high school yearbooks.

  I actually don’t even know if they have high school yearbooks anymore. It’s hard to see why you’d need one now that Facebook’s around, although according to the company’s last quarterly report, people under eighteen aren’t using Facebook as much as they used to. So maybe the kids need the printed copy again, I don’t know.1 But however teenagers are staying in touch, whether it’s through Snapchat or WhatsApp or Twitter, I’m positive they’re doing it with words. Pictures are part of the appeal of all of these services, obviously, but you can only say so much without a keyboard. Even on Instagram, the comments and the captions are essential; the photo, after all, is just a few inches square. But the words are the words are the words. They’re still how feelings come across and how connections are made.

  In fact, for all the hand-wringing over technology’s effect on our culture, I am certain that even the most reticent teenager in 2014 has written far more in his life than I or any of my classmates had back in the early ’90s. Back then, if you needed to talk to someone you used the phone. I wrote a few stiff thank-you notes and maybe one letter a year. The typical high school student today must surpass that in a morning. The Internet has many regrettable sides to it, but that’s one thing that’s always stood it in good stead with me: it’s a writer’s world. Your life online is mediated through words. You work, you socialize, you flirt, all by typing. I honestly feel there’s a certain epistolary, Austenian grandness to the whole enterprise. No matter what words we use or how we tap out the letters, we’re writing to one another more than ever. Even if sometimes

  dam gerl

  is all we have to say.

  Major Sullivan Ballou was one of the soldiers in the Union army, on the Potomac, suffering, and homesick. Early in Ken Burns’s The Civil War, a narrator reads his farewell letter to his wife, to his “very dear Sarah,” and it’s a moving and important moment in the film. The Major was writing from camp before the first large battle of the war, and he was mortally wounded days later. His words were the last his family would ever hear from him, and they drove home the greater sorrow the nation would face in the years to come. Because of the exposure, the Ballou letter has become one of the most famous ever written—when I search for “famous letter,” Google lists it second. It’s a beautiful piece of writing, but think of all the other letters that will never be read aloud, that were burned, lost in some shuffle, or carried off by the wind, or that just moldered away.

  Today we don’t have to rely on the lucky accident of preservation to know what someone was thinking or how he talked, and we don’t need the one to stand in for the many. It’s all preserved, not just one man to one wife before one battle, but all to all, before and after and even in the middle of each of our personal battles. You can find readings of the Ballou letter on YouTube, and many of the comments are along the lines of “They just don’t make them like that anymore.” That’s true. But what they, or rather we, are making offers a richness and a beauty of a different kind: a poetry not of lyrical phrases but of understanding. We are at the cusp of momentous change in the study of human communication and what it tries to foster: community and personal connection.

  When you want to learn about how people write, their unpolished, unguarded words are the best place to start, and we have reams of them. There will be more words written on Twitter in the next two years than are contained in all the books ever printed. It’s the epitome of the new communication: short and in real time. Twitter was, in fact, the first service not only to encourage brevity and immediacy, but to require them. Its prompt is “What’s happening?” and it gives users 140 characters to tell the world. And Twitter’s sudden popularity, as much as its sudden redefinition of writing, seemed to confirm the fear that the Internet was “killing our culture.” How could people continue to write well (and even think well) in this new confined space? What would become of a mind so restricted? The actor Ralph Fiennes spoke for many when he said, “You only have to look on Twitter to see evidence of the fact that a lot of English words that are used, say, in Shakespeare’s plays or P. G. Wodehouse novels … are so little used that people don’t even know what they mean now.”

  Even basic analysis shows that language on Twitter is far from a degraded form. Below, I’ve compared the most common words on Twitter against the Oxford English Corpus—a collection of nearly 2.5 billion words of modern writing of all kinds—journalism, novels, blogs, papers, everything. The OEC is the canonical census of the current English vocabulary. I’ve charted only the top 100 words out of the tens of thousands that people use, which may seem like a paltry sample, but roughly half of all writing is formed from these words alone (both on Twitter and in the OEC). The most important thing to notice on Twitter’s list is this: despite the grumblings from the weathered sentinels atop Fortress English, there are only two “netspeak” entries—rt, for “retweet” and u, for “you”—in the top 100. You’d think that contractions, grammatical or otherwise, would be staples of a form that only allows a person 140 characters, but instead people seem to be writing around the limitation rather than stubbornly through it. Second, when you calculate the average word length of the Twitter list, it’s longer than the OEC’s: 4.3 characters to 3.4. And look beyond length to the content of the Twitter vocabulary. I’ve highlighted the words unique to it in order to make the comparison easier:
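The claim that the top 100 words make up roughly half of all writing is a coverage measurement: count every word, keep the most common ones, and divide their combined count by the total. Here is a rough Python sketch, assuming a naive lowercase tokenizer (a real corpus study would need far more careful tokenization). The tiny sample string and the helper `top_words` are invented for illustration.

```python
import re
from collections import Counter


def top_words(text, k):
    """Return the k most frequent lowercase words and the share of all tokens they cover."""
    tokens = re.findall(r"[a-z']+", text.lower())  # naive tokenizer: letters and apostrophes
    if not tokens:
        return [], 0.0
    top = Counter(tokens).most_common(k)
    covered = sum(count for _, count in top)
    return [word for word, _ in top], covered / len(tokens)


sample = "rt love this day so happy today i love you and you love me"
words, share = top_words(sample, 3)
print(words, f"{share:.0%}")
```

On a real corpus you would run this with `k=100` over billions of tokens; the coverage share is what supports a statement like “roughly half of all writing is formed from these words alone.”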

  Rank  OEC     Twitter    Rank  OEC       Twitter
  1     the     to         51    when      back
  2     be      a          52    make      an
  3     to      i          53    can       see
  4     of      the        54    like      more
  5     and     and        55    time      by
  6     a       in         56    no        today
  7     in      you        57    just      twitter
  8     that    my         58    him       or
  9     have    for        59    know      as
  10    I       on         60    take      make
  11    it      of         61    people    who
  12    for     it         62    into      got
  13    not     me         63    year      here
  14    on      this       64    your      want
  15    with    with       65    good      need
  16    he      at         66    some      happy
  17    as      just       67    could     too
  18    you     so         68    them      u
  19    do      be         69    see       best
  20    at      rt         70    other     people
  21    this    out        71    than      some
  22    but     that       72    then      they
  23    his     have       73    now       life
  24    by      your       74    look      there
  25    from    all        75    only      think
  26    they    up         76    come      going
  27    we      love       77    its       why
  28    say     do         78    over      he
  29    her     what       79    think     really
  30    she     like       80    also      way
  31    or      not        81    back      come
  32    an      get        82    after     much
  33    will    no         83    use       only
  34    my      good       84    two       off
  35    one     but        85    how       still
  36    all     new        86    our       right
  37    would   can        87    work      night
  38    there   if         88    first     home
  39    their   day        89    well      say
  40    what    now        90    way       great
  41    so      time       91    even      never
  42    up      from       92    new       work
  43    out     go         93    want      would
  44    if      how        94    because   last
  45    about   we         95    any       first
  46    who     will       96    these     over
  47    get     one        97    give      take
  48    which   about      98    day       its
  49    go      know       99    most      better
  50    me      when       100   us        them
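The comparison the highlighting was meant to show can be reproduced as a set difference. This sketch transcribes both lists from the table above, with the OEC’s “I” lowercased to match Twitter’s “i”. It then prints the Twitter words that appear nowhere in the OEC’s top 100, assuming that is what “unique” means here.

```python
# Top-100 lists transcribed from the table above, in rank order (OEC "I" lowercased).
OEC = """the be to of and a in that have i it for not on with he as you do at
this but his by from they we say her she or an will my one all would there their what
so up out if about who get which go me when make can like time no just him know take
people into year your good some could them see other than then now look only come its over think
also back after use two how our work first well way even new want because any these give day most us""".split()

TWITTER = """to a i the and in you my for on of it me this with at just so be rt
out that have your all up love do what like not get no good but new can if day now
time from go how we will one about know when back an see more by today twitter or as make
who got here want need happy too u best people some they life there think going why he really way
come much only off still right night home say great never work would last first over take its better them""".split()

# Twitter top-100 words absent from the OEC top 100, kept in Twitter rank order.
oec_set = set(OEC)
twitter_only = [word for word in TWITTER if word not in oec_set]
print(len(twitter_only), twitter_only)
```

Only two of the resulting words, rt and u, are “netspeak.” The rest are ordinary English words such as love, happy, life, and today.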

  The OEC list is rather drab: lots of helpers and modifiers, workmanlike language to get you to some payoff noun or verb. On Twitter, there’s no room for functionaries; every word’s gotta be boss. So you see vivid stuff like:

  love

  happy

  life

  today

 
