Dataclysm: Who We Are (When We Think No One's Looking)
The red chart is centered barely a quarter of the way up the scale; only one guy in six is “above average” in an absolute sense. Sex appeal isn’t something commonly quantified like this, so let me put it in a more familiar context: translate this plot to IQ, and you have a world where the women think 58 percent of men are brain damaged.
Now, the men on OkCupid aren’t actually ugly—I tested that by experiment, pitting a random set of our users against a comparable random sample from a social network and got the same scores for both groups—and it turns out you get patterns like the above on every dating site I’ve seen: Tinder, Match.com, DateHookup—sites that together cover about half the single people in the United States. It just turns out that men and women perform a different sexual calculus. As Harper’s put it perfectly: “Women are inclined to regret the sex they had, and men the sex they didn’t.” You can see exactly how it works in the data. I will add: the men above must be absolutely full of regrets.
A beta curve plots what can be thought of as the outcome of a large number of coin flips—it traces the overlapping probabilities of many independent binary events. Here the male coin is fair, coming up heads (which I’ll equate with positive) just about as often as it comes up tails. But in our data we see that the female one is weighted; it turns up heads only once every fourth flip. A large number of natural processes, including the weather, can be modeled with betas, and thanks to some weather bug’s obsessive archiving, I was able to compare our person-to-person ratings to historical climate patterns. The male outlook here is very close to the function that predicts cloud cover in New York City. The female psyche, by the same metric, dwells in a place slightly darker than Seattle.
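The coin picture above can be made concrete with a short sketch. The shape parameters below are hypothetical, picked only so that one curve centers on one half (the fair coin) and the other on one quarter (heads once every fourth flip); they are not the values fitted to the OkCupid ratings. The helper names `beta_mean` and `sample_ratings` are likewise invented for this illustration.

```python
import random
from statistics import mean


def beta_mean(heads, tails):
    """Mean of a Beta(heads, tails) distribution: heads / (heads + tails)."""
    return heads / (heads + tails)


def sample_ratings(heads, tails, n, seed=0):
    """Draw n values in [0, 1] from Beta(heads, tails) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.betavariate(heads, tails) for _ in range(n)]


# Hypothetical shape parameters (assumptions, not fitted to any real data).
FAIR_COIN = (2.0, 2.0)      # heads about as often as tails
WEIGHTED_COIN = (1.0, 3.0)  # heads roughly once every fourth flip

if __name__ == "__main__":
    for label, (a, b) in (("fair", FAIR_COIN), ("weighted", WEIGHTED_COIN)):
        draws = sample_ratings(a, b, 100_000)
        print(label, "theoretical mean:", beta_mean(a, b),
              "sampled mean:", round(mean(draws), 3))
```

The two parameters play the role of accumulated heads and tails: raising the second relative to the first drags the whole curve toward the low end of the scale, which is the shift the female ratings show.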
We’ll follow this thread through the first of Dataclysm’s three broad subjects: the data of people connecting. Sex appeal—how it changes and what creates it—will be our point of departure. We’ll see why, technically, a woman is over the hill at twenty-one and the importance of a prominent tattoo, but we’ll soon move beyond connections of the flesh. We’ll see what tweets can tell us about modern communication, and what friendships on Facebook can say about the stability of a marriage. Profile pictures are both a boon and a curse on the Internet: they turn almost every service (Facebook, job sites, and, of course, dating) into a beauty contest. We’ll take a look at what happens when OkCupid removes them for a day and just hopes for the best. Love isn’t blind, though we find evidence it should be.
Part 2 then looks at the data of division. We’ll begin with a close look at that prime human divide, race—a topic we can now address at the person-to-person level for the first time. Our privileged data exposes attitudes that most people would never cop to in public, and we’ll see that racial bias is not only strong but consistent—repeated almost verbatim (well, numeratim), from site to site. Racism can be an interior thing too—just one man, his prejudice, and a keyboard. We’ll see what Google Search has to say about the country’s most hated word—and what that word has to say about the country. We’ll move on to explore the divisiveness of physical beauty with a data set thousands of times more powerful than anything previously available. Ugliness has startling social costs that we are finally able to quantify. From there, we’ll see what Twitter reveals about our impulse to anger. The service allows people to stay connected up to the minute; it can drive them apart just as quickly. The collaborative rage that it enables brings a new violence to that most ancient of human gatherings: the mob. We’ll see if it can provide a new understanding, as well.
By the book’s third section, we will have seen the data of two people interacting, for better and for worse; here we will look at the individual alone. We’ll explore how ethnic, sexual, and political identity is expressed, focusing on the words, images, and cultural markers people choose to represent themselves. Here are five of the phrases most typical of a white woman:
my blue eyes
red hair and
four wheeling
country girl
love to be outside
Haiku by Carrie Underwood, or data? You make the call! We’ll explore people’s public words. We’ll also see how people speak and act in private, with an eye toward the places where labels and action diverge: bisexual men, for example, challenge our ideas of neat identity. Next, we’ll draw on a wide range of sources—Twitter, Facebook, Reddit, even Craigslist—to see ourselves in our homes, both physically and otherwise. And we’ll conclude with the natural question about a book like this: how does a person maintain his privacy in a world where these explorations are possible?
Throughout, we’ll see that the Internet can be a vibrant, brutal, loving, forgiving, deceitful, sensual, angry place. And of course it is: it’s made of human beings. However, bringing all this information together, I became acutely aware that not everyone’s life is captured in the data. If you don’t have a computer or a smartphone, then you aren’t here. I can only acknowledge the problem, work around it, and wait for it to go away.
I will say in the meantime that the reach of sites like Twitter and Facebook, and even my dating data, is surprisingly thorough. If you don’t use many of these services yourself, this is something you might not appreciate. Some 87 percent of the United States is online, and that number holds across virtually all demographic boundaries. Urban to rural, rich to poor, black to Asian to white to Latino, all are connected. Internet adoption is lower (around 60 percent) among the very old and the undereducated, which is why I drew my “age line” well short of old age in these pages—at fifty—and why I don’t address education at all. More than 1 out of every 3 Americans access Facebook every day. The site has 1.3 billion accounts worldwide. Given that roughly a quarter of the world is under age fourteen, that means that something like 25 percent of adults on Earth have a Facebook account. The dating sites in Dataclysm have registered some 55 million American members in the last three years—as I said above, that’s one account for every two single people in the country. Twitter is an especially interesting demographic case. It’s a glitzy tech success story, and the company is almost single-handedly gentrifying a large swath of San Francisco. But the service itself is fundamentally populist, both in the “openness” of its platform and in who chooses to use it. For example, there’s no significant difference in use by gender. People with only a high school education level tweet as much as college graduates. Latinos use the service as much as whites, and blacks use it twice as much. And then, of course, there’s Google. If 87 percent of Americans use the Internet, 87 percent of them have used Google.
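The Facebook and dating-site estimates above are simple back-of-envelope arithmetic, sketched here. The account totals and the under-fourteen share come from the text; the world population of 7.2 billion is an assumption added for illustration, and the function names are hypothetical. The sketch also treats one account as one adult, which overstates reach wherever people hold duplicate accounts.

```python
# Figures quoted in the text.
FACEBOOK_ACCOUNTS = 1.3e9   # accounts worldwide
SHARE_UNDER_14 = 0.25       # "roughly a quarter of the world"
DATING_MEMBERS = 55e6       # American members registered in three years

# Assumption, not from the text: approximate world population.
WORLD_POPULATION = 7.2e9


def adult_share_with_account(accounts, population, share_children):
    """Accounts divided by the number of people older than the cutoff."""
    adults = population * (1 - share_children)
    return accounts / adults


def implied_single_population(members, accounts_per_single=0.5):
    """Singles implied by 'one account for every two single people'."""
    return members / accounts_per_single


if __name__ == "__main__":
    share = adult_share_with_account(
        FACEBOOK_ACCOUNTS, WORLD_POPULATION, SHARE_UNDER_14)
    print("Share of adults with a Facebook account:", round(share, 2))
    print("Implied single Americans:",
          implied_single_population(DATING_MEMBERS))
```

Under that population assumption the share comes out just under a quarter, in line with the "something like 25 percent" in the text.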
These big numbers don’t prove I have the complete picture of anything, but they at least suggest that such a picture is coming. And in any event the perfect should not be the enemy of the better-than-ever-before. The data set we’ll work with encompasses thousands of times more people than a Gallup or Pew study; that goes without saying. What’s less obvious is that it’s actually much more inclusive than most academic behavioral research.
It’s a known problem with existing behavioral science—though it’s seldom discussed publicly—that almost all of its foundational ideas were established on small batches of college kids. When I was a student, I got paid like $25 to inhale a slightly radioactive marker gas for an hour at Mass General and then do some kind of mental task while they took pictures of my brain. It won’t hurt you, they said. It’s just like spending a year in an airplane, they said. No big deal, they said. What they didn’t say—and what I didn’t realize then—was that as I was lying there a little hungover in some kind of CAT-scanner thing, reading words and clicking buttons with my foot, I was standing in for the typical human male. My friend did the study, too. He was a white college kid just like me. I’m willing to bet most of the subjects were. That makes us far from typical.
I understand how it happens: in person, getting a real representative data set is often more difficult than the actual experiment you’d like to perform. You’re a professor or postdoc who wants to push forward, so you take what’s called a “convenience sample,” and that means the students at your university. But it’s a big problem, especially when you’re researching belief and behavior. It even has a name. It’s called WEIRD research: Western, educated, industrialized, rich, and democratic. And most published social research papers are WEIRD.1
Several of these problems plague my data, too. It will be a while still before digital data can scratch “industrialized” all the way off the list. But because tech is often seen as such an “elite field”—an image that many in the industry are all too willing to encourage—I feel compelled to distinguish between the entrepreneurs and venture capitalists you see on technology’s public stages, making swiping gestures and spouting buzz talk into headset mikes, people who are usually very WEIRD indeed, from the users of the services themselves, who are very much normal. They can’t help but be, because use of these services—Twitter, Facebook, Google, and the like—is the norm.
As for the data’s authenticity, much of it is, in a sense, fact-checked because the Internet is now such a part of everyday life. Take the data from OkCupid. You give the site your city, your gender, your age, and who you’re looking for, and it helps you find someone to meet for coffee or a beer. Your profile is supposed to be you, the true version. If you upload a better-looking person’s picture as your own, or pretend to be much younger than you really are, you will probably get more dates. But imagine meeting those dates in person: they’re expecting what they saw online. If the real you isn’t close, the date is basically over the instant you show up. This is one example of the broad trend: as the online and offline worlds merge, a built-in social pressure keeps many of the Internet’s worst fabulist impulses in check.
The people using these services, dating sites, social sites, and news aggregators alike, are all fumbling their way through life, as people always have. Only now they do it on phones and laptops. Almost inadvertently, they’ve created a unique archive: databases around the world now hold years of yearning, opinion, and chaos. And because it’s stored with crystalline precision it can be analyzed not only in the fullness of time, but with a scope and flexibility unimaginable just a decade ago.
I have spent several years gathering and deciphering this data, not only from OkCupid, but from almost every other major site. And yet I’ve never quite been able to get over a nagging doubt, which, given my Luddite sympathies, pains me all the more: writing a book about the Internet feels a lot like making a very nice drawing about the movies. Why bother? That’s the question of my dark hours.
There’s this great documentary about Bob Dylan called Dont Look Back that I watched a bunch back in college; my best friend, Justin, was studying film. Somewhere in the movie, at an after-party, Bob gets into an argument with a random guy about who did or who did not throw some glass thing in the street. They’re both clearly drunk. The climax of the confrontation is this exchange, and it’s stuck with me now for fifteen years:
DYLAN: I know a thousand cats who look just like you and talk just like you.
GUY AT PARTY: Oh, fuck off. You’re a big noise. You know?
DYLAN: I know it, man. I know I’m a big noise.
GUY AT PARTY: I know you know.
DYLAN: I’m a bigger noise than you, man.
GUY AT PARTY: I’m a small noise.
DYLAN: Right.
And then someone breaks it up so they can all talk poetry. It’s that kind of night. But here’s the thing: rock star or no, big noises have been the sound of mankind so far. Conquerors, tycoons, martyrs, saviors, even scoundrels (especially scoundrels!): their lives are how we’ve told our larger story, how we’ve marked our progression from the banks of a couple of silty rivers to wherever we are now. From Pharaoh Narmer in 3100 BCE, the first living man whose name we still know, to Steve Jobs and Nelson Mandela, the heroic framework is how people order the world. Narmer was first on an ancient list of kings. The scribes have changed, but that list has continued on. I mean, the 1960s, power to the people and so on, is the perfect example: that’s the era of Lennon and McCartney, Dylan, Hendrix, not “Guy at Party.” Above all, Everyman’s existence hasn’t been worth recording, apart from where it intersects with a legend’s.
But this asymmetry is ending; the small noise, the crackle and hiss of the rest of us, is finally making it to tape. As the Internet has democratized journalism, photography, pornography, charity, comedy, and so many other courses of personal endeavor, it will, I hope, eventually democratize our fundamental narrative. The sound is inchoate now, unrefined. But I’m writing this book to bring out what faint patterns I, and others, detect. This is the echo of the approaching train in ears pressed to the rail. Data science is far from perfect—there’s selection bias and many other shortcomings to understand, acknowledge, and work around. But the distance between what could be and what is grows shorter every day, and that final convergence is the day I’m writing to.
I know there are a lot of people making big claims about data, and I’m not here to say it will change the course of history—certainly not like internal combustion did, or steel—but it will, I believe, change what history is. With data, history can become deeper. It can become more. Unlike clay tablets, unlike papyrus, unlike paper, newsprint, celluloid, or photo stock, disk space is cheap and nearly inexhaustible. On a hard drive, there’s room for more than just the heroes. Not being a hero myself, in fact, being someone who would most of all just like to spend time with his friends and family and live life in small ways, this means something to me.
Now, as much as I’d like me and you and WhoBeefed81 to be right there on the page with the president when future works treat this decade, I imagine everyday people will always be more or less nameless, as indeed they are even here. The best data can’t change that. But we all will be counted. When in ten years, twenty, a hundred, someone takes the temperature of these times and wants to understand changes—wants to see how legalizing gay marriage both drove and reflected broader acceptance of homosexuality or how village society in Asia was uprooted, then created again, within its large urban centers—inside that story, even comprising its very bones, will be data from Facebook, Twitter, Reddit, and the like. And if not, our putative writer will have failed.
I’ve tried to capture all this with my mash-up title. Kataklysmos is Greek for the Old Testament Flood; that’s how the word “cataclysm” came to English. The allusion has dual resonance: there is, of course, the data as unprecedented deluge. What’s being collected today is so deep it verges on bottomless; it’s easily forty days and forty nights of downpour to that old handful of rain. But there’s also the hope of a world transformed—of both yesterday’s stunted understanding and today’s limited vision gone with the flood.
This book is a series of vignettes, tiny windows looking in on our lives—what brings us together, what pulls us apart, what makes us who we are. As the data keeps coming, the windows will get bigger, but there’s plenty to see right now, and the first glimpse is always the most thrilling. So to the sills, I’ll boost you up.
1 An article in Slate noted: “WEIRD subjects, from countries that represent only about 12 percent of the world’s population, differ from other populations in moral decision making, reasoning style, fairness, even things like visual perception. This is because a lot of these behaviors and perceptions are based on the environments and contexts in which we grew up.”
1. Wooderson’s Law
2. Death by a Thousand Mehs
3. Writing on the Wall
4. You Gotta Be the Glue
5. There’s No Success Like Failure
1.
Wooderson’s Law
Up where the world is steep, like in the Andes, people use funicular railroads to get where they need to go: a pair of cable cars connected by a pulley far up the hill. The weight of the one car going down pulls the other up; the two vessels travel in counterbalance. I’ve learned that that’s what being a parent is like. If the years bring me low, they raise my daughter, and, please, so be it. I surrender gladly to the passage, of course, especially as each new moment gone by is another I’ve lived with her, but that doesn’t mean I don’t miss the days when my hair was actually all brown and my skin free of weird spots. My girl is two and I can tell you that nothing makes the arc of time more clear than the creases in the back of your hand as it teaches plump little fingers to count: one, two, tee.
But some guy having a baby and getting wrinkles is not news. You can start with whatever the Oil of Olay marketing department is running up the pole this week—as I’m writing it’s the idea of “color correcting” your face with a creamy beige paste that is either mud from the foothills of Alsace or the very essence of bullshit—and work your way back to myths of Hera’s jealous rage. People have been obsessed with getting older, and with getting uglier because of it, for as long as there’ve been people and obsession and ugliness. “Death and taxes” are our two eternals, right? And depending on the next government shutdown, the latter is looking less and less reliable. So there you go.