Dataclysm: Who We Are (When We Think No One's Looking)


by Christian Rudder


  half the single people in the United States Specifying the reach of the dating data I have was a challenge. I’ve strived to do so in broad, easy-to-grasp terms because, unlike Facebook or Twitter, I know much of my reading audience has never used a dating site. If you’ve been married or in a relationship since the late ’90s or before, you have never needed online dating. According to the 2011 Census numbers, there are 103 million single people ages fifteen to sixty-four in the United States—that counts everyone who isn’t legally married, including many people who are actually in long-term relationships and nearly every gay person. Together, Tinder, OkCupid, DateHookup, and Match.com registered 57 million US accounts from 2011 to 2013, and 23 million in the last of those three years alone. “Half” is my approximation of 57/103, minus the 10 to 15 percent wastage in overlap and duplicate accounts.
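  The arithmetic behind "half" can be sketched in a few lines. This is only an illustration: the note does not say exactly how the wastage is subtracted, so the sketch assumes it is removed proportionally from the 57 million accounts. The function and variable names are hypothetical.

```python
# Figures from the note above, in millions.
SINGLES_MILLIONS = 103   # single people ages 15 to 64, 2011 Census
ACCOUNTS_MILLIONS = 57   # US accounts registered 2011 to 2013


def reach_share(accounts, singles, wastage):
    """Share of singles reached after discounting overlap and duplicates.

    Assumption: wastage is applied proportionally to the account count.
    """
    return accounts * (1 - wastage) / singles


raw_share = ACCOUNTS_MILLIONS / SINGLES_MILLIONS
low_share = reach_share(ACCOUNTS_MILLIONS, SINGLES_MILLIONS, 0.15)
high_share = reach_share(ACCOUNTS_MILLIONS, SINGLES_MILLIONS, 0.10)

print(f"raw: {raw_share:.1%}, adjusted: {low_share:.1%} to {high_share:.1%}")
```

  Under that assumption the adjusted share falls just below one half, which is consistent with calling it "half" in broad terms.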

  “Women are inclined to regret” This quote is from the “Findings” section of the February 2014 issue of Harper’s by Rafil Kroll-Zaidi.

  A beta curve plots My data researcher, Tom Quisel, helped me put the binomial nature of beta curves into simple terms. He also pointed out that they’re used to model weather, and ran the comparisons to the by-city patterns on weatherbug.com.

  Some 87 percent of the United States is online See Susannah Fox and Lee Rainie, “Summary of Findings,” Pew Research Internet Project, Pew Research Center, February 27, 2014, pewinternet.org/2014/02/27/summary-of-findings-3/.

  that number holds … For example, Internet use among white, African American, and Hispanic Americans is 85, 81, and 83 percent, respectively. One can only assume adoption among Asian Americans is similar. Adoption is above 80 percent for all age groups, save people sixty-five and older. Susannah Fox and Lee Rainie, “Internet Users in 2014,” Pew Research Internet Project, Pew Research Center, February 27, 2014, pewinternet.org/files/2014/02/12-internet-users-in-2014.jpg.

  More than 1 out of every 3 Americans access Facebook Facebook reported 128 million US users in August 2013. Facebook had at least 1.26 billion users worldwide in September 2013. World and US population statistics are from Wikipedia. See expandedramblings.com/index.php/by-the-numbers-17-amazing-facebook-stats/.

  fundamentally populist This is something like common knowledge among people who study social media adoption beyond the Google Glasshole/Technocrat use case. See Pew Research Center’s “Demographics of Key Social Networking Platforms” (2013). The report shows no statistically significant difference in rates of Twitter use between the “high school grad or less” and “College +” educational cohorts (coming in at 17 percent and 18 percent, respectively). Pew surveys a random cross-section of Americans eighteen years old or older, so very few of the “high school grad or less” cohort are that way simply because they’re still in high school. By ethnicity, Pew reports adoption rates of 29 percent among blacks and 16 percent among both whites and Hispanics. The full report, by Maeve Duggan and Aaron Smith, is here: pewinternet.org/2013/12/30/demographics-of-key-social-networking-platforms/.

  It’s called WEIRD research This fact and my general take on the phenomenon are adapted from “Psychology Is WEIRD,” by Bethany Brookshire, in Slate. See also “The Roar of the Crowd,” The Economist, May 24, 2012, economist.com/node/21555876.

  Pharaoh Narmer As you can imagine, this is up for debate, though Narmer, also known as Serket, is a defensible choice. In earlier drafts I had Gilgamesh, the Akkadian hero, in this place because J. M. Roberts, in his History of the World (New York: Oxford University Press, 1993), chooses Gilgamesh. I eventually went with Narmer because his life is dated several centuries earlier, and he seemed to me as likely to have actually lived. Yahoo! Answers also mentions Elvis Presley.

  Chapter 1: Wooderson’s Law

  This isn’t survey data This is a good place to point out that for anyone’s attractiveness to have been considered in my analysis in this book, that person needed to have received votes from at least twenty-five other people. For something as idiosyncratic as attraction, I felt an average score comprising fewer than twenty-five votes wasn’t reliable.

  per the US Census These numbers are from the US Census Bureau’s “Marital Status of People 15 Years and Over, by Age, Sex, Personal Earnings, Race, and Hispanic Origin, 2011.”

  Chapter 2: Death by a Thousand Mehs

  “Beauty is looks you can never forget” John Waters, Shock Value: A Tasteful Book About Bad Taste (Philadelphia: Running Press, 2005), p. 128.

  concept called variance I used standard deviation to measure variance throughout this chapter.
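  As a rough sketch of how a minimum-vote filter and a standard-deviation measure fit together, consider the following. The vote lists are invented for illustration, the function name is hypothetical, and the choice of population standard deviation (rather than sample) is an assumption, since the note does not specify which was used.

```python
from statistics import mean, pstdev

MIN_VOTES = 25  # threshold described in the Chapter 1 notes


def summarize(votes):
    """Return (average, standard deviation), or None if too few votes."""
    if len(votes) < MIN_VOTES:
        return None
    return mean(votes), pstdev(votes)


# Hypothetical 1-to-5 ratings: same average, very different spread.
polarizing = [1] * 13 + [5] * 13   # 26 votes split between extremes
consistent = [3] * 26              # 26 identical votes
too_few = [1, 5, 3]                # below the threshold, so excluded

for name, votes in [("polarizing", polarizing),
                    ("consistent", consistent),
                    ("too_few", too_few)]:
    print(name, summarize(votes))
```

  Both qualifying profiles average 3, but only the first has a large standard deviation, which is the distinction the chapter's notion of variance is meant to capture.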

  the “pratfall effect” A Google search for “pratfall effect” will yield many examples. I particularly relied on the précis “The Positive Effect of Negative Information” by Bill Snyder and the original paper he summarizes, “When Blemishing Leads to Blossoming: The Positive Effect of Negative Information,” by Danit Ein-Gar, Baba Shiv, and Zakary L. Tormala, Journal of Consumer Research 38, no. 5 (2012): 846–59.

  Our sense of smell For this passage, I relied on Fabian Grabenhorst et al., “How Pleasant and Unpleasant Stimuli Combine in Different Brain Regions: Odor Mixtures,” Journal of Neuroscience 27, no. 49 (2007): 13532–40, doi:10.1523/JNEUROSCI.3337-07.2007. Wikipedia’s “Indole” entry describes its “intense fecal smell.” For more on indole’s role in perfumes and in naturally occurring flower scents, see, as I did, perfumeshrine.blogspot.com/2010/05/jasmine-indolic-vs-non-indolic.html.

  Here are six women We received these permissions using a double-blind system, to protect user privacy. I submitted criteria (women, high variance scores, midrange overall attractiveness) to OkCupid’s data team. The data team generated a list of possible names, which they passed on to our admin. She then had a list of names, with no other information attached, and was told to contact them for blanket photo authorization. (We commonly receive press requests for user photos, so this type of outreach isn’t unusual.) A photo and its unique attributes were only connected once permission was granted.

  Chapter 3: Writing on the Wall

  Nostalgia used to be called Because the phenomenon is so interesting (and unexpected) and one link leads to another, my sources for this passage were many. These I drew on directly:

  “Dying to Go Home,” by Jackie Rosenhek, Doctor’s Review, December 2008, doctorsreview.com/history/dying-to-go-home/.

  “Beware Social Nostalgia,” by Stephanie Coontz, New York Times, May 19, 2013, nytimes.com/2013/05/19/opinion/sunday/coontz-beware-social-nostalgia.html.

  “When Nostalgia Was a Disease,” by Julie Beck, The Atlantic, August 2013, theatlantic.com/health/archive/2013/08/when-nostalgia-was-a-disease/278648/.

  The “Nostalgia” entry on qi.com: qi.com/infocloud/nostalgia.

  people under eighteen aren’t using Facebook The earnings call in question reviewed Facebook’s third-quarter 2013 performance. See Joanna Stern, “Teens Are Leaving Facebook and This Is Where They Are Going,” ABCNews, October 31, 2013, abcnews.go.com/story?id=20739310.

  Major Sullivan Ballou The basic facts surrounding the letter can be found here: pbs.org/civilwar/war/ballou_letter.html. Though the letter was never mailed, it was included with Ballou’s belongings and returned to his family after his death.

  There will be more words written on Twitter I calculate this as follows: 129,864,880 books have been written, at least according to Google. That number is laughably precise; however, given that they have already logged 30 million of them, and indexing things is their business, their guess should be considered a plausible estimate. See Ben Parr, “Google: There Are 129,864,880 Books in the Entire World,” Mashable, August 5, 2010, mashable.com/2010/08/05/number-of-books-in-the-world/.

  According to Amazon, the median length of a novel is 64,000 words. Since it’s very likely that the median and mean are close here, I’m comfortable using it as an average. I don’t think novels are necessarily longer or shorter than other books. See Gabe Habash, “The Average Book Has 64,500 Words,” PWxyz, March 6, 2012, blogs.publishersweekly.com/blogs/PWxyz/2012/03/06/the-average-book-has-64500-words.

  These two numbers together yield 8,311,352,320,000 words ever in print.

  Twitter reported 500 million tweets a day in August 2013. See blog.twitter.com/2013/new-tweets-per-second-record-and-how.

  I estimate that each tweet has 20 words. So at 10 billion words a day, it will take Twitter 831 days (2.3 years) to surpass all of printed literature in volume. This is obviously meant to be an approximation, and a conservative one at that. In all likelihood, Twitter will do it much faster, since the rate of tweets per day is increasing rapidly.
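  The estimate above can be reproduced step by step. The sketch below uses only the figures given in these notes; the constant names are hypothetical, and the flat tweet rate is the same simplifying assumption the note itself makes.

```python
# Inputs taken from the notes above.
BOOKS_EVER_WRITTEN = 129_864_880   # Google's count, per Mashable
WORDS_PER_BOOK = 64_000            # median novel length, used as an average
TWEETS_PER_DAY = 500_000_000       # Twitter, August 2013
WORDS_PER_TWEET = 20               # the note's own estimate

words_in_print = BOOKS_EVER_WRITTEN * WORDS_PER_BOOK
twitter_words_per_day = TWEETS_PER_DAY * WORDS_PER_TWEET

# Assumes the tweet rate stays flat; a growing rate shortens the time.
days_to_surpass = words_in_print / twitter_words_per_day
years_to_surpass = days_to_surpass / 365

print(f"{words_in_print:,} words in print")
print(f"{twitter_words_per_day:,} words tweeted per day")
print(f"{days_to_surpass:.0f} days, about {years_to_surpass:.1f} years")
```

  Because the calculation is a simple ratio, changing any single input (for instance, a shorter average tweet) scales the result proportionally.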

  “You only have to look on Twitter” Mr. Fiennes’s quote was covered extensively. See Lucy Jones, “Ralph Fiennes Blames Twitter for ‘Eroding’ Language,” Telegraph, October 27, 2011, telegraph.co.uk/technology/twitter/8853427/Ralph-Fiennes-blames-Twitter-for-eroding-language.html.

  Even basic analysis shows Here and in all my own Twitter analysis I use the tweets and followers generated by a representative corpus of 1.2 million accounts, collected at random by my research team.

  The OEC is the canonical census More on the OEC and its most common words can be found here: en.wikipedia.org/wiki/Most_common_words_in_English.

  The OEC lists only lemmas—that is, the base word root of a related lexical pattern. For example, it counts have for had, having, has, and so on. I chose not to do this in my Twitter research. Though my choice makes comparing the lists directly more difficult, I preferred to present the data in as raw a state as possible.

  Mark Liberman Professor Liberman’s blog Language Log (languagelog.ldc.upenn.edu/nll/) contains a trove of interesting textual analysis. See “Up in UR Internets, Shortening All the Words,” October 28, 2011, languagelog.ldc.upenn.edu/nll/?p=3532, for his discussion of the Fiennes quote in particular.

  A team at Arizona State The Twitter textual analysis in the rest of this paragraph is drawn from “Dude, srsly?: The Surprisingly Formal Nature of Twitter’s Language,” by Yuheng Hu, Kartik Talamadupula, and Subbarao Kambhampati, paper presented at the seventh annual International AAAI Conference on Weblogs and Social Media, Cambridge, Massachusetts, July 8–11, 2013, aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/view/6139.

  Here I’ve excerpted an early attempt The table and the subsequent discussion of the word “tribes” on Twitter are drawn from “Word Usage Mirrors Community Structure in the Online Social Network Twitter,” by John Bryden, Sebastian Funk, and Vincent AA Jansen, EPJ Data Science 2, no. 3 (2013). I also draw from their “Additional Material” containing raw community word lists not used in the paper itself. The full paper, along with links to the additional material, can be found here: epjdatascience.com/content/2/1/3.

  This body of data has created a new field This method of mining Google Books for cultural trends was first proposed in Science in the article “Quantitative Analysis of Culture Using Millions of Digitized Books,” by Jean-Baptiste Michel et al., Science 331, no. 6014 (2011): 176–82, doi:10.1126/science.1199644.

  My graph of food words over time is a reproduction of their exploration of the same terms in that paper. My graph of year words over time is an adaptation of their method, rather than a reproduction. The paper references a “half-life” of memory that I was not able to reproduce. Nonetheless, the writers’ claim that “We are forgetting our past faster with each passing year” is clearly directionally correct. The paper has much more of interest than just the two charts I’ve referenced here and is worth reading in full.

  Below is a scatter chart of 100,000 messages No private messages were read by anyone in performing this analysis. The number of keystrokes and typing time are logged automatically for a sample of OkCupid’s users as part of our ongoing spam-detection software. Since I didn’t read any actual user messages, the quoted text of the three-letter message “hey” is a likelihood rather than a certainty. About 80 percent of three-letter messages on the site are “hey.” “Sup” is the next most popular, then “wow.” Given the overwhelming popularity of “hey,” and that I was making a joke, and that any of the alternatives would’ve worked just as well, I was comfortable picking “hey” in this context.

  “I’m a smoker too” This private message, presented verbatim and complete, came to my attention in a context outside this book, and I received the sender’s permission to both reprint and discuss it here.

  Chapter 4: You Gotta Be the Glue

  “social graphs” The network plots on this page and this page were generated by James Dowdell, using the same general graphic scheme used by Lars Backstrom and Jon Kleinberg in their paper “Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook,” presented at the 17th ACM Conference on Computer-Supported Cooperative Work and Social Computing, Baltimore, Maryland, February 15–19, 2014, delivery.acm.org/10.1145/2540000/2531642/p831-backstrom.pdf.

  I spent years touring in a band My band is called Bishop Allen; Justin Rice is the band’s other half. You can find our songs on Spotify, or on the nearest torrent, or on iTunes. For anyone interested, my personal recommendations are the songs “Like Castanets,” “Click Click Click Click,” “Chinatown Bus,” “Start Again,” and “Little Black Ache.”

  In 1735, Leonhard Euler Though I was familiar with Euler, the bridges problem, and their role in the genesis of graph theory from my time as a math major, I relied on Wikipedia’s “Seven Bridges of Königsberg” entry for the minutiae surrounding the problem and its solution.

  has since helped us understand A good resource for both classic and modern uses of graph theory is here: world.mathigon.org/Graph_Theory.

  Stanley Milgram Like Euler, Milgram and his work have been familiar to me for years. However, I relied on his Wikipedia entry for the details of his “Six Degrees” experiment.

  Facebook allowed us to see See “The Anatomy of the Facebook Social Graph,” by Johan Ugander et al. (arXiv preprint, 2011, arXiv: 1111.4503).

  Pixar famously put The idea was Steve Jobs’s. I first heard of this anecdote in Jonah Lehrer’s Imagine (Edinburgh, UK: Canongate, 2012). See BuzzFeed’s “Inside Steve Jobs’ Mind-Blowing Pixar Campus,” by Adam B. Vary, for more details. Vary mind-blowingly interviews Craig Payne, a senior Pixar manager: buzzfeed.com/adambvary/inside-steve-jobs-mindblowing-pixar-campus.

  “the strength of weak ties” See “The Strength of Weak Ties” by Mark S. Granovetter, American Journal of Sociology 78, no. 6 (1973): 1360–80.

  Another long-held idea in network theory Though embeddedness was first proposed by Granovetter in 1985, my remaining discussion of embeddedness and of interpersonal network theory is drawn from the primary source behind this chapter, Backstrom and Kleinberg’s “Romantic Partnerships.” I apply their heuristic to my own networks and somewhat simplify their original work for a nonacademic audience.

  an astounding 75 percent of the time Backstrom and Kleinberg define many subtly different mathematical kinds of dispersion. My number here refers to the accuracy they reported with the method they call “recursive dispersion.”

  50 percent more likely This is drawn from the following passage in Backstrom and Kleinberg’s paper: “We find that relationships on which recursive dispersion fails to correctly identify the partner are significantly more likely to transition to ‘single’ status [that is, break up] over a 60-day period. This effect holds across all relationship ages and is particularly pronounced for relationships up to 12 months in age; here the transition probability is roughly 50% greater when recursive dispersion fails to recognize the partner.”

  Have a meeting with Microsoft people This might not be broadly true of all Microsoft employees; however, the teams responsible for Microsoft’s mobile and tablet products are, in my experience, dogfooders of the first order. Windows mobile is so rare as to be especially noteworthy, so you remember it when you see it. This is a good place to point out that I am a lifelong user of Microsoft Office, and all the charts and much of the analysis in this book were done in Excel.

  Chapter 5: There’s No Success Like Failure

  one of Google’s best designers Douglas Bowman leaving Google is a famous event in tech circles. See his own post “Goodbye, Google” at stopdesign.com/archive/2009/03/20/goodbye-google.html.

  no evidence of people gaming the system It was fairly simple to unscramble a Crazy Blind Date photo; we knew this would be the case. Sure enough, about a week after launch a few hackers had built apps to de-anonymize the photos. However, these apps never caught on, mostly because they were difficult to use and even then only worked part of the time. These unscramblers were not a factor in Crazy Blind Date’s product trajectory or the data it generated. The scrambled example photo printed in the book is a stock photo, licensed from Getty Images.

  Chapter 6: The Confounding Factor

  of a certain type See, for example, “Blacks Still Dying More from Cancer Than Whites,” by Jordan Lite, Scientific American, February 2009. Also see the Sentencing Project’s “Criminal Justice Primer for the 111th Congress,” which details many depressing disparities in the sentences handed down to whites, compared to minority defendants: sentencingproject.org/doc/publications/cjprimer2009.pdf.

  conclusions like this The headline cited is from ThinkProgress.org. “Study: Black Defendants Are at Least 30% More Likely to Be Imprisoned Than White Defendants for the Same Crime,” by Inimai Chettiar, August 30, 2012, thinkprogress.org/justice/2012/08/30/770501/study-black-defendants-are-at-least-30-more-likely-to-be-imprisoned-than-white-defendants-for-the-same-crime.

  in the 97,000 results It’s a bit of a hack to get Google to give you a number here. My exact query was for “ ‘black quarterback’ −adsffsdada.” Using the minus sign with the nonsense word keeps the page from automatically returning images instead of the “about 97,000 results” text. I’m sure without the browser in front of you, this all sounds mystifying. Try it yourself if you care, and you’ll see immediately what I mean. Also, this is another example of a raw number that has changed during the course of writing this book. I’ve also gotten “89,800 results” returned to me.

 
