The Numerati
75 As Robert O’Harrow Jr. writes. See No Place to Hide, by Robert O’Harrow Jr., Free Press, 2005.
Mike Henry, Clinton’s deputy campaign manager, left the race on February 13, 2008, following Clinton’s losses to Senator Barack Obama in Virginia, Maryland, and the District of Columbia.
4. BLOGGER
106 Companies and governments alike are poring over. This is happening in countless ways. Consider Michael Cavaretta. He runs a math shop at Ford. He and his team are attempting to mine the company’s vast collection of warranty claims. The big challenge is to reduce millions of documents, some of them handwritten, to math. But first the machines must figure out the writers. What do thousands of mechanics and customer service reps around the world mean when they write phrases like “squeak and squeal” or “shimmy and shake”? Are those pairs of words synonyms? Should they go into the same bucket? Do the meanings of these words vary by region? Cavaretta told me that one mechanic wrote that a car was “squealing like the pig Bubba stuck.” How does a computer make sense of that? Cavaretta’s team extracts all the knowledge it can from this vast collection before clustering the data and using statistical analysis to find patterns of problems in the cars.
119 A blog about deodorants in Iraq. Stephen Baker, Blogspotting.net, “Captive Advertising Audience at 30,000 Feet,” http://www.businessweek.com/the_thread/blogspotting/archives/2007/01/captive_adverti.html.
5. TERRORIST
124 USA Today reported. “NSA Has Massive Database of Americans’ Phone Calls,” USA Today, May 11, 2006.
125 There’s a lack of historical record. This is a problem for NASA as well. David Danks, a philosophy professor at Carnegie Mellon University, told me that NASA processes data from 40,000 different sensors on the space shuttles, much of it arriving many times per second. This provides sufficient data to create detailed simulations of launches. And yet during the first quarter-century of shuttle flights, there were only two disasters. “We have a sample size of two,” he said. This makes it difficult to pick out patterns of data that point to problems.
126 Unexpected earth-shaking events. Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable, Random House, 2007.
Jerry Friedman, a statistics professor at Stanford. See The Mathematical Sciences’ Role in Homeland Security: Proceedings of a Workshop, National Academies Press, 2004.
131 Jeff Jonas, like many others. Jonas writes at length about security and privacy challenges surrounding data on his blog, http://www.jeffjonas.typepad.com/.
143 As many as 300 cameras. “Watching You Watching Me,” New Statesman, Oct. 2, 2006.
The Chinese government announced plans. “China Enacting a High-Tech Plan to Track People,” New York Times, Aug. 12, 2007.
6. PATIENT
160 “There are a zillion people following biology.” For that same reason, I decided not to focus the medicine chapter on what the Numerati are doing in the vast field of genetics. But I did research the subject. One of my ideas was to figure out the genetic odds that, like my father, I would develop glaucoma and macular degeneration and eventually go blind late in my life. This question led me to the University of Iowa, where a personable doctor named Edwin Stone has built a world-class eye research operation, including the Carver Family Center for Macular Degeneration. I learned there about an experiment to decode the entire genome of a rat’s eye, which is similar, despite its beady appearance, to our own. The job for the Numerati studying the rat genome is not to find single genes that create blindness. Those are rare. Instead, the challenge is to untangle tens of millions of relationships among the genes and to map the paths of power and influence within the eye. The secrets to blindness are not found in the structure of the genome but instead in the behavior of its components. It’s like a society.
The analysis, of course, is statistical. And as I learned about it, I began to see that it’s very much like the work that goes on at Tacoda. Just as Dave Morgan was searching for the behavioral patterns of romantic-movie lovers, genetic researchers have to parse the behavior of the influential genes. What activates them? Are there stimuli coming from other genes or proteins? Which ones? In both domains, advertising and genetics, the process involves sifting through massive sets of data, looking for patterns, weighing statistics, and using probability to distinguish between a cause and a coincidence. From the point of view of the Numerati, the microscopic forces within our bodies behave more like communities, or even markets, than like components of a machine.
I’m sorry to report that I learned nothing about the chances that I would go blind, much less that genetic fixes were at hand. Instead, Dr. Stone prepared me for a gradual approach to battling inherited diseases: “A couple of years ago,” he told me, “we identified this gene called the fibulin 5. It’s responsible for 1.5 percent of age-related macular degeneration.” He made a tiny space between two fingers. “It’s this dinky little thing, right?” But the discovery, he said, gives researchers a look at the mechanism that causes macular degeneration. “This allows us then to do experiments that say, now why is that? Why is a tiny change in this gene causing people to get these accumulations under their retinas? . . . If we could understand that pathway,” he said, “maybe there are things we could do when somebody is 35 years old to knock that pathway down a little bit. Then instead of the average age of someone losing vision from macular degeneration being 67 or 71 or something, maybe it could be 87 or 91. We’d like it to be never. But from a population point of view, every three or four years that you could move that curve makes a dramatic difference in the amount of blindness out there.”
171 Provided you fork over your data. Hospitals that figure out how to make intelligent use of patient data are bound to rise to the top. This, as I learned on a visit to the Mayo Clinic in Rochester, Minnesota, has long been the case. I met with the clinic’s data expert, Dr. Christopher Chute, who told me about a crucial breakthrough. In the first years of the clinic, more than a century ago, he told me, the Mayo brothers ran their operation much like other big clinics. Say a patient came in with a sore shoulder. He was sent to the orthopedic specialist. But it turned out to be a heart problem! So off he went to the coronary specialist. He took some medicine there and broke out in hives. Next stop, dermatologist. Each of these three doctors had a separate record of the patient. Often they had to track down their colleagues to piece together the twists and turns of their patients’ cases.
Enter the Mayo brothers’ partner, Henry Plummer. In 1907, he and his assistant, Mabel Root, devised a new system. Upon signing in, each patient received a dossier, to be carried from doctor to doctor. This way, each doctor could study the medical history of their patients from the first day they arrived at the clinic. When the patients checked out, their dossiers went into a big file. Plummer and Root put color codings on the dossiers for each type of disease and treatment. The result, said Chute, using language that sounds more Google than Mayo, “They had a paper database that was structured and searchable!” Through the years, they indexed the dossiers with ever finer detail. This enabled them to engage in what our generation would call analytics. They could look at every case of colon cancer or tonsillitis, and analyze which treatments were most effective and cost-efficient. “This was continuous quality improvement,” Chute said, referring to the industrial process Japanese automakers made famous decades later. They turned the practice of medicine from a boutique business of independent consultants into a modern business. “This place exploded out of the corn fields.” The challenge now, of course, is to come up with a similar breakthrough for medical data in the twenty-first century.
172 In Britain, Norwich Union offers. “Norwich Union Buys Tracking Equipment for Pay-as-You-Go Motor Insurance,” Insurance Business Review, Oct. 6, 2005.
7. LOVER
195 94 percent of U.S. corporations. “The Art of the Online Résumé,” BusinessWeek, May 7, 2007.
196 Software to record their movements and interactions. “Gadgets That Know Your Next Move,” Technology Review, Nov. 1, 2006.
CONCLUSION
215 “Garbage in, garbage out.” Not everyone agrees with the familiar thesis of garbage in, garbage out. Early in my research, I was talking about it with William Pulleyblank, IBM’s vice president in charge of business optimization and a former director of the company’s Deep Computing Institute. “Garbage in, garbage out isn’t correct anymore,” he said. “You haven’t got time to clean up your data. The real challenge is how you make something of value from ‘garbage.’” In other words, in a fast-moving business world, quick and dirty conclusions have a fighting chance to work. Slow and sure, by contrast, is often an oxymoron, because data may be out of date by the time it’s cleaned and vetted.
Sources and Further Reading
* * *
Ayres, Ian. Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart. Bantam, 2007
Barabasi, Albert-Laszlo. Linked. Plume/The Penguin Group, 2003
Bardi, Jason Socrates. The Calculus Wars. Thunder’s Mouth Press, 2006
Briggs, Rex, and Greg Stuart. What Sticks. Kaplan Publishing, 2006
Brin, David. The Transparent Society. Basic Books, 1998
Courant, Richard, and Herbert Robbins (revised by Ian Stewart). What Is Mathematics? Oxford University Press, 1996 (originally published in 1941)
Dantzig, Tobias. Number: The Language of Science. Fourth edition. The Free Press, 1967
Gleick, James. Isaac Newton. Vintage Books, 2003
Hamm, Steve. Bangalore Tiger. McGraw-Hill, 2007
Henshaw, John M. Does Measurement Measure Up? The Johns Hopkins University Press, 2006
Morville, Peter. Ambient Findability: What We Find Changes Who We Become. O’Reilly Media, 2005
O’Harrow, Robert Jr. No Place to Hide. Free Press, 2005
Schultz, Don E., Stanley I. Tannenbaum, and Robert F. Lauterborn. The New Marketing Paradigm: Integrated Marketing Communications. NTC Business Books, 1994
Sosnik, Douglas B., Matthew J. Dowd, and Ron Fournier. Applebee’s America. Simon & Schuster, 2006
Stakutis, Chris, and John Webster. Inescapable Data: Harnessing the Power of Convergence. IBM Press, 2005
Watts, Duncan J. Six Degrees: The Science of a Connected Age. Norton, 2003
Whitehead, Alfred North. Introduction to Mathematics. Barnes & Noble Books, 2005 (originally published in 1911)
Index
* * *
A
Accenture (company), 42–44, 46–49, 58–64, 70
Acxiom (company), 75–76
AdSense (Google service), 117
Advertisers, 202
calculating rate of return for, 94, 95
changes in methods of, 9–11, 14, 98, 100–116
customer lists shared by, 62, 76, 91–92, 207
and the Internet, 2–7, 15–16, 42, 98–122, 187, 194
microtargeting by, 91, 205, 224n
and retail stores, 47–66, 70, 125, 141, 183, 192
selling our own data to, 205
See also “Buckets”; Shoppers; “Tribes”
African American voters, 226n
Age (generations)
distinguishing, through word analysis, 100–101, 108, 111
on online dating questionnaire, 199–200
Alamo Rent A Car, 1–3, 15–16, 27, 57
Algorithms
for analyzing patients, 158, 160, 181
for analyzing shoppers, 57, 194–95
for analyzing terrorists, 126–27, 129, 145
for analyzing voters, 72, 86, 91
for biological analysis, 56, 160
dating services’ use of, 182, 184, 192, 194–95, 200, 205
defined, 31–32, 222–23n
as Numerati tool, 13, 14, 30–32, 39, 57–58, 205, 206–7
Alhazmi, Nawaf, 131–32
Allen, Paul, 159
AllianceTech (company), 65
Almihdhar, Khalid, 131–32
Al Qaeda, 124, 131, 142, 146
Alzheimer’s disease, 159, 161, 176, 177, 180
Amazon.com, 14, 42, 45, 61, 162
Andresen, Dan, 169–73, 198
ANNA (privacy software), 152
Anthropologists, 149–50, 154–55, 182, 184, 188
AOL (company), 15–16
Applebee’s America (Dowd), 68, 92, 225n
Arnold, Douglas, 64
Artificial intelligence, 107–8
See also Computers; Machine learning
ASWORG, 30–31
AttentionTrust (company), 205
B
Baker, Mary Jane and Walter (author’s parents), 72–74, 85, 154, 156, 174–76
Baker, Stephen, 182–86, 198–200
“Barnacle” shoppers, 51–53, 64
“Barn Raisers” tribe, 78, 80, 81, 83, 88, 93, 126
Bartering, 26
Baseball Prospectus (website), 27
Baseball statistics, 27–28
BBN (company), 144
Bed sensors, 158, 161, 177
Behavior
altering, 13–14, 49–51
predicting, 3, 12–14, 44, 86–90, 116, 150, 159, 166, 173, 188, 196–98, 201–2, 207
proxies for, 70, 83–85, 90
tracking of cell phone users’, 195–97
tracking of elderly people’s, 154–81
tracking of Internet users’, 1–6, 17–19, 187, 188
tracking of terrorists’, 125–26
tracking shoppers’ patterns of, 41–66, 187
See also Data; Mathematical models
“Behavioral markers,” 160
Beltran, Carlos, 27–28, 40
Bin Laden, Osama, 124, 149
Biology and biologists, 7, 11, 14, 58, 142, 160, 167–68, 188–89, 201, 227n
See also DNA; Genetics
Black, Fischer, 21
The Black Swan (Taleb), 126
Bloggers, 96–122
Bluetooth data connections, 103–4
“Bootstrapper” tribe, 88–89
“Bootstrapping,” 164
Brands, 10, 47, 50
Brin, Sergey, 215
Britain, 143
“Buckets,” 50–55, 57, 59, 80–81, 105, 115, 128, 187, 205, 207
See also “Tribes”
“Builders” (personality type), 189–91
Bush, George W., 68, 91–92, 95, 114–15, 124, 131
BusinessWeek (magazine), 195, 221n
“Butterfly” shoppers, 53, 64
BuzzMetrics (company), 104, 121
C
Cameras (surveillance)
at Accenture, 44, 63–64
in casinos, 137–41
in homes of the elderly, 162–63, 166
in public places, 4, 43, 63–64, 143–44
See also Facial recognition; Photos; Surveillance
Capital IQ, 210, 211
Capital One, 224n
Carbonell, Jaime, 61
Carbon nanotube, 168
Carley, Kathleen, 35–37
Carnegie Mellon University (CMU), 13, 35–37, 45–46, 146–48, 211
Casablanca (movie), 141
Casinos, 133–41, 144
Cavaretta, Michael, 226n
Cell phones
Bluetooth technology for, 103–4
data produced by, 4, 5, 16, 195–99
technical issues associated with, 174
tracking use of, 35, 130, 195–99
Central Intelligence Agency. See CIA
Chávez, Hugo, 211
Chemistry.com, 182–93, 198, 199–200, 205
China, 33, 38, 143–45
ChoicePoint (company), 75
Chute, Christopher, 228–29n
CIA (Central Intelligence Agency), 124, 129, 135
“Civic Sentries” tribe, 82, 87, 93
Civil liberties. See Privacy
Clairvoyance Corp., 152
Clustering software, 61–62
CMU. See Carnegie Mellon University
Code-breaking, 128–30
Cold War, 129–30
Community, 70, 71, 74, 77, 79–80
Computers
and algorithms, 222–23n
on animals, 169–71, 174
brains compared to, 25
chips in, 4–5
cookies on, 2
cost of, 157
data produced using, 4–5
history of uses of, 7–11
speed of calculations by, 86–87, 112
teaching, to recognize “tribes,” 59–62
weaknesses of, 112–13
and workers, 17–40, 63–64, 97, 106
See also Algorithms; Computer scientists; Data; Internet; Machine learning; Mathematical models; Mathematicians; Privacy; RFID
Computer scientists
competition over hiring of, 145–46
as making sense of data, 6, 9, 35–37, 129
and math, 221n
myths about, 206–14
See also Computers; Numerati