
The Best American Science and Nature Writing 2020


by John Seabrook


  * * *

  OpenAI occupies a historic three-story loft building, originally built as a luggage factory in 1903, three years before the earthquake and fire that consumed much of San Francisco. It sits at the corner of Eighteenth and Folsom Streets, in the city’s Mission District. There are a hundred employees, most of them young and well educated, who have an air of higher purpose about them. The staff aren’t merely trying to invent a superintelligent machine. They’re also devoted to protecting us from superintelligence, by trying to formulate safety standards for the technology which are akin to the international protocols that govern nuclear materials like yellowcake uranium. What might be the safest course of all—​to stop trying to build a machine as intelligent as we are—​isn’t part of OpenAI’s business plan.

  Dario Amodei, the research director, conducted the demonstration of the New Yorker–trained AI for me, in a glass-walled conference room on the first floor, using an OpenAI laptop. Amodei, thirty-six, has a PhD in computational neuroscience from Princeton and did a postdoc at Stanford. He has boyishly curly hair that he has the habit of twisting around a finger while he talks.

  In fine-tuning GPT-2 for the purposes of this article, the neural net categorized distinctive aspects of New Yorker prose—​the words its writers tended to favor, the magazine’s rhythms, its distinctive style of narrative rhetoric, its voice—​and the learning algorithm used these data to automatically adjust the neural net’s settings, so that its predictions leaned toward New Yorker locutions. We were about to find out how well it worked. I had butterflies. It felt as if we were lighting a fuse but didn’t know where it led.
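
  To make that fine-tuning step concrete, the sketch below uses the open-source Hugging Face implementation of GPT-2, a hypothetical plain-text corpus file, and illustrative hyperparameters. It is a minimal stand-in for the kind of training described here, not OpenAI's actual setup.

    # Minimal fine-tuning sketch (assumptions: Hugging Face transformers,
    # PyTorch, and a hypothetical corpus file of New Yorker-style prose).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.train()

    # Read the style corpus and cut it into fixed-length blocks of tokens.
    text = open("new_yorker_corpus.txt", encoding="utf-8").read()  # hypothetical file
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    block = 512
    chunks = [ids[i:i + block] for i in range(0, len(ids) - block, block)]

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    for chunk in chunks:
        input_ids = chunk.unsqueeze(0)
        # The objective is next-word prediction: the loss measures how far the
        # model's guesses fall from the corpus's actual next tokens.
        loss = model(input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    model.save_pretrained("gpt2-finetuned-style")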

  The interface on the laptop screen was deceptively simple: a window where you could paste or write in prompts, and four slider controls on the left. Two adjusted the output: how many words the machine wrote each time the user pressed the refresh button. A third was for “generativity”—​establishing how jiggy the AI got with its suggestions. The fourth slider adjusted the “nucleus sampling”—​the size of the pool of words from which the machine drew in generating text. The refresh button was OpenAI’s logo, which looked to me like a mandala, the kind associated with alchemists in the Middle Ages.
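
  In the open-source release of GPT-2, those controls correspond roughly to standard sampling parameters: "generativity" to the sampling temperature, and the word-pool slider to the top-p cutoff of nucleus sampling. That mapping, along with the prompt and settings below, is an assumption for illustration; this is not OpenAI's demo code.

    # Sketch of the sampling controls (assumption: "generativity" ~ temperature,
    # "nucleus sampling" ~ top-p in the Hugging Face GPT-2 implementation).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "Ernest Hemingway, who may well be the greatest living American novelist"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        output = model.generate(
            input_ids,
            do_sample=True,       # sample instead of always taking the likeliest word
            temperature=0.9,      # higher values make the output "jiggier"
            top_p=0.95,           # nucleus sampling: draw only from the smallest set
                                  # of words whose probabilities add up to 95 percent
            max_new_tokens=60,    # the "how many words" control
            pad_token_id=tokenizer.eos_token_id,
        )
    print(tokenizer.decode(output[0], skip_special_tokens=True))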

  I had selected in advance some openings of favorite New Yorker articles to prompt the machine. We started by pasting the first paragraph of Lillian Ross’s 1950 profile of Ernest Hemingway into the prompt window. It begins:

  Ernest Hemingway, who may well be the greatest living American novelist and short-story writer, rarely comes to New York. He spends most of his time on a farm, the Finca Vigia, nine miles outside Havana, with his wife, a domestic staff of nine, fifty-two cats, sixteen dogs, a couple of hundred pigeons, and three cows.

  Amodei allowed me to click the mandala. After a second-long pause, the generator kicked in, and line after line of Robo-Ross’s prose began to fill the screen:

  I walked up the path to the fence and saw a plump dog, that had been a common visitor to the Finca Vigia before the war, galloping up a path to the main building, with a tiny cow of the same name standing by her side. There was a puddle of red gravy in the front yard, and Hemingway sat down in a lawn chair.

  On first reading this passage, my brain ignored what AI researchers call “world-modeling failures”—​the tiny cow and the puddle of red gravy. Because I had never encountered a prose-writing machine even remotely this fluent before, my brain made an assumption—​any human capable of writing this well would know that cows aren’t tiny and red gravy doesn’t puddle in people’s yards. And because GPT-2 was an inspired mimic, expertly capturing The New Yorker’s cadences and narrative rhythms, it sounded like a familiar, trusted voice that I was inclined to believe. In fact, it sounded sort of like my voice.

  I recalled a well-known experiment conducted in 1966 by Joseph Weizenbaum, a German-born professor at MIT who was a pioneer of artificial intelligence. In the experiment, a primitive (by today’s standards) chatbot that Weizenbaum named ELIZA—​for the George Bernard Shaw ingénue—​responded, in writing, to statements by the study’s subjects. The bot was programmed to answer in the style of a stereotypical psychotherapist, with questions such as “How does that make you feel?” To Weizenbaum’s surprise, the “patients,” even when they knew ELIZA was a bot, began revealing intimate details of their lives; his secretary at MIT asked him to leave the room so that she could communicate freely with ELIZA.

  I clicked the mandala again, and the machine continued writing its Daliesque version of Ross’s profile, using, in addition to the first prompt, the prose it had already generated to generate from:

  He was wearing a tweed suit, over a shiny sweater, and his black hair was brushed back. He had a red beard and wore his waistcoat in an overcoat with the body of a ship, three broad belts of colorful chain-link, a pair of capacious rectangular eyeglasses, and a silk tie. “Gouging my eye,” he said, in Italian, saying that he had caused himself that terrible scar, “the surgeon said it wasn’t that bad.” When he was very young, he said, he started smoking but didn’t find it very pleasant. The cigarette burns in his hands and wrists were so bad that he had to have his face covered.

  Three chain-link belts? Oddly, a belt does come up later in Ross’s article, when she and Hemingway go shopping. So do eyeglasses, and cigarettes, and Italy. GPT-2 hadn’t “read” the article—​it wasn’t included in the training data—​yet it had somehow alighted on evocative details. Its deep learning obviously did not include the ability to distinguish nonfiction from fiction, though. Convincingly faking quotes was one of its singular talents. Other things often sounded right, though GPT-2 suffered frequent world-modeling failures—​gaps in the kind of commonsense knowledge that tells you overcoats aren’t shaped like the body of a ship. It was as though the writer had fallen asleep and was dreaming.

  Amodei explained that there was no way of knowing why the AI came up with specific names and descriptions in its writing; it was drawing from a content pool that seemed to be a mixture of New Yorker–ese and the machine’s Reddit-based training. The mathematical calculations that resulted in the algorithmic settings that yielded GPT-2’s words are far too complex for our brains to understand. In trying to build a thinking machine, scientists have so far succeeded only in reiterating the mystery of how our own brains think.

  Because of the size of the Reddit data set necessary to train GPT-2, it is impossible for researchers to filter out all the abusive or racist content, although OpenAI had caught some of it. However, Amodei added, “it’s definitely the case, if you start saying things about conspiracy theories, or prompting it from the Stormfront website—​it knows about that.” Conspiracy theories, after all, are a form of pattern recognition too; the AI doesn’t care if they’re true or not.

  Each time I clicked the refresh button, the prose that the machine generated became more random; after three or four tries, the writing had drifted far from the original prompt. I found that by adjusting the slider to limit the amount of text GPT-2 generated, and then generating again so that it used the language it had just produced, the writing stayed on topic a bit longer, but it, too, soon devolved into gibberish, in a way that reminded me of HAL, the superintelligent computer in 2001: A Space Odyssey, when the astronauts begin to disconnect its mainframe-size artificial brain.
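
  That procedure, generating a short burst and folding it back into the prompt before generating again, amounts to a simple loop. A toy version, using the open-source model and invented settings rather than the interface OpenAI showed:

    # Toy version of the "refresh" loop: generate a short burst, fold it back
    # into the prompt, repeat. Settings are illustrative, not OpenAI's demo.
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    text = "Ernest Hemingway, who may well be the greatest living American novelist..."
    for refresh in range(4):
        input_ids = tokenizer(text, return_tensors="pt").input_ids
        output = model.generate(
            input_ids,
            do_sample=True,
            top_p=0.95,
            max_new_tokens=40,   # keep each burst short so the thread isn't lost
            pad_token_id=tokenizer.eos_token_id,
        )
        # Everything written so far becomes part of the next prompt, so the model
        # conditions on its own prose as well as the original opening.
        text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(text)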

  An hour or so later, after we had tried opening paragraphs of John Hersey’s Hiroshima and Truman Capote’s In Cold Blood, my initial excitement had curdled into queasiness. It hurt to see the rules of grammar and usage, which I have lived my writing life by, mastered by an idiot savant that used math for words. It was sickening to see how the slithering machine intelligence, with its ability to take on the color of the prompt’s prose, slipped into some of my favorite paragraphs, impersonating their voices but without their souls.

  * * *

  There are many positive services that AI writers might provide. IBM recently debuted an AI called Speech by Crowd, which it has been developing with Noam Slonim, an Israeli IBM Research Fellow. The AI processed almost 2,000 essays written by people on the topic “Social Media Brings More Harm Than Good” and, using a combination of rules and deep learning, isolated the best arguments on both sides and summarized them in a pair of three- to five-paragraph, op-ed-style essays, one pro (“Social media creates a platform to support freedom of speech, giving individuals a platform to voice their opinions and interact with like-minded individuals”) and one con (“The opinion of a few can now determine the debate, it causes polarized discussions and strong feelings on non-important subjects”). The essays I read were competent, but most seventh-graders with social-media experience could have made the same arguments less formulaically.

  Slonim pointed to the rigid formats used in public-opinion surveys, which rely on questions the pollsters think are important. What, he asked, if these surveys came with open-ended questions that allowed respondents to write about issues that concern them, in any form. Speech by Crowd can “read” all the answers and digest them into broader narratives. “That would disrupt opinion surveys,” Slonim told me.

  At Narrative Science, in Chicago, a company cofounded by Kristian Hammond, a computer scientist at Northwestern, the main focus is using a suite of artificial-intelligence techniques to turn data into natural language and narrative. The company’s software renders numerical information about profit and loss or manufacturing operations, for example, as stories that make sense of patterns in the data, a tedious task formerly accomplished by people poring over numbers and churning out reports. “I have data, and I don’t understand the data, and so a system figures out what I need to hear and then turns it into language,” Hammond explained. “I’m stunned by how much data we have and how little of it we use. For me, it’s trying to build that bridge between data and information.”

  One of Hammond’s former colleagues, Jeremy Gilbert, now the director of strategic initiatives at the Washington Post, oversees Heliograf, the Post’s deep-learning robotic newshound. Its purpose, he told me, is not to replace journalists but to cover data-heavy stories, some with small but highly engaged audiences—​a high school football game (“The Yorktown Patriots triumphed over the visiting Wilson Tigers in a close game on Thursday, 20–14,” the AI reported), local election results, a minor commodities-market report—​that newspapers lack the manpower to cover, and others with much broader reach, such as national elections or the Olympics. Heliograf collects the data and applies them to a particular template—​a spreadsheet for words, Gilbert said—​and an algorithm identifies the decisive play in the game or the key issue in the election and generates the language to describe it. Although Gilbert says that no freelancer has lost a gig to Heliograf, it’s not hard to imagine that the high school stringer who once started out on the varsity beat will be coding instead.
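
  The "spreadsheet for words" idea can be illustrated with a toy recap generator: structured game data goes in, a templated sentence comes out, and a simple heuristic stands in for the step that identifies the decisive play. The field names and the heuristic are invented for the example; this is not Heliograf's code.

    # Toy illustration of template-driven sports recaps: structured data in,
    # a sentence out. Field names and the "decisive play" heuristic are
    # invented for the example.
    game = {
        "home": "Yorktown Patriots", "away": "Wilson Tigers",
        "home_score": 20, "away_score": 14, "day": "Thursday",
        "plays": [
            {"desc": "a 35-yard field goal", "points": 3},
            {"desc": "a fourth-quarter touchdown pass", "points": 7},
        ],
    }

    def recap(g):
        home_won = g["home_score"] > g["away_score"]
        winner, loser = (g["home"], g["away"]) if home_won else (g["away"], g["home"])
        margin = abs(g["home_score"] - g["away_score"])
        closeness = "a close game" if margin <= 7 else "a comfortable win"
        # Stand-in for "identify the decisive play": take the highest-scoring one.
        key_play = max(g["plays"], key=lambda p: p["points"])
        return (f"The {winner} triumphed over the {loser} in {closeness} "
                f"on {g['day']}, {g['home_score']}–{g['away_score']}, "
                f"sealed by {key_play['desc']}.")

    print(recap(game))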

  * * *

  OpenAI made it possible for me to log in to the New Yorker AI remotely. On the flight back to New York, I put some of my notes from the OpenAI visit into GPT-2 and it began making up quotes for Ilya Sutskever, the company’s chief scientist. The machine appeared to be well informed about his groundbreaking research. I worried that I’d forget what he really said, because the AI sounded so much like him, and that I’d inadvertently use in my article the machine’s fake reporting, generated from my notes. (“We can make fast translations but we can’t really solve these conceptual questions,” one of GPT-2’s Sutskever quotes said. “Maybe it is better to have one person go out and learn French than to have an entire computer-science department.”) By the time I got home, the AI had me spooked. I knew right away there was no way the machine could help me write this article, but I suspected that there were a million ways it could screw me up.

  I sent a sample of GPT-2’s prose to Steven Pinker, the Harvard psycholinguist. He was not impressed with the machine’s “superficially plausible gobbledygook,” and explained why. I put some of his reply into the generator window, clicked the mandala, added synthetic Pinker prose to the real thing, and asked people to guess where the author of The Language Instinct stopped and the machine took over.

  Being amnesic for how it began a phrase or sentence, it won’t consistently complete it with the necessary agreement and concord—​to say nothing of semantic coherence. And this reveals the second problem: real language does not consist of a running monologue that sounds sort of like English. It’s a way of expressing ideas, a mapping from meaning to sound or text. To put it crudely, speaking or writing is a box whose input is a meaning plus a communicative intent, and whose output is a string of words; comprehension is a box with the opposite information flow. What is essentially wrong with this perspective is that it assumes that meaning and intent are inextricably linked. Their separation, the learning scientist Phil Zuckerman has argued, is an illusion that we have built into our brains, a false sense of coherence.

  That’s Pinker through “information flow.” (There is no learning scientist named Phil Zuckerman, although there is a sociologist by that name who specializes in secularity.) Pinker is right about the machine’s amnesic qualities—​it can’t develop a thought, based on a previous one. It’s like a person who speaks constantly but says almost nothing. (Political punditry could be its natural domain.) However, almost everyone I tried the Pinker Test on, including Dario Amodei, of OpenAI, and Les Perelman, of Project BABEL, failed to distinguish Pinker’s prose from the machine’s gobbledygook. The AI had them Pinkered.

  GPT-2 was like a three-year-old prodigiously gifted with the illusion, at least, of college-level writing ability. But even a child prodigy would have a goal in writing; the machine’s only goal is to predict the next word. It can’t sustain a thought, because it can’t think causally. Deep learning works brilliantly at capturing all the edgy patterns in our syntactic gymnastics, but because it lacks a precoded base of procedural knowledge it can’t use its language skills to reason or to conceptualize. An intelligent machine needs both kinds of thinking.
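
  What "predict the next word" means concretely: given a prefix, the model produces a probability for every token in its vocabulary, and that single distribution is its entire objective. A brief sketch with the open-source model (the prompt is arbitrary):

    # The model's only output: a probability distribution over the next token.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer("He spends most of his time on a", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]   # a score for every possible next token
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, 5)
    for p, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(token_id.item())!r:>12}  {p.item():.3f}")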

  “It’s a card trick,” Kris Hammond, of Narrative Science, said, when I sent him what I thought were some of GPT-2’s better efforts. “A very sophisticated card trick, but at heart it’s still a card trick.” True, but there are also a lot of tricks involved in writing, so it’s hard to find fault with a fellow-mountebank on that score.

  One can envision machines like GPT-2 spewing superficially sensible gibberish, like a burst water main of babble, flooding the internet with so much writing that it would soon drown out human voices, and then training on its own meaningless prose, like a cow chewing its cud. But composing a long discursive narrative, structured in a particular way to advance the story, was, at least for now, completely beyond GPT-2’s predictive capacity.

  However, even if people will still be necessary for literary production, day by day, automated writers like GPT-2 will do a little more of the writing that humans are now required to do. People who aren’t professional writers may be able to avail themselves of a wide range of products that will write emails, memos, reports, and speeches for them. And, like me writing “I am proud of you” to my son, some of the AI’s next words might seem superior to words you might have thought of yourself. But what else might you have thought to say that is not computable? That will all be lost.

  * * *

  Before my visit to OpenAI, I watched a lecture on YouTube that Ilya Sutskever had given on GPT‑2 in March, at the Computer History Museum, in Mountain View, California. In it, he made what sounded to me like a claim that GPT-2 itself might venture, if you set the generativity slider to the max. Sutskever said, “If a machine like GPT-2 could have enough data and computing power to perfectly predict the next word, that would be the equivalent of understanding.”

  At OpenAI, I asked Sutskever about this. “When I said this statement, I used ‘understanding’ informally,” he explained. “We don’t really know what it means for a system to understand something, and when you look at a system like this it can be genuinely hard to tell. The thing that I meant was: If you train a system which predicts the next word well enough, then it ought to understand. If it doesn’t predict it well enough, its understanding will be incomplete.”

  However, Sutskever added, “researchers can’t disallow the possibility that we will reach understanding when the neural net gets as big as the brain.”

  The brain is estimated to contain a hundred billion neurons, with trillions of connections between them. The neural net that the full version of GPT-2 runs on has about one and a half billion connections, or “parameters.” At the current rate at which compute is growing, neural nets could equal the brain’s raw processing capacity in five years. To help OpenAI get there first, Microsoft announced in July that it was investing $1 billion in the company, as part of an “exclusive computing partnership.” How its benefits will be “distributed as widely as possible” remains to be seen. (A spokesperson for OpenAI said that “Microsoft’s investment doesn’t give Microsoft control” over the AI that OpenAI creates.)
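
  A back-of-the-envelope check on that "five years": only the 1.5 billion parameters comes from the passage itself; the brain figure (roughly a hundred trillion connections) and the compute doubling time (about 3.4 months, the figure from OpenAI's own "AI and Compute" analysis) are outside assumptions brought in for the arithmetic.

    # Rough arithmetic behind the five-year estimate. The brain figure and the
    # 3.4-month doubling time are outside assumptions, not stated in the passage.
    import math

    gpt2_parameters = 1.5e9        # connections ("parameters") in the full GPT-2
    brain_synapses = 1.0e14        # rough estimate of connections in a human brain
    doubling_time_months = 3.4     # observed doubling time of AI training compute

    doublings = math.log2(brain_synapses / gpt2_parameters)
    years = doublings * doubling_time_months / 12
    print(f"{doublings:.0f} doublings, about {years:.1f} years")
    # Roughly 16 doublings, or about four and a half years: close to "five years".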

  David Ferrucci, the only person I tried the Pinker Test on who passed it, said, “Are we going to achieve machine understanding in a way we have hoped for many years? Not with these machine-learning techniques. Can we do it with hybrid techniques?” (By that he meant ones that combine knowledge-based systems with machine-learning pattern recognition.) “I’m betting yes. That’s what cognition is all about, a hybrid architecture that combines different classes of thinking.”

 
