The Best American Science and Nature Writing 2020
JOHN SEABROOK
The Next Word
from The New Yorker
I glanced down at my left thumb, still resting on the Tab key. What have I done? Had my computer become my cowriter? That’s one small step forward for artificial intelligence, but was it also one step backward for my own?
The skin prickled on the back of my neck, an involuntary reaction to what roboticists call the “uncanny valley”—the space between flesh and blood and a too-human machine.
For several days, I had been trying to ignore the suggestions made by Smart Compose, a feature that Google introduced, in May 2018, to the one and a half billion people who use Gmail—roughly a fifth of the human population. Smart Compose suggests endings to your sentences as you type them. Based on the words you’ve written, and on the words that millions of Gmail users followed those words with, “predictive text” guesses where your thoughts are likely to go and, to save you time, wraps up the sentence for you, appending the AI’s suggestion, in gray letters, to the words you’ve just produced. Hit Tab, and you’ve saved yourself as many as twenty keystrokes—and, in my case, composed a sentence with an AI for the first time.
Paul Lambert, who oversees Smart Compose for Google, told me that the idea for the product came in part from the writing of code—the language that software engineers use to program computers. Code contains long strings of identical sequences, so engineers rely on shortcuts, which they call “code completers.” Google thought that a similar technology could reduce the time spent writing emails for business users of its G Suite software, although it made the product available to the general public too. A quarter of the average office worker’s day is now taken up with email, according to a study by McKinsey. Altogether, Smart Compose saves users two billion keystrokes a week.
One can opt out of Smart Compose easily enough, but I had chosen not to, even though it frequently distracted me. I was fascinated by the way the AI seemed to know what I was going to write. Perhaps because writing is my vocation, I am inclined to consider my sentences, even in a humble email, in some way a personal expression of my original thought. It was therefore disconcerting how frequently the AI was able to accurately predict my intentions, often when I was in midsentence, or even earlier. Sometimes the machine seemed to have a better idea than I did.
And yet until now I’d always finished my thought by typing the sentence to a full stop, as though I were defending humanity’s exclusive right to writing, an ability unique to our species. I will gladly let Google predict the fastest route from Brooklyn to Boston, but if I allowed its algorithms to navigate to the end of my sentences, how long would it be before the machine started thinking for me? I had remained on the near shore of a digital Rubicon, represented by the Tab key. On the far shore, I imagined, was a strange new land where machines do the writing, and people communicate in emojis, the modern version of the pictographs and hieroglyphs from which our writing system emerged, 5,000 years ago.
True, I had sampled Smart Reply, a sister technology of Smart Compose that offers a menu of three automated responses to a sender’s email, as suggested by its contents. “Got it!” I clicked, replying to detailed comments from my editor on an article I thought was finished. (I didn’t really get it, but that choice wasn’t on the menu.) I felt a little guilty right afterward, as though I’d replied with a form letter, or, worse, a fake personal note. A few days later, in response to a long email from me, I received a “Got it!” from the editor. Really?
Along with almost everyone else who texts or tweets, with the possible exception of the president of the United States, I have long relied on spell-checkers and auto-correctors, which are limited applications of predictive text. I’m awful at spelling, as was my father; the inability to spell has a genetic link, according to multiple studies. Before spell-checkers, I used spelling rules I learned in elementary school (“‘I’ before ‘E’ except after ‘C,’” but with fearful exceptions) and folksy mnemonics (“‘cemetery’: all ‘E’s”). Now that spell-checkers are ubiquitous in word-processing software, I’ve stopped even trying to spell anymore—I just get close enough to let the machine guess the word I’m struggling to form. Occasionally, I stump the AI.
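To make the idea concrete: a spell-checker can guess the word I’m struggling to form by ranking dictionary words by edit distance, the number of single-letter changes separating each from the typo. Below is a minimal sketch in Python; the tiny word list is invented for illustration, and this is not Gmail’s actual method.

```python
# A minimal sketch of spell-checking as limited predictive text: rank
# dictionary words by edit distance to the typo. The word list here is
# invented for illustration; this is not Gmail's actual method.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete a letter
                            curr[j - 1] + 1,             # insert a letter
                            prev[j - 1] + (ca != cb)))   # substitute one
        prev = curr
    return prev[-1]

def suggest(typo: str, dictionary: list[str]) -> str:
    """Return the dictionary word fewest edits away from the typo."""
    return min(dictionary, key=lambda word: edit_distance(typo, word))

print(suggest("cemetary", ["cemetery", "centenary", "symmetry"]))
# -> cemetery (one substitution away: "close enough" wins)
```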
But Smart Compose goes well beyond spell-checking. It isn’t correcting words I’ve already formed in my head; it’s coming up with them for me, by harnessing the predictive power of deep learning, a subset of machine learning. Machine learning is the sophisticated method of computing probabilities in large data sets, and it underlies virtually all the extraordinary AI advances of recent years, including those in navigation, image recognition, search, game playing, and autonomous vehicles. In this case, it’s making billions of lightning-fast probability calculations about word patterns from a year’s worth of emails sent from Gmail.com. (It does not include emails sent by G Suite customers.)
“At any point in what you’re writing, we have a guess about what the next x number of words will be,” Lambert explained. To do that, the AI factors a number of different probability calculations into the “state” of the email you’re in the middle of writing. “The state is informed by a number of things,” Lambert went on, “including everything you have written in that email up until now, so every time you insert a new word the system updates the state and reprocesses the whole thing.” The day of the week you’re writing the email is one of the things that inform the state. “So,” he said, “if you write ‘Have a’ on a Friday, it’s much more likely to predict ‘good weekend’ than if it’s on a Tuesday.”
Although Smart Compose generally limits itself to predicting the next phrase or two, the AI could ramble on longer. The trade-off, Lambert noted, is accuracy. “The farther out from the original text we go, the less accurate the prediction.”
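Google hasn’t published Smart Compose’s internals in any detail, but the two ideas Lambert describes (a state rebuilt from everything typed so far, plus side signals like the day of the week, and accuracy that decays the farther out the prediction goes) can be sketched in a few lines of Python. The hand-written probability table below is invented for illustration; the real system learns such numbers from billions of emails.

```python
# A toy version of Lambert's description: the "state" is everything
# typed so far plus a side signal (the day of the week), and every
# extra predicted word multiplies in a probability below 1, so longer
# completions are offered only while confidence stays high. The
# probabilities here are invented; the real model learns them.

PHRASE_TABLE = {
    # (previous word, is_friday) -> [(next word, probability), ...]
    ("a", True):     [("good", 0.60), ("great", 0.25)],
    ("a", False):    [("good", 0.40), ("great", 0.30)],
    ("good", True):  [("weekend", 0.70), ("day", 0.20)],
    ("good", False): [("day", 0.50), ("week", 0.30)],
}

def complete(text: str, is_friday: bool, min_confidence: float = 0.25):
    """Greedily extend the text until the suggestion grows too uncertain."""
    words = text.lower().split()          # the state: all words so far
    suggestion, confidence = [], 1.0
    while True:
        candidates = PHRASE_TABLE.get((words[-1], is_friday), [])
        if not candidates:
            break
        word, p = candidates[0]               # most probable next word
        if confidence * p < min_confidence:   # "the farther out ... the
            break                             #  less accurate"
        confidence *= p
        suggestion.append(word)
        words.append(word)                    # update the state, reprocess
    return " ".join(suggestion), confidence

print(complete("Have a", is_friday=True))    # ('good weekend', ~0.42)
print(complete("Have a", is_friday=False))   # ('good', 0.4)
```

In this toy model, a Friday email confidently completes to “good weekend,” while on a Tuesday the second word falls below the confidence threshold and the suggestion stops short: the same trade-off Lambert describes.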
Finally, I crossed my Rubicon. The sentence itself was a pedestrian affair. Typing an email to my son, I began “I am p—” and was about to write “pleased” when predictive text suggested “proud of you.” I am proud of you. Wow, I don’t say that enough. And clearly Smart Compose thinks that’s what most fathers in my state say to their sons in emails. I hit Tab. No biggie.
And yet, sitting there at the keyboard, I could feel the uncanny valley prickling my neck. It wasn’t that Smart Compose had guessed correctly where my thoughts were headed—in fact, it hadn’t. The creepy thing was that the machine was more thoughtful than I was.
* * *
In February, OpenAI, an artificial-intelligence company, announced that the release of the full version of its AI writer, called GPT-2—a kind of supercharged version of Smart Compose—would be delayed, because the machine was too good at writing. The announcement struck critics as a grandiose publicity stunt (on Twitter, the insults flew), but it was in keeping with the company’s somewhat paradoxical mission, which is both to advance research in artificial intelligence as rapidly as possible and to prepare for the potential threat posed by superintelligent machines that haven’t been taught to “love humanity,” as Greg Brockman, OpenAI’s chief technology officer, put it to me.
OpenAI began in 2015, as a nonprofit founded by Brockman, formerly the CTO of the payment startup Stripe; Elon Musk, of Tesla; Sam Altman, of Y Combinator; and Ilya Sutskever, who left Google Brain to become OpenAI’s chief scientist. The tech tycoons Peter Thiel and Reid Hoffman, among others, provided seed money. The founders’ idea was to endow a nonprofit with the expertise and the resources to be competitive with private enterprise, while at the same time making its discoveries available as open source—so long as it was safe to do so—thus potentially heading off a situation where a few corporations reap the almost immeasurable rewards of a vast new world. As Brockman told me, a superintelligent machine would be of such immense value, with so much wealth accruing to any company that owned one, that it could “break capitalism” and potentially realign the world order. “We want to ensure its benefits are distributed as widely as possible,” Brockman said.
OpenAI’s projects to date include a gaming AI that earlier this year beat the world’s best human team at Dota 2, a multiplayer online strategy game. Open-world computer games offer AI designers almost infinite strategic possibilities, making them valuable testing grounds. The AI had mastered Dota 2 by playing its way through tens of thousands of years’ worth of possible scenarios a gamer might encounter, learning how to win through trial and error. The company also developed the software for a robotic hand that can teach itself to manipulate objects of different shapes and sizes without any human programming. (Traditional robotic appendages used in factories can execute only hard-coded moves.) GPT-2, like these other projects, was designed to advance technology—in this case, to push forward the development of a machine designed to write prose as well as, or better than, most people can.
Although OpenAI says that it remains committed to sharing the benefits of its research, it became a limited partnership in March, to attract investors, so that the company has the financial resources to keep up with the exponential growth in “compute”—the fuel powering the neural networks that underpin deep learning. These “neural nets” are made of what are, essentially, dimmer switches that are networked together, so that, like the neurons in our brains, they can excite one another when they are stimulated. In the brain, the stimulation is a small amount of electrical current; in machines, it’s streams of data. Training neural nets the size of GPT-2’s is expensive, in part because of the energy costs incurred in running and cooling the sprawling terrestrial “server farms” that power the cloud. A group of researchers at UMass Amherst, led by Emma Strubell, conducted a recent study showing that the carbon footprint created by training a gigantic neural net is roughly equal to the lifetime emissions of five automobiles.
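The dimmer-switch analogy can be made concrete in a few lines. An artificial neuron takes a weighted sum of the signals streaming into it and squashes the result to a value between zero (dark) and one (fully lit); in the sketch below, the weights are invented for illustration, and training a real network means adjusting billions of them against data.

```python
import math

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One 'dimmer switch': weighted sum of inputs, squashed to (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))   # sigmoid: a smooth dial, off to on

# Two upstream neurons firing strongly excite this one...
print(neuron([0.9, 0.8], weights=[1.5, 1.2], bias=-1.0))  # ~0.79, mostly on
# ...while weak incoming streams of data leave it mostly dark.
print(neuron([0.1, 0.0], weights=[1.5, 1.2], bias=-1.0))  # ~0.30, mostly off
```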
OpenAI says it will need to invest billions of dollars in the coming years. The compute is growing even faster than the rate suggested by Moore’s Law, which holds that the processing power of computers doubles every two years. Innovations in chip design, network architecture, and cloud-based resources are making the total available compute ten times larger each year—as of 2018, it was 300,000 times larger than it was in 2012.
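The arithmetic behind that comparison is worth pausing over. Doubling every two years, per Moore’s Law, yields an eightfold increase over the six years from 2012 to 2018; a 300,000-fold increase over the same span works out to roughly an eightfold jump every single year (close to the tenfold figure above), or a doubling every four months or so. A quick check:

```python
import math

years = 6                          # 2012 to 2018

moores_law = 2 ** (years / 2)      # doubling every two years
print(moores_law)                  # 8.0x over the whole six years

observed = 300_000                 # growth in training compute, 2012-2018
per_year = observed ** (1 / years)
print(round(per_year, 1))          # ~8.2x per year, near "ten times
                                   # larger each year"

doubling = 12 / math.log2(per_year)
print(round(doubling, 1))          # ~4.0: compute doubles roughly every
                                   # four months, versus Moore's twenty-four
```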
As a result, neural nets can do all sorts of things that futurists have long predicted for computers but couldn’t execute until recently. Machine translation, an enduring dream of AI researchers, was, until three years ago, too error-prone to do much more than approximate the meaning of words in another language. Since switching to neural machine translation, in 2016, Google Translate has begun to replace human translators in certain domains, like medicine. A recent study published in Annals of Internal Medicine found Google Translate accurate enough to rely on in translating non-English medical studies into English for the systematic reviews that health care decisions are based on.
Ilya Sutskever, OpenAI’s chief scientist, is, at thirty-three, one of the most highly regarded of the younger researchers in AI. When we met, he was wearing a T-shirt that said THE FUTURE WILL NOT BE SUPERVISED. Supervised learning, which used to be the way neural nets were trained, involved labeling the training data—a labor-intensive process. In unsupervised learning, no labeling is required, which makes the method scalable. Instead of learning to identify cats from pictures labeled “cat,” for example, the machine learns to recognize feline pixel patterns, through trial and error.
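The slogan on the T-shirt lends itself to a sketch. Supervised learning needs a human-supplied label for every training example; the unsupervised training behind GPT-2-style language models manufactures its own targets from raw text, pairing everything read so far with the next word, which is what makes the method scalable. The example data below is invented for illustration.

```python
# Supervised learning: every example costs human labeling effort.
supervised_data = [
    ("photo_001.jpg", "cat"),
    ("photo_002.jpg", "dog"),
]

# Unsupervised next-word prediction: raw text supplies its own targets,
# so there is nothing for a human to label.
text = "the future will not be supervised".split()
training_pairs = [
    (text[:i], text[i])      # (everything read so far, next word)
    for i in range(1, len(text))
]
for context, target in training_pairs:
    print(context, "->", target)
# ['the'] -> future
# ['the', 'future'] -> will   ...and so on, for free, at any scale
```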
“Give it the compute, give it the data, and it will do amazing things,” Sutskever told me of GPT-2 when I met him and Brockman at the company’s San Francisco headquarters this summer, his eyes wide with wonder. “This stuff is like—” Sutskever paused, searching for the right word. “It’s like alchemy!”
It was startling to hear a computer scientist on the leading edge of AI research compare his work to a medieval practice performed by men who were as much magicians as scientists. Didn’t alchemy end with the Enlightenment?
GPT-2 runs on a neural net that is ten times larger than OpenAI’s first language model, GPT (short for Generative Pretrained Transformer). After announcing that it was delaying a full release, the company made three less powerful versions available on the Web—one in February, the second in May, and the third in August. Dario Amodei, a computational neuroscientist who is the company’s director of research, explained to me the reason for withholding the full version: “Until now, if you saw a piece of writing, it was like a certificate that a human was involved in it. Now it is no longer a certificate that an actual human is involved.”
That sounded something like my Rubicon moment with my son. What part of “I am proud of you” was human—intimate father-son stuff—and what part of it was machine-generated text? It will become harder and harder to tell the difference.
* * *
Scientists have varying ideas about how we acquire spoken language. Many favor an evolutionary, biological basis for our verbal skills over the view that we are tabulae rasae, but all agree that we learn language largely from listening. Writing is certainly a learned skill, not an instinct—if anything, as years of professional experience have taught me, the instinct is to scan Twitter, vacuum, complete the Times crossword, or do practically anything else to avoid having to write. Unlike writing, speech doesn’t require multiple drafts before it “works.” Uncertainty, anxiety, dread, and mental fatigue all attend writing; talking, on the other hand, is easy, often pleasant, and feels mostly unconscious.
A recent exhibition on the written word at the British Library dates the emergence of cuneiform writing to the fourth millennium BCE, in Mesopotamia. Trade had become too complex for people to remember all the contractual details, so they began to put contracts in writing. In the millennia that followed, literary craft evolved into much more than an enhanced form of accounting. Socrates, who famously disapproved of literary production for its deleterious (thank you, spell-checker) effect on memory, called writing “visible speech”—we know that because his student Plato wrote it down after the master’s death. A more contemporary definition, developed by the linguist Linda Flower and the psychologist John Hayes, is “cognitive rhetoric”—thinking in words.
In 1981, Flower and Hayes devised a theoretical model for the brain as it is engaged in writing, which they called the cognitive-process theory. It has endured as the paradigm of literary composition for almost forty years. The previous, “stage model” theory had posited that there were three distinct stages involved in writing—planning, composing, and revising—and that a writer moved through each in order. To test that theory, the researchers asked people to speak aloud any stray thoughts that popped into their heads while they were in the composing phase, and recorded the hilariously chaotic results. They concluded that, far from being a stately progression through distinct stages, writing is a much messier situation, in which all three stages interact with one another simultaneously, loosely overseen by a mental entity that Flower and Hayes called “the monitor.” Insights derived from the work of composing continually undermine assumptions made in the planning part, requiring more research; the monitor is a kind of triage doctor in an emergency room.
There is little hard science on the physiological state in the brain while writing is taking place. For one thing, it’s difficult to write inside an MRI machine, where the brain’s neural circuitry can be observed in action as the imaging traces blood flow. Historically, scientists have believed that there are two parts of the brain involved in language processing: one decodes the inputs, and the other generates the outputs. According to this classic model, words are formed in Broca’s area, named for the French physician Pierre Paul Broca, who discovered the region’s language function, in the mid-nineteenth century; in most people, it’s situated toward the front of the left hemisphere of the brain. Language is understood in Wernicke’s area, named for the German neurologist Carl Wernicke, who published his research later in the nineteenth century. Both men, working long before CAT scans allowed neurologists to see inside the skull, drew their conclusions after examining lesions in the autopsied brains of aphasia sufferers, who (in Broca’s case) had lost their speech but could still understand words or (in Wernicke’s) had lost the ability to comprehend language but could still speak. Connecting Broca’s area with Wernicke’s is a neural network: a thick, curving bundle of billions of nerve fibers, the arcuate fasciculus, which integrates the production and the comprehension of language.
In recent years, neuroscientists using imaging technology have begun to rethink some of the underlying principles of the classic model. One of the few imaging studies to focus specifically on writing, rather than on language use in general, was led by the neuroscientist Martin Lotze, at the University of Greifswald, in Germany, and the findings were published in the journal NeuroImage, in 2014. Lotze designed a small desk where the study’s subjects could write by hand while he scanned their brains. The subjects were given a few sentences from a short story to copy verbatim, in order to establish a baseline, and were then told to “brainstorm” for sixty seconds and then to continue writing “creatively” for two more minutes. Lotze noted that, during the brainstorming part of the test, magnetic imaging showed that the sensorimotor and visual areas were activated; once creative writing started, these areas were joined by the bilateral dorsolateral prefrontal cortex, the left inferior frontal gyrus, the left thalamus, and the inferior temporal gyrus. In short, writing seems to be a whole-brain activity—a brainstorm indeed.
Lotze also compared brain scans of amateur writers with those of people who pursue writing as a career. He found that professional writers relied on a region of the brain that did not light up as much in the scanner when amateurs wrote—the left caudate nucleus, a tadpole-shaped structure (cauda means “tail” in Latin) in the midbrain that is associated with expertise in musicians and professional athletes. In amateur writers, neurons fired in the lateral occipital areas, which are associated with visual processing. Writing well, one could conclude, is, like playing the piano or dribbling a basketball, mostly a matter of doing it. Practice is the only path to mastery.