But there is a potential cost to computers that, in their emotional awareness, can masquerade as humans. The proliferation of computer “bots” on Twitter and Facebook helped to drive the Brexit vote and Trump’s election as president—and it is for this reason that one tech critic called emotional voice simulation a “Pandora’s box”—yet another way that political operatives, and other mischief makers, will be able to construct false realities undetectable to our human sensory apparatus. Schuller says that he began hearing such criticism after the electoral shocks of 2016. “For the first time ever as a researcher,” he told me, “I’ve been getting negative feedback. In the past, people were always, like, ‘Oh wow, that’s cool!’ ”
Picard has never been blind to the potential dangers of affective computing. In her seminal 1995 paper, she cited as a warning Stanley Kubrick’s 2001: A Space Odyssey, in which HAL, the super-intelligent computer that runs the Jupiter-bound Discovery spaceship, suffers an emotional meltdown, triggered by fears that the crew members plan to shut “him” down. HAL kills all but one of the astronauts. “The message is serious,” Picard wrote: “a computer that can express itself emotionally will someday act emotionally”—a result, she says, of its “mimicking both cortical and limbic functions.”32 That computers control “the phone system, stock markets, nuclear power plants, and jet landings” (to say nothing of thermonuclear launch codes) is, Picard concedes, cause for concern. Nevertheless, she, like Schuller, believes that the benefits of emotional computing33 will outweigh any potential risks—even if those risks include, as Picard puts it, that “we, the maker, may be eliminated by our creation.”34 Whether this is an unreasonable fear is, of course, impossible to say at this point. What is not in doubt, however, is that much of the most cutting-edge work in imbuing computers with emotion starts with machine mastery of our infinitely subtle, affectively expressive, prosodically nuanced, paralinguistically rich human voice.
FOUR LANGUAGE
For now, anyway, we remain the only entities, animal or machine, capable of blending emotion and language in a single vocal sound wave. How we developed this remarkable and, so far, inimitable talent is unknown. The brain, lungs, larynx, tongue, and lips do not fossilize, so we can’t sift through the remains of extinct species for clues to how the cries and calls of our animal ancestors became speech, and the sounds themselves have, of course, long since fled from the air. Which is why the riddle of language origins has been called the “hardest problem in science.”1 It has nonetheless obsessed humanity since antiquity, given rise to some of the bloodiest and most fascinating intellectual battles ever fought, and provided remarkable insight into who we are and how we came to be.
Modern theories of how speech evolved date to the Enlightenment, when Johann Gottfried Herder, a student of Immanuel Kant, published his Treatise on the Origin of Language (1772) and asserted that words began as an onomatopoeia-like imitation of natural sounds using the voice: bleating of sheep, cooing of turtledoves, barking of dogs, rustling of leaves, rippling of water, whooshing of wind. Vocal mimicry of these sounds formed a pre-language capability, a “proto-language”—the crucial leap from howls and shrieks and chirps to articulated sounds that refer to things in the world (the thing my parakeet, and no other nonhuman animal, can do). Actual language, with fully formed words and syntax, gradually developed from there. Herder’s insights are all the more remarkable considering that he advanced them a century before Darwin informed us that we once existed in a more primitive state from which our intellectual faculties, including language, slowly evolved.
Other Enlightenment thinkers offered competing theories—including the French philosopher Étienne Bonnot de Condillac, who said that words developed not from imitating sounds in the outside world, but from impulses within: interjections, like the Arrgh! we shout upon hitting our thumb, or the Oooo we murmur at a pleasurable touch.2 (Herder, predictably, said this was insane.) Still others said language arose from the grunting that accompanied hand gestures, or the noises of sympathy we use to encourage or comfort others. The debate over the origins of speech grew so vicious and childish over the next hundred years that a leading research institute, the Société de Linguistique de Paris, in 1866 banned all further discussion of the topic.
A major catalyst for this prohibition was a famous address by the era’s leading linguist, Oxford University professor Max Müller, who, in 1861, delivered a series of lectures that attacked all existing theories of language’s origins. He ridiculed Herder’s ideas about onomatopoeic mimicry, calling it the “Bow-wow theory,” and dismissed Condillac’s ideas as the “Pooh-pooh theory.”3 Müller had a hidden agenda in these attacks: a devout Christian, he adhered to the Bible’s explanation that language began when God, as recounted in the book of Genesis, invited Adam to name all the animals.4 As such, Herder and Condillac were mere collateral damage in Müller’s campaign against a far larger target: Darwin, whose Origin of Species had been published eighteen months earlier.
Müller correctly viewed the Origin as the single greatest threat to religion ever disseminated, but he also believed that he had spied the book’s Achilles’ heel. Darwin had deliberately not discussed human evolution, leaving it up to readers to infer how our exceptional mental powers, the most conspicuous of which is language, arose. Even if natural selection could account for the complexity that creates an eye or hand or lung, Müller said, it could never explain something as intricate as speech. “Language is our Rubicon,” as he put it, “and no brute will dare to cross it… no process of natural selection will ever distill significant words out of the notes of birds or the cries of beasts.”5 In short, Müller became the standard-bearer for the view that speech is entirely discontinuous from animal voices.
Müller saw his argument taken up by creationists and other opponents of Darwin, including (amazingly) the co-discoverer of natural selection, Alfred Russel Wallace, who, having veered into spiritualism later in life, declared that, when it came to language, some mysterious “higher intelligence” had to have intervened.6 All of this meant that, by the time Darwin finally sat down to address human origins in his second major book on evolution, The Descent of Man, which he began writing soon after Müller threw down his gauntlet, the question of language origins loomed large—indeed, as the single most significant riddle Darwin had to solve.
* * *
In his efforts to convince readers that articulate humans descended from hooting chimpanzees, Darwin could not even offer concrete evidence of our ancestral descent from simians—since no fossil remains of linking species between apes and humans had yet been discovered. After Darwin’s death (in 1882), scores of such fossils were found: skeletal remains of nearly two dozen hominid and hominin species, tracing a clear line of descent, over fourteen million years, from the apes of eastern Africa to Homo sapiens.
Lacking that evidence, Darwin was obliged to use the comparative method, which meant pointing out the telling similarities, and differences, between living apes and us. He argued that the transition from simian knuckle walking to our upright posture freed the hands for tool making and other digital manipulations, which led, through positive neural feedback, to our enlarged brain and increased mental powers.7 At the same time, Darwin was at pains to stress how our mental faculties differ only in degree, not in kind, from other animals, who also possess “higher mental powers,” including memories, reasoning, tool use, even “architecture and dress” (in the shelters baboons build and the vegetation they drape on their bodies when cold). In thus arguing for how surprisingly small is the mental gap between nonspeaking animals and articulate humans, Darwin sought to reduce Müller’s “unbridgeable” Rubicon River to a mere trickle. He was then ready, in a short chapter entitled “Language,” to explain how our ape-human ancestors traversed that insignificant barrier—and started speaking.
Darwin begins by conceding that language “is peculiar to man,” although not speech. Parrots, he reminds us, talk—although not to convey complex meaning. Only humans possess a “large power of connecting definite sounds with definite ideas”—a result of our unprecedentedly large, powerful brain. But intelligence is only part of the story, he said. Language is, like animal sounds, “instinctual,” as is clear from the automatic babbling of babies. Still, speech cannot be a “true instinct” because “every language has to be learned” through exposure to adult speech. In this, Darwin said, we are just like birds—one of the only other species who acquire their specific vocalizations from listening to their parents. (Dogs, cats, horses, chimps, indeed every mammal, are born with their vocal signals hardwired; they will make their species-specific noises even if they’ve never heard such sounds.) Furthermore, birds, during the vocal learning stage, take on the local “provincial dialect” of the surrounding adult birds—just as human babies, displaced from their birth location, acquire the language, and regional accent, of their adoptive home.
These uncanny correspondences between birdsong and human speech led Darwin to a highly original insight. Whereas all earlier theorists imagined words coming first, Darwin said that the melody and rhythm of speech, its birdsong-like pitch sequences across sentences—its emotional prosody—preceded words in some now extinct singing ape-human. He cited the living example of gibbons, a species of singing ape whose operatic mating calls are “true musical cadences… serving to express various emotions, as love, jealousy, triumph, and serving as a challenge to… rivals.”8 From this complex vocal music, he mused, speech gradually emerged. Which explains why Darwin was so surprised, several years after writing The Descent of Man, to discover, among his papers, the forgotten “baby diary” of his son William, which documented an identical sequence of verbal development: musical voicings like the upward pitch rise on the nonsense syllable mum, a melodic vocalization that gave way, eventually, to the articulated request for “food”; it seemed that William’s speech had developed precisely as Darwin imagined speech developing in our species—emotionally expressive musical phrases giving way, in some slightly more mentally advanced ape-human, to a controlled movement of the articulators and the first wordlike sounds of a protolanguage.
In describing this protolanguage’s emergence, Darwin imagined how our distant ancestor might have mimicked other species’ voices: “It does not appear altogether incredible that some unusually wise ape-like animal should have thought of imitating the growl of a beast of prey, so as to indicate to his fellow monkeys the nature of the expected danger.”9 Darwin also embraced, without apology, the explanations (ridiculed by Müller) that were offered by Enlightenment thinkers of a century earlier. “I cannot doubt,” he wrote, “that language owes its origin to the imitation and modification, aided by signs and gestures, of various natural sounds, the voices of other animals, and man’s own instinctive cries.”10
Once this protolanguage was in place, habitual use of the voice “strengthened and perfected” the vocal organs, even as it “reacted on the mind by enabling and encouraging it to carry on long trains of thought”—trains similar to the mental operations we perform with numerical figures in algebra. From these linguistic “calculations” emerged grammar and syntax—spoken language, the embodiment of thought. To establish this link between thought and speech, Darwin drew on the most up-to-the-minute research on the left hemisphere brain center for language discovered by Paul Broca, in stroke patients, just a few years earlier.11
Despite its elegance and economy (some say sketchiness), Darwin’s explanation did not settle the debate. The Descent was published in 1871. A year later, the London Philological Society joined the Paris group in banning all further papers on language origins. The topic vanished from serious scientific discussion for the next thirty years, until the dawn of the twentieth century, when Edward Sapir, a twenty-three-year-old Columbia University graduate student (who would go on to become one of the founders of modern linguistics), revived it. He did so when, for his master’s thesis, in 1905, he wrote about the Enlightenment scholar Johann Gottfried Herder and his essay on the onomatopoeic origins of words. Sapir marveled at the “epoch-making” brilliance of a thinker who, a century before Darwin, “[did] away with the conception of divine interference” in language and replaced it with “the idea of slow… development from rude beginnings.”12 Sapir called for reviving the moribund study of language origins and wrote: “the path for future work lies in the direction pointed out by evolution.” He recommended “the careful and scientific study of sound-reflexes in higher animals” and an “extended study of all the various existing stocks of languages.”13
In writing those words, Sapir seemed to assume that some languages (specifically, those of indigenous tribes) are more “primitive”—less evolved—than those of “civilized” societies, and might contain clues about how speech emerged from animal vocalizations. But when he began his PhD at Columbia, he took a course taught by Franz Boas, the pioneering anthropologist whose paradigm-shifting study of indigenous peoples showed that all humans are created equal—that is, exist at the same state of evolutionary “advancement,” and that any observable differences between, say, a city-dwelling investment banker and an igloo-dwelling whale hunter are attributable solely to culture, the customs and rituals that emerged to fit the particular environment in which the people live. To find out if the same was true of language, Sapir invented a new specialty: anthropological linguistics, and spent the summers of 1905 and 1906 doing fieldwork among various Native American tribes of Oregon. Through meticulous dissection of their speech—the sounds, vocabularies, and grammars—Sapir confirmed that no language is “simpler,” “more primitive,” or “less evolved” than any other. All partake of the same extraordinary, and mysterious, process of converting abstract thought into elaborately patterned acoustic signals with the voice. As Sapir put it: “the lowliest South African Bushman speaks in the forms of a rich symbolic system that is in essence perfectly comparable to the speech of the cultivated Frenchman.”14
This prompted Sapir to shift focus from language origins to the question of how languages reflect specific cultures. In his classic book Language (1921), he urged others to follow his cultural approach. Generations of linguists duly set off, notebooks in hand, to collect and analyze the multifarious vocal sounds, vocabularies, and grammars produced by the inhabitants of the farthest-flung jungles, savannahs, plains, villages, towns, and cities, with an ear to figuring out why different languages sound as they do.
* * *
Sapir’s cultural approach left untouched a host of questions about language, including how it is generated in the brain, how it is acquired in childhood, and how it got into our skulls in the first place. The first serious attempt to address these enigmas in the twentieth century was by the famous behaviorist B. F. Skinner.
Behaviorism’s central tenet is that the brain, at birth, is a blank slate, and that all behavior is learned. The theory grew out of work by Ivan Pavlov, who taught dogs to salivate at the sound of a ringing bell, a form of associative learning called “classical conditioning.” (Pavlov even taught dogs not to salivate at the sight and smell of food—arguably a more amazing feat.) Later behaviorists developed “operant conditioning,” the shaping of behaviors through punishment and reward—like the lab rat, confined to a box, who learns to push a lever a precise number of times to earn a food pellet. Skinner became the world’s leading behaviorist when he showed how such conditioned behaviors could be linked into long chains to create stunningly complex acts. A New York Times article from 1950 chronicled Skinner’s success in conditioning chickens to play ping-pong and peck out the tune “Take Me Out to the Ball Game” on a keyboard.15
In his 1957 book, Verbal Behavior, Skinner said that speech acquisition in infants is also reducible to chains of operant- and classically conditioned responses inscribed on the blank slate of the brain after a baby is born.16 We now know, from the remarkable research on speech acquisition detailed in this book’s first chapter, that Skinner’s theory was wrong in its details, but not in its overall argument that language can only be acquired through interaction with the environment. Skinner’s description of that interaction was, to be charitable, eccentric: he imagined that the process of “conditioned” learning of language begins with the reward of parents’ joyful reaction to the infant’s first mush-mouthed phonemes (“I think he’s trying to talk—grab the video camera!”), progresses to the parents’ emphatically positive (or negative) reactions to first words (“Yes, Susie, that’s a dog—very good!” or “No, Jake, that’s a cat, not a mat”), and eventually culminates in the teaching of syntax by subtle messages of reward and punishment that reinforce the child’s correct sequencing of words across sentences. Because this conditioning uses the identical punishment-reward process as lever-pushing in rats or tune pecking in chickens, the evolutionary roots of language were, in Skinner’s view, self-evident.17
Skinner’s explanation of language acquisition was characterized by its extreme “nurturist” stance, a view that speech is all learning and nothing about our ability to speak is inborn—a position in direct opposition to the one adopted, right around the same time, by Noam Chomsky. Indeed, it was Chomsky’s devastating 1959 review, in the leading journal Language, of Skinner’s Verbal Behavior (he called it “empty,” “vacuous,” “false,” “meaningless,” and “just a kind of play-acting at science”)18 that put Chomsky (then an obscure thirty-two-year-old linguist at MIT) on the map, and that brought to the attention of the scientific world at large his view that language is not learned at all, but “grows like any other body organ.” The theory would make Chomsky one of the most famous scientists in the world.