The human capacity for speech.
* * *
Linnaeus was a few centuries ahead of his time in recognizing that our unique ability for spoken language resides not in any dramatic anatomical difference between us and our nearest animal kin—except for our slightly lower larynx, a unique distinction of our species that Linnaeus never mentioned, no doubt because he considered it unimportant, but whose implications for the origins of language we will look at later. For now, it is enough to point out that our capacity for speech derives, chiefly, from differences in our brain, especially the cortex, the wrinkled outer layer where Broca’s and Wernicke’s areas are found.
However, it is also vitally important to stress that, when we speak, we don’t draw solely on the cortex and the activity in Broca’s and Wernicke’s areas. If we did, we would sound like HAL in 2001, or Mr. Spock in Star Trek: precise, but affectless, flat, weirdly inhuman. (“A. Unicorn. That. Is. Eating. A. Flower.…”) What makes our voice human (what makes us human) is emotion—the feelings and moods that underwrite everything we do and that emerge, vocally, in our prosody. Nor is this music a mere garnish, or add-on; we’ve already seen how the melodic and rhythmic contours of speech orchestrate conversation, define syntactic and grammatical units, and, indeed, inculcate language in newborns. It was from this music, Darwin says, that our capacity for speech evolved in the first place, through a shaping of emotional, expressive vocal sounds. To appreciate how this came about, we need to glance back over our evolutionary history, to track developments that took place in parallel to the bodily changes that occurred as the various speech organs (lungs, larynx, lips, and tongue) were evolving—to events unfolding inside the skull.
THREE EMOTION
Over the course of animal evolution, three more or less distinct layers of the brain emerged,1 layering themselves one on top of the other. We retain all three layers, each of which plays a specific role in how we control (or fail to control) the emotional channel of our voice.
The oldest layer, the brainstem, which first began to develop with the marine ragworm over 600 million years ago and was refined in the earliest fish, is the seat of all our involuntary processes like breathing, blinking, and heartbeat. Reptiles possess a brain that is almost all brainstem and their existence is duly characterized by largely instinctive, reflexive behaviors devoid of anything we would describe as “emotional.” Lizards don’t even form maternal attachments to offspring—they will eat their babies after they emerge from the egg (newly hatched lizards instinctually flee to treetops to avoid being cannibalized). Because the voice, in all species, is first and foremost an instrument of communication, of social interaction, the lack of sociality in reptiles had huge implications for how their voices evolved—or, more to the point, didn’t. Of the six thousand or so lizard species, most don’t vocalize at all, even to repel aggressors or express an urge to mate (instead, they use silent body postures and tail wags). You can get a lizard to use its voice, by accidentally stepping on it, which calls up a pure pain response in its brainstem. The rest is silence.
Our brainstem, which we inherited more or less intact from reptiles, behaves exactly like a lizard’s, vocally speaking. You can test this next time you accidentally hit your thumb with a hammer. You will make a noise that issues directly from your brainstem or, as it is colloquially known, your “lizard brain”—a scream that might sound like Oww! or Augghh! or perhaps AHHHH!—it’s impossible to render in writing for the very reason that such noises are made without access to the vowel-and-consonant-generating activity of Broca’s and Wernicke’s areas up in the newfangled cortex. Speech scientists call these noises “fixed action patterns.” They’re hardwired into the brainstem and activated by an “innate releasing mechanism”—in the hammer-to-thumb example, the pain response. They can be triggered by other kinds of extreme sensory input, like the pleasurable sensations that lead to orgasm (moans or excited cries), or the shout you produce upon entering what you assume to be your empty home only to have the lights flash on and people scream “Happy Birthday!”—a surprise sensory assault that stimulates the visual and auditory inputs to your brainstem with sufficient alarm to provoke a reflexive, fixed-action Urrgh! or Aaaukkkh!
For very young humans, the surprise sensory assault is the sudden bright light and ear-splitting noises of the delivery room—which, as noted earlier, make them reflexively go Waaaah! That this noise is a hardwired, preprogrammed reflex is clear, not only from a newborn’s ability to produce it upon first contact with air, but from a tragic “experiment of nature”—a rare birth disorder in which human babies are born with nothing but a brainstem, their higher brain layers having failed to develop. With so severe a deficit, it might be supposed that these children, who live only a few hours or days after birth, would be vocally silent. Instead, they react to pain with perfectly normal-sounding baby cries.2
* * *
Though involuntary sobs, laughter, and cries of pain or pleasure all express emotions, voice researchers don’t classify them as emotional signals. They call them interjections, and afford them about as much interest as the kick of your foot when the doctor taps your knee with his rubber hammer.
Of far greater fascination to voice science are the more subtle vocal emotions, the prosodic shadings that betray anxiety, hostility, lust, doubt, guilt, love. These sounds are generated in the second evolutionary layer of the brain, which emerged with mammals. Known as the limbic system, this layer grows over the top of the brainstem, and it generates emotions in us by triggering the release of chemicals called neurotransmitters that flood our body, producing inner states of feeling that aid us in reacting appropriately to the situations we encounter. Thus the sight of a menacing-looking stranger advancing upon you in a dark alley activates a limbic structure called the amygdala, a small almond-shaped organ (we actually have two of them, perched on either side of the brainstem), which signals to the endocrine glands for the release of adrenaline, which kicks up your heart rate, speeds your breathing, induces sweating, and prepares your muscles to fight or flee—and sends a feedback signal along another neural pathway to the higher brain centers that sparks the conscious, “felt” emotion of terror. By contrast, the sight of your newborn baby smiling up at you acts on a different limbic structure (the nucleus accumbens), an anticipatory-pleasure and reward center, which triggers release of the neurotransmitters dopamine and serotonin, endogenous opiates that your own body’s glands produce to suppress physical pain. These chemicals create subjective “felt” states like contentment, serenity, love. Which is why lizard moms, lacking a limbic system, instead see their babies solely as a way to appease their appetite.
Crucially, the limbic system, as the part of the brain that makes us social animals, also shapes how we signal to others our inner states, as a way to enhance our chances of surviving and of mating. These social signals include facial movements like smiling when we encounter an old friend, or furrowing our brow and setting our jaw when we confront a dangerous enemy, or the myriad shadings of emotional prosody in our voice.
* * *
The first hard evidence of the limbic system’s role in controlling vocal emotion arose from research by the Swiss neuroscientist Walter Rudolf Hess, winner of the Nobel Prize in Physiology or Medicine in 1949. Hess used cats as experimental subjects, surgically implanting hair-thin electrodes into the brain.3 When they awoke from the operation, the cats could move freely around Hess’s lab. By delivering pulses of low-voltage current, he induced “natural” neuron activity in specific areas of the cats’ limbic systems and found that, like a gamer using a joystick, he could elicit a wide repertoire of emotional vocalizations in the animals. Activate the amygdala and they produced a hissing, spitting, and snorting, which accompanied classic feline “rage gestures” (dilated pupils, flattened ears, raised fur, extended claws); stimulate the nucleus accumbens and the cats made a “happy purring,” which was paired with outward signs of calm and contentment.
In the late 1960s, a German research team at the Max Planck Institute of Psychiatry in Munich adapted Hess’s method to study the voice of a higher mammal species, the squirrel monkey, a small, intelligent, exceptionally social primate from the rain forests of South America, which produces a far wider array of vocal noises than do cats: threat calls, mating cries, friendly signals, parental sounds. The team elicited some fifty clearly differentiated emotional calls and cries by stimulating many areas of the monkeys’ brains, including specific limbic structures that are also found in our brain. “Growling” indicated a “directed aggression”; “cackling” signaled high excitement; “chirping” was a friendly sound promoting “group cohesion”; “trilling” was nonaggressive and focused group attention; “quacking” expressed “irritation and unease”; “shrieking” signaled the “highest degree of excitement.” To no one’s surprise, stimulating the amygdala produced the most aggressive sounds, a loud hissing and growling—sounds identical in acoustic profile to those made by cats in experiments in which their amygdala was stimulated—“a remarkable homology,” the German team noted.4
“Homology” refers to the shared ancestry of anatomical structures. The homology was remarkable because the identical, amygdala-driven threat sound, in cats and monkeys, was strong evidence for the evolutionary roots of particular emotional vocalizations in specific limbic structures, structures conserved through all mammalian species, including our own. Which was big news. Because if primates (monkeys) inherited their emotive calls from lower mammals (cats), it was highly likely that we inherited those same calls from our primate forebears.
Charles Darwin was the first to raise this possibility, a century earlier, in his third and final work on evolution: The Expression of the Emotions in Man and Animals (1872) where he shocked the world by stating that our emotional expressions do not derive from the stirrings of a divinely bestowed soul, but are instead evolved traits that create all the facial expressions, bodily postures, and vocal sounds “which we recognize as expressive.” In this, Darwin said, the voice is “efficient in the highest degree,” acting, in hostile encounters, like a “harsh and powerful” weapon to repel antagonists, and in mating and child-rearing, as a soft, high-pitched, loving signal to attract mates or nurture offspring.5 From these insights, Darwin derived his “principle of antithesis,” in which opposite emotions give rise to diametrically opposed body postures, facial expressions, and vocal noises—all part of a social signaling system for repelling predators or winning mates. Aggressive or angry animals stand tall and stiff, raising their hair to suggest a bigger, more formidable body—visual signals that they match with the voice, through low-pitched, growling sounds that also “suggest” a bigger, more formidable body (a size bluff); submissive or loving animals win romantic partners by loosening their stance, even pushing the body against the ground like an affectionate dog, flattening the hair—and producing the high, or whining, pure vocal notes that suggest a small, submissive body (a reverse size bluff).
Darwin’s theoretical musings were given empirical support in the 1970s by zoologist Eugene Morton, who used a device called the spectrograph to minutely parse the voices of some fifty-five bird and mammal species, showing that all animals vocalize with low-pitched growls for aggression and high-pitched pure notes for friendliness and mating.6 (Our current parakeet, Rudy, does this, growling a low, stuttering warning when I extend my finger for her to hop onto, but unfurling mellifluous pure-toned song-twitter when happily perched on my shoulder.) Morton, moreover, showed that animals produce these vocalizations on a graded spectrum between the two extremes of low-pitched growl and high-pitched pure tone, often mixing elements of the two opposed signals in a single bark, chirp, or bleat when unsure of exactly how they feel about a potential friend or foe. They trade these vocalizations back and forth, testingly (you’ve probably noticed dogs doing this with other dogs, moving from submissive, questioning whines to assertive barks to a warning growl—and if all such diplomatic interchanges fail, outright fighting)—a form of animal “conversation” that, if less exquisitely timed than our own, is nevertheless conducted according to similarly coded variations in pitch, volume, and rhythm; by prosodic cues. Indeed, this basic signaling pattern, essential to the main engines of natural selection (survival and reproduction), was, scientists now believe, hardwired into the animal genome over millions of years, and it sets the basic parameters of emotional prosody. According to linguist John Ohala, this pattern is heard even in the rise in pitch we use at the end of a question, a prosodic feature found in virtually all languages. Ohala says speakers raise their pitch to signal that they are deferring to their interlocutor, assuming a posture of submission to the interlocutor’s greater authority. When speakers answer a question, they do so in a downward pitch trajectory, affirming, vocally, who’s top dog.7
A critical point in Morton’s study of animal voices was that the number and variety of emotional vocalizations, along the spectrum between the endpoints of growl and whine, increases with a species’ “intricacy of social behavior.” Those animals who live in larger cooperative groups, with more complex social interactions, produce and process signals with more points—more sounds—along the emotional spectrum. Morton cited the German study of the highly social squirrel monkey and its over fifty finely differentiated calls and cries.
We are a species of ape who displays the most varied vocal emotional signaling of any animal, by far. Between the end points of low-pitched, growling anger, and high-pitched, singsong notes of friendly greeting, we produce a near-endless array of vocal inflections and intonations: subtle manipulations of pitch, rhythm, timbre, and volume that can express basic happiness, sadness, rage, or fear, but also such (presumably) uniquely human states as pride, wistfulness, nostalgia; or indeed, nostalgia-tinged-with-aggression, or Schadenfreude-with-a-slight-admixture-of-guilt. This subtlety reflects not only the complexity of the social interactions we have to negotiate, but also the labyrinthine variety of our inner states of emotion and consciousness, which arise not only in response to events in the present (as is the case with nonhuman animals whose vocalizations are tied to in-the-now reactions to threats, predators, mates, food gathering, and territorial defense), but also from our memories of past acts and encounters, as well as our fears and hopes for the future, and any number of other conscious (and subconscious) thoughts that, as far as science has been able to divine, are a result of our hugely expanded cortex—the third layer of the brain that grows atop the brainstem and limbic system.
* * *
When it comes to cortex, size matters. More cortex means more computational power, greater intelligence, and stronger powers of reasoning. Our cortex, which went through a truly explosive period of growth after we parted evolutionary company with chimpanzees some seven million years ago, is, relative to body size, by far the biggest in all of nature, three times that of the next largest (the chimp’s).
Scientists have offered many explanations for why our cortex expanded so much and so fast. Eugene Morton was the first to propose that emotional vocal signaling was one of the causes, since perceiving and processing fine differences in vocal pitch and timbre—and assigning to them nuanced gradations of aggression or submission—are acts that require higher-order processing than the brainstem or limbic system can offer. This seems as good a theory as any, and it helps explain why the highly social and vocal squirrel monkey has a cortex three times the size of a cat’s and why we have a cortex that is, in relation to body size, the biggest of any animal by a long shot.
One way that our massive cortex affects our emotional vocalizations is by editing, or censoring, them—modulating the spontaneous noises that might otherwise burst from us. So, in a hostile encounter with a boss, teacher, parent, spouse, child, or the guy who cuts in front of us in line, we might experience a flare of activity in our amygdala, which ordinarily would trigger a loud, angry growl or hostile snarl. But to blare forth with such a noise would usually come at too high a social cost, so we control the vocal noises we make, keeping the overt hostility out of our tone, and preserving relations with our family, friends, and the line cutter.
Such top-down control of our emotional behavior is an actual physical process in the brain, as first demonstrated in the 1990s by Antonio Damasio, a neuroscientist at the University of Southern California. He used fMRI brain imaging to show that, when emotionally aroused, we send nerve signals along the axons that extend from our cortex down into the limbic structures beneath to subdue limbic activity, dampening or modulating unfiltered emotional responses.8 The notion that higher cognitive functions (thought, reason, and will) rule over our “animal passions” has informed Western thought for millennia, from Shakespeare’s plays to Freud’s psychoanalysis (where the civilizing “superego” strives to control the unbridled, animal “id”).9 Damasio showed that this model of human consciousness is not a metaphor: it is a fact—and one reflected in the prosodic contour and color of every syllable we speak.
That our cortex and limbic system operate somewhat independently of each other in our emotional voicing is clear from disorders like Tourette’s syndrome in which some sufferers make uncontrolled outbursts that sound like barks and whinnies, neighs and roars. Brain scans reveal overactive amygdalae—and a strangely quieted cortex.10 That some Tourette’s sufferers involuntarily shout swear words—“Shit!” “Fuck!”—would seem to make little sense because language is not computed in the limbic system. But brain scans also show that such “prohibited” words originate, not in the language-generating cortex, but in the limbic emotion centers. “Bad words” have, in effect, been banished from Wernicke’s mental dictionary through early social conditioning (“Now, Susie, that’s not a nice thing to say!”), and they take up residence in a lower part of the brain, on the margin where the cortex meets the limbic structures.11 This is why, when you hit your thumb with a hammer, you sometimes let fly with a “FUCK!!” or “SHIT!” You are accessing these socially prohibited words from the same part of the emotional brain from which people with Tourette’s syndrome pluck them.