
This Is the Voice


by John Colapinto


  The atavistic echo of nonhuman animal sounds (for marking territory and showing tribal kinship) is audible, as well, in the way we shape vowels and consonants—our regional accents (aboot which, more later). The dramatic timbral and pitch differences between men’s and women’s voices, along with certain changes in texture that occur in moments of sexual arousal, are evolved traits central to the continued existence of our species (through the erotic voice signals we send and receive and which spur our urge to mate). As we will see, how we pick our political leaders also depends more on primordial echoes of the beasts and predators from whom we evolved than might be immediately obvious, especially in times of political instability and division, when those parts of our brain that respond to the voice’s emotional channel are acutely tuned to tones of fear and hate, anger and violence. In short, our collective fate as Homo sapiens (shaped, to a large degree, by the voices of our political leaders) relies far more on purely nonlanguage elements of speech than we might imagine or wish.

  * * *

  Likewise, our individual fates. Our career and romantic prospects, social status and reproductive success depend to an amazing degree on how we sound. This is a question not only of our vocal timbre, which is partly passed down by our parents (in the size, density, and viscosity of our vocal cords and the internal geometry of the resonance chambers of our neck and head), or our accent, but also of our volume, pace, and vocal attack: elements of our speech that betray dispositions toward extroversion or introversion, confidence or shyness, aggression or passivity—aspects of temperament that are, science tells us, partly innate, but also a result of how we respond to life’s challenges and to the innumerable environmental influences that mold personality and character and, consequently, our voice.

  In listeners’ ears, our voice is us, as instantly “identifying” as our face. Indeed, researchers in 2018 discovered that voices are processed in a part of the auditory cortex cabled directly to the brain region that recognizes facial features. Together, these linked brain areas make up a person-differentiating system highly valuable for ascertaining, in an instant, who we know and who’s a stranger.8 The voice recognition region can hold hundreds if not thousands of voices in long-term memory, which is why you can tell, within a syllable (“Hi…”), that it is your sister on the phone and not a telemarketer, and that Rich Little is attempting to “do” Bill Clinton and not Ronald Reagan (both of whose voices you can conjure in your auditory cortex as readily as you can call up their faces in your mind’s eye).9 That we do, sometimes, mistake family members for one another over the phone shows that not only are immutable anatomical attributes of voice (vocal cords and resonance chambers) as heritable as the facial features that make parent and child (or siblings) resemble each other, but that families often share a style of speaking, in terms of prosody, pace, and pronunciation. But the voice of every person is (like face or fingerprints) sufficiently unique, in its tiniest details, that such misidentifications are usually caught within seconds.

  Indeed, it is a philosophical irony of cosmic proportions that the only voice on earth that we do not know is our own. This is because it reaches us, not solely through the air, but in vibrations that pass through the hard and soft tissues of our head and neck, and which create, in our auditory cortex, a sound completely different to what everyone else hears when we talk. The stark difference is clear the first time we listen to a recording of our own voice. (“Is that really what I sound like? Turn it off!”) The distaste with which so many of us greet the sound of our actual voice is not purely a matter of acoustics, I suspect. A recording disembodies the voice, holds it at a distance from us, so that we can hear with pitiless objectivity all aspects of how we speak, including the unconscious ways we manipulate prosody, pace, and pronunciation to create the voice we wish we had. When I mentioned this to a friend, he grimaced at the memory of hearing his recorded voice for the first time. “God!” he cried. “The insincerity!” He was reacting to the mismatch between who he knows himself (privately and inwardly) to be, and the person that he seeks to project into the world.

  All of us do this, quite unconsciously, and until we hear ourselves on tape we remain mercifully deaf to how we perform this ideal self, in a bid to “put ourselves across,” to make an impression. The enterprise of being human is to carve out a congenial place to occupy in the world, an achievement that we know, intuitively, to depend to a frightening extent on how our voices sound in the ears of others. This book isn’t one of those instructional manuals that promises to give you a more assertive or sexy or persuasive voice—to aid you in the Darwinian struggle to advance in your job and land the partner of your dreams. But I hope that over the course of its eight chapters and coda it will have solved certain mysteries of the voice sufficiently to give you better insight than the Fix Your Voice Fix Your Life! come-ons that promise, through a few “easy breathing exercises,” to transform you overnight! Something as negligible as a minuscule bump on one vocal cord changed who I am by altering my voice. It works in both directions. To alter your voice in ways that conform better to the person you feel yourself to be, or that you wish you were, means changing, fundamentally, who you are. It can be done, but not overnight.

  * * *

  In terms of structure, this book is a little like the vocal signal itself: it begins by examining how the voice is manifest in a single individual (a newborn baby), and then radiates outward like a sound wave, in ever-expanding concentric circles, from investigating that initial assertion of raw need (feed me!) to examining how we mold that cry into speech and then how we engage other voices, in the form of back-and-forth conversation between two people. The scope then widens, again, to look at how the voice works in the surrounding society: the voice as badge of tribal membership, status symbol, class marker, and racial identity, all factors that help to place us in the social matrix, and define us in terms of who we woo and win as romantic partner (straight, gay, lesbian, or trans—each of which has its own particular vocal signal). The outermost circles address the religious voice of exhortation and worship, and the public voice of mass broadcast (radio, television, movies) and, ultimately, the voice of political leadership, the single voices that steer our collective future. The voice of power does not always show our species at its best, but the singing voice invariably does, in my opinion, so I will spend a chapter exploring and celebrating that miracle. The book ends with a view of the aging voice and the wisdom that, if we are lucky, the old voice denotes.

  Along the way, I’ll touch upon the evolutionary pressures that created our uniquely human voice, its subtle emotional prosody and its game-changing specialization for language. Here, I depart from the prevailing view that our ability to shape our vocal signals into meaningful utterance results purely from changes to our brain that appeared some fifty to sixty thousand years ago, spurring a massive spike in intelligence called the Great Leap Forward—a surge in cognitive power that supposedly caused language to somehow blip to life in our heads. Instead, this book emphasizes the role that the voice itself played in creating language in our species. That story stretches back much further than sixty thousand years—back, indeed, to when the first vertebrates emerged from sea onto land—and it is a story bolstered by recent research into the genetic mutations behind the blindingly fast and precise tongue and lip movements that enable speech.

  We master this trick of high-speed articulation as infants, a feat of coordination between brain and body so amazing that some scientists have insisted that language must be innate. Newborns do indeed arrive in the world with a staggering amount of linguistic knowledge already present in the brain. But that knowledge derives not from words, grammar, and syntax being preinstalled in us like the operating system on our computer, but from the fact that (as recent science shows) our surprisingly long and intensive regime of voice-based training for language—the lessons we absorb through listening extremely closely to parents and caregivers—begins even before we leave the womb.

  ONE BABY TALK

  The first experiments in fetal hearing were conducted in the early 1920s. German researchers placed a hand against a pregnant woman’s belly and blasted a car horn close by. The fetus’s startle movements established that, by around twenty-eight weeks’ gestation, the fetus can detect sounds.1 Since then, new technologies, including small waterproof microphones implanted in the womb, have dramatically increased our knowledge of the rich auditory environment2 where the fetus receives its first lessons in how the human voice transmits language, feelings, mood, and personality.

  The mother’s voice is especially critical to this learning—a voice heard not only through airborne sound waves that penetrate the womb, but through bone conduction along her skeleton, so that her voice is felt as vibrations against the body. As the fetus’s primary sensory input, the mother’s voice makes a strong and indelible “first impression.” Monitors that measure fetal heart rate show that, by the third trimester, the fetus not only distinguishes its mother’s voice from all other sounds, but is emotionally affected by it: her rousing tones kick up the fetal pulse; her soothing tones slow it.3 Some researchers have proposed that the mother’s voice thus attunes the developing nervous system in ways that predispose a person, in later life, toward anxiety or anger, calm or contentment.4 Such prenatal “psychological” conditioning is unproven, but it is probably not a bad idea for expectant mothers to be conscious, in the final two months of pregnancy, that someone is eavesdropping on everything they say, and that what the listener hears might have lasting impact. The novelist Ian McEwan used this conceit in his 2016 novel, Nutshell, which retells Shakespeare’s Hamlet from the point of view of a thirty-eight-week-old narrator-fetus who overhears a plot (through “pillow talk of deadly intent”) between his adulterous mother and uncle.

  As carefully researched as that novel is regarding the surprisingly acute audio-perceptual abilities of late-stage fetuses, McEwan takes considerable poetic license. For even if a fetus could understand language, the ability to hear speech in the womb is sharply limited. The uterine wall muffles voices, even the mother’s, into an indistinct rumble that permits only the rises and falls of emotional prosody to penetrate—in the same way that you can tell through the wall you share with your neighbor that the people talking on the other side are happy, sad, or angry, but you can’t hear what they’re actually saying. Nevertheless, after two months of intense focus on the mother’s vocal signal in the womb, a newborn emerges into the world clearly recognizing the mother’s voice and showing a marked preference for it.5 We know this thanks to an ingenious experiment invented in the early 1970s for exploring the newborn mind. Investigators placed a pressure-sensitive switch inside a feeding nipple hooked to a tape recorder. When the baby sucked, prerecorded sounds were broadcast from a speaker. Sounds that interested the infant prompted harder and longer sucking to keep the sound going and to raise its volume. Psychologist Anthony DeCasper used the device to show that three-day-olds will work harder, through sucking, to hear a recording of their own mother’s voice over that of any other female.6 The father’s voice sparked no special interest in the newborn7—which, on acoustical grounds, isn’t surprising. The male’s lower pitch penetrates the uterine wall less effectively and his voice is also not borne along the maternal skeleton. Newborns thus lack the two months of enwombed exposure to dad’s speech that creates such a special familiarity with, and “umbilical” connection to, mom’s voice.

  * * *

  The sucking test has revealed another intriguing facet of the newborn’s intense focus on adult voices. In 1971, Brown University psychologist Peter Eimas (who invented the test) showed that we are born with the ability to hear the tiny acoustic differences between highly similar speech sounds, like the p and b at the beginning of the words “pass” and “bass.” Both are made by an identical lip pop gesture. They sound different only because, with b, we make the lip pop while vibrating our vocal cords—an amazingly well-coordinated act of split-second synchronization between lips and larynx that results in a “voiced” consonant. With the p, we pop the lips while holding the vocal cords in the open position, making it “unvoiced.” We can do this with every consonant: t, voiced, becomes d; k becomes hard g; f becomes v; ch becomes j. Babies, Eimas showed, hear these distinctions at birth, sucking hard with excitement and interest when a speech sound with which they’ve become bored (ga ga ga) switches to a fascinating new one (ka ka ka).8 Prior to Eimas’s pioneering studies, it was believed that newborns only gradually learn these subtle phonemic differences.

  The significance of this for the larger question of how we learn to talk emerged when Eimas tested whether infants could discriminate between speech sounds from languages they had never heard—in the womb or anywhere else. For English babies this included Kikuyu (an African language), Chinese, Japanese, French, and Spanish, all of which feature minuscule differences in shared speech sounds, according to the precise position of the tongue or lips, or the pitch of the voice. The experiments revealed that newborns can do something that adults cannot: detect the most subtle differences in sounds. Newborns, in short, emerge from the womb ready and willing to hear, and thus learn, any language—all seven thousand of them. This stands to reason, because a baby doesn’t know if it is going to be born into a small French town, a hamlet in Sweden, a tribe in the Amazon, or New York City, and must be ready for any eventuality.9 For this reason, neuroscientist Patricia Kuhl, a leading infant language researcher, calls babies “linguistic global citizens”10 at birth.

  But after a few months, babies lose the ability to hear speech sounds not relevant to their native tongue—which has huge implications for how infants sound when they start speaking. Japanese people provide a good example: when speaking English, adults routinely swap out the r and l sounds, saying “rake” for “lake,” and vice versa. They do this because they cannot hear the difference between English r and l. But Japanese newborns can, as Eimas’s sucking test shows. Change the ra sound to la, and Japanese babies register the difference with fanatic sucking. But around seven months of age, they start having trouble telling the difference. At ten months old, they don’t react at all when ra changes to la. They can’t tell the difference anymore. English babies actually get better at it.

  The reason is exposure and reinforcement. The ten-month-old English baby has spent almost a year hearing the English-speaking adults around her say words that are distinguished by clearly different r and l sounds. Not the Japanese baby, who spent the ten months after birth hearing a Japanese r that sounds almost identical to our English l, the tongue lightly pushing against the gum ridge behind the upper front teeth. Because there is no clear acoustic difference between the Japanese r and the English l, Japanese babies stop hearing a difference. They don’t need to, because their language doesn’t depend on it.

  All of which is to say that the developing brain works on a “use it or lose it” basis. Circuitry not activated by environmental stimuli (mom’s and dad’s voices) is pruned away. The opposite happens for brain circuits that are repeatedly stimulated by the human voice. They grow stronger, more efficient. This is the result of an actual physical process: the stimulated circuits grow a layer of fatty insulation, called myelin, along their axons, the long fibers that extend from the cell body to communicate with other cells. Like the insulation on a copper wire, this myelin sheath speeds the electrical impulses that flash along the nerve fibers that connect the neurons which represent specific speech sounds. Neuroscientists have a saying: “Neurons that fire together, wire together”—which is why the English babies in Eimas’s experiments got better at hearing the difference between ra and la: the neuron assemblies for those sounds fired a whole lot and wired themselves together. Not so for Japanese babies.

  In short, the voices we hear around us in infancy physically sculpt our brain, pruning away unneeded circuits, strengthening the necessary ones, specializing the brain for perceiving (and eventually producing) the specific sounds of our native tongue.

  * * *

  Some infants fail to “wire in” the circuits necessary for discriminating highly similar sounds. Take the syllables ba, da, and ga, which are distinguished by where, in the mouth, the initial sound is produced (b with a pop of the lips; d with a tongue tap at the gum ridge; g with the back of the tongue hitting the soft palate, also called the velum). These articulatory targets determine how the resulting burst of noise transitions into the orderly, musical overtones of the a-vowel that follows: a sweep of rapidly changing frequencies, over tens of milliseconds, that the normal baby brain, with repetition, wires in through myelinating the correct nerve pathways.

  But some 20 percent or so of babies, for unknown reasons, fail to develop the circuits for detecting those fast frequency sweeps. Sometimes a child registers ba, sometimes ga or da. Parents are unaware of the problem because kids compensate by using contextual clues. They know that mom is saying “bat” and not “pat” because she’s holding a bat in her hand. They know dad is talking about a “car” because he’s pointing at one. The problem surfaces only when the child starts school and tries to learn to read, that is, to translate written letter-symbols into the speech sounds they represent. He can’t do it, because his brain hasn’t wired in the sounds clearly. He might read the word “dad” as “bad,” or “gab,” or “dab.” These children are diagnosed with dyslexia, a reading disorder long believed to be a vision problem (it was once called “word blindness”). Thanks to pioneering research in the early 1990s by neuroscientist Paula Tallal at Rutgers University, dyslexia is now understood to be a problem of hearing, of processing human voice sounds.11 Tallal has been helping to devise software that slows the frequency sweeps in those consonant-vowel transitions so that very young children can train their auditory circuits to detect the different speech sounds, and thus wire them in through myelination of the nerve pathways. All to improve their reading.

 
