This Is the Voice

by John Colapinto

Janov personally oversaw Lennon’s treatment, which lasted five months. Shortly afterward, Lennon released his first post-Beatle LP, John Lennon/Plastic Ono Band, which features songs inspired by Primal Scream, including the harrowing “Mother,” where Lennon repeatedly shrieks “Mommy come hoooooooome,” his voice growing more desperate, more atonal and ragged, with each iteration of “home.” The effect is genuinely spooky: his final long-drawn cry, on the fade out, could be a newborn wailing in vain for its mother. In interviews, Lennon praised Primal Scream (“You’re so astounded by what you find out about yourself”)32 but a few years later he relegated the treatment, and Janov, to the heap of cast-off therapies, religions, drugs, and gurus he had embraced and abandoned since the mid-1960s.

* * *

Although automatic, the newborn’s “biological siren” also carries an echo of the child’s native tongue, a linguistic palimpsest imprinted on the fetal nervous system during the final two months of listening in the womb. When researchers compared the cries of newborns from France and from Germany,33 they discovered that the French two-day-olds wailed on a rising pitch contour, mirroring the melodic pattern of spoken French, whereas the German newborns cried on a downward arc typical of that language’s prosody. The study’s authors saw this astonishingly early mimicry of the maternal voice as a crucial adaptation to attract the mother’s attention and “foster bonding.”

But forming actual words, in any language, is anatomically impossible for all newborns and remains so for many months. This is owing to the extraordinary fact that we emerge from the womb with a larynx in the same high throat position as that of adult chimpanzees—which is to say that it is not located around the middle of the neck, as in adult humans (the bulge of the male Adam’s apple is actually the pointy cartilage at the front of the larynx, to the inside surface of which one end of the vocal cords attach). Instead, the newborn’s larynx is crowded up into the back of the mouth, close to the opening of the velum. This aids breastfeeding by creating an uninterrupted airway from nose to lungs (so newborns can suck at mom’s breast without having to stop and “come up for air” as they feed—the milk flowing around the sides of the raised larynx and into the stomach).

But this high larynx position also severely restricts the range of vowels that any newborn can utter. And the ability to produce clear, distinctly different vowels, one from another (for instance, ee, ahh, ooo), is crucial for articulate speech; it’s how you make, for instance, the separate words had, heed, head, hide, hid, hood, who’d, Hud from the same set of consonants; or dad, dead, deed, did, Dodd, dud. By altering slightly the curve and position of the tongue in the oral cavity, you change the relative size and shape of the different sections of your vocal tract, which runs from your vocal cords vertically up your throat and, after a ninety-degree bend, into the horizontal section of your mouth. Though they form a continuous tube, the vocal tract’s throat and mouth sections act as two independent resonance chambers—and you boost certain vowel-defining overtones in the vocal spectrum depending on how you shape those resonance chambers with your tongue. For the ee sound, you lift the tongue toward your palate and push it forward to make the mouth resonance chamber small, which boosts the higher-pitched overtones (like a small-bodied violin); but by pushing your whole tongue forward, you simultaneously enlarge the throat resonator, boosting the lower-pitched overtones (like a big-bodied cello). The blended pitches produce the complex sound we hear as ee. The reverse happens with Ahhh, when you drop the back of your tongue, making a big mouth resonator and a small throat resonator. The lips get into the act when you make the ooo and oh vowels, rounding and extending the lips, which lengthens the entire vocal tract and lowers the pitch of all the overtones in the voice spectrum—in exactly the same way that a trombonist makes low notes by pushing out the slide on his instrument, lengthening the resonance tube.

The tiniest changes to the size or shape of the vocal tract’s resonators have a huge effect on the different sounds our brains perceive, which is how English speakers, through subtle adjustments to the tongue and lips, produce the twenty-odd vowels of English, or Swedish speakers make the forty distinct vowels of their language. Even the slight pulling in of the lips against the teeth when we smile shortens the vocal tract enough to raise the signal’s entire overtone spectrum, “brightening” the sound so that we can tell, over the phone, that the person speaking to us is in a good mood (you hear a smile). You also detect a sulky mood in the sound of a pout, which extends the lips, lowering the overtone spectrum. (Which is why photographers, when they want you to assume the expression of a happy person, shout: “Say cheese!” and not “Say choose!”)
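For readers who like to see the arithmetic, the trombone comparison can be roughed out in a few lines of Python. This is only a sketch under loud assumptions, none of which come from the text above: it treats the whole vocal tract as a plain uniform tube, closed at the vocal cords and open at the lips, and it assumes a tract length of 17.5 centimeters, a speed of sound of 350 meters per second, and a one-centimeter change in length for a smile or a pout. A real vocal tract is nothing like a uniform pipe, so the numbers illustrate the direction of the effect, not its true size.

```python
# Assumed value (not from the text): speed of sound in warm, humid air.
SPEED_OF_SOUND_M_PER_S = 350.0


def tube_resonances(length_m, count=3):
    """Return the first `count` resonances (Hz) of an idealized uniform tube
    that is closed at one end and open at the other: f_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_PER_S / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# All three lengths are illustrative assumptions.
neutral = tube_resonances(0.175)  # lips at rest
smile = tube_resonances(0.165)    # lips pulled back: tube about 1 cm shorter
pout = tube_resonances(0.185)     # lips pushed out: tube about 1 cm longer

for label, values in (("smile", smile), ("neutral", neutral), ("pout", pout)):
    print(label, [round(v) for v in values])
```

With these assumed figures the resting tube resonates at roughly 500, 1,500, and 2,500 hertz. Shortening the tube pushes every one of those resonances up, and lengthening it pulls them all down, which is the “brightening” of a smile and the darkening of a pout in miniature.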

* * *

Now, imagine that you didn’t have a throat resonator because (like a baby or a chimp) your larynx was pushed up into the back of your mouth. You’d be limited to the one vowel that can be made with the mouth resonator alone, a sound linguists call the schwa. Kind of a short e sound, it’s actually the most common speech sound in all languages, as well as being the sound you make when pausing for thought (“uhhh”). Call it the sound of the cerebral cortex in neutral gear. It’s useful in its place (at the end of words like “the”), but it would not be so good if it were the only vowel sound we could make. The sentence “Who hid the head in the hut, Hud?” would come out as “Huh hud thuh huhd uhn thuh huht, Huhd?” If so vocally challenged a species managed to zoom to the top of the food chain, it would emphatically not be because of its ability to articulate clear, unambiguous vowels.

And it’s why a newborn is physically incapable of producing any human language. Only as the baby transitions from liquid to solid food does the larynx descend down the throat, literally inching down the neck, day by day. As it does so, the larynx pulls the root of the tongue down with it (the back of our tongue is attached to the larynx by a system of ligaments). This elongation of the tongue down our throat is crucial to speech, because it is the tongue’s vertical section that we manipulate (pushing it forward and backward) to produce the correct throat overtones for clear, well-articulated vowels.

* * *

As the baby’s larynx descends down the throat in the first months of life, she also gains considerable motor control of her articulators and starts to make an array of speech-like lip-pops for p and b, percussive tongue hits for d and t, fricatives and sibilants (like s and sh, which break the sound wave up into a hissing turbulence by pushing it through a narrow gap between tongue and teeth), as well as nasals, like m and n, by opening the velum and sending the sound wave through the nose. Not until she is six to eight years old will her larynx descend to the point where she can make vowels as finely sculpted as an adult’s,36 but even by her first birthday her larynx will have descended enough that adults can infer what vowel sounds she’s trying to make. That is a good thing, since it is at precisely this moment in infancy, at one year old, that she will put the various voice sounds together to utter her first word.

A baby’s ability to translate its mental knowledge of language into spoken utterance is an extraordinary and, until recently, mystifying accomplishment. Babies do glean some important lessons in the complicated gymnastics of speech from peering at a speaker’s mouth (my own home videos of our newborn son show him staring with rapt fixity at the movement of my own lips, as I natter at him from behind the camera, and he can even be seen clumsily trying to mimic the moves). But such “lip reading” can tell a baby only so much. It cannot, for instance, show a baby when and how to snap the vocal cords closed across the windpipe to make the voiced b that is so different from the unvoiced p, or what inner mouth target to hit with the tongue to make a t or a g sound, or exactly how big a gap to leave between the tongue and the back of the teeth to go ssss. And yet, around one year of age, most babies use these stunningly well-calibrated, precise, essentially invisible maneuvers to utter their first word.

They learn to do this the same way they learn everything: play. Tireless, dedicated, focused, trial-and-error play. Specifically, vocal play, which speech experts call “babbling,” an activity that begins, around four months of age, with utterances like ba, ba, ba or da da da or ga ga ga. This repetition of single sounds is called reduplicated babbling and it morphs, around eight months, into variegated babbling, which features the trickier task of mixing up various sounds, both voiced and unvoiced, in word- and sentence-like strings: as in kaga-bodee or paba-tee-no. These vocalizations were long seen as simply a way for the baby to strengthen the articulatory muscles. Today, experts in child development understand babbling to be the single most important stage in speech acquisition. Without it, we would never be able to tune our voice to the sounds of spoken language.

In babbling, babies listen closely to the sounds that emerge from their own mouth, and they compare these to the speech sounds of their native tongue—all that linguistic information they’ve been busily storing up in their auditory cortex since their seventh month in the womb. When, through random, playful movements of lips, tongue, velum, and larynx they get an accidental “hit”—a match between a stored speech sound and the noise they make with their voice—they get excited and repeat the sound (babababa or mamama or dadada), etching the instructions for these gestures into a part of the brain that is responsible for learning, and then coordinating and sequencing, highly complicated bodily movements: an ancient set of structures deep in the brain called the basal ganglia.37 It’s the same part of the brain you use when you learn to ride a bike or throw a ball (or, indeed, learn to walk, at around age one). At first, these actions are ill-coordinated, clumsy, until, through relentless practice, the basal ganglia get the moves sorted out and etched so deeply in your muscle memory that making an unvoiced pa versus a voiced ba, or the l sound in la versus the t in ta, seems unlearned, automatic. Not surprisingly, researchers have been investigating abnormalities in the basal ganglia, and its connections to the brain’s “language centers,” as a primary cause of stuttering.38 But, as with all aspects of voice, “mechanistic” explanations are only part of the story. Psychology, acting in concert with biological causes, also plays a role, as John Updike suggests when exploring the roots of his own stammer in his memoir, Self-Consciousness. He stuttered when he felt himself to be “in a false position,” as when addressing his high school as class president: “I did not, at heart, feel I deserved to be class president… and in protest… my vocal apparatus betrayed me.”39

Birds undergo the same process of vocal learning as us when they acquire their species-specific songs in infancy. Ornithologists call birds’ early subsong “babbling,” and like human speech it involves sentence-like strings of sounds that the baby bird makes in imitation of its adult tutors, grooving into its version of the basal ganglia the proper sequence of tongue and beak movements that turn the phonation formed by their version of the larynx (it’s called the syrinx and is located deep in the bird’s chest) into the stream of twittering birdsong unique to a particular species. Which is why parakeets and other parrots can learn to mimic human speech—and it’s also why, if you move a newborn bird to a community with a subtly different song than that of its birth parents, the displaced bird’s voice will (as Darwin noted) take on the pitches, syllable lengths, and rhythms of its adoptive community, a “provincial dialect”—an accent.

The same of course happens to us. A human baby born into an affluent neighborhood of London will, during the babbling stage, wire in the motor circuitry for making an aw vowel sound in words like “dance” (dawnce), because that’s what it hears from parents and others. But transport that same baby to New York during the babbling stage and it will wire up speech circuits for saying “dance” with an ah vowel sound. Parisian babies use their babbling to build motor pathways for easily articulating a sound like eu, in the word neu, which requires a high-front-tongue position (as in ee), with the lips, not pulled against the teeth as in English, but extended in a pout (as if saying oo). For Americans or Brits, who don’t hear this eu sound as babies, producing a proper French eu feels as awkward as trying to rub your stomach and pat your head at the same time; for French kids, who practice the oral gestures for a year, grooving them into the basal ganglia, it’s second nature.

Consonants are also etched into the basal ganglia during babbling: babies raised in India learn to curl the tongue back to tap the tip against the high part of the palate for t and d to match the sound their parents make, whereas babies in Texas tap the tip against the gum ridge, and French babies mash the body of the tongue against the back of the teeth, like Catherine Deneuve saying tu.

An important speech variable that linguists call “Voice Onset Time” is also learned in the babbling stage: the neural instructions for when, precisely, to turn on the vibration of the vocal cords to transform a p into a b or a t into a d. So sensitive to the timing of voice sounds are our ears that we can actually hear shades of ambiguity within the 65 millisecond VOT difference that distinguishes saying pa from saying ba. Hindi speakers, for instance, when saying pa, start phonating the a about 21 milliseconds sooner than do English speakers, giving a little bit of overlapped voicing to the initial lip pop.40 You can hear that tiny difference. I was recently listening to CNN broadcaster Fareed Zakaria, whose native language is Hindi, talking about Islam and the “path to reform,” except that it sounded like he was ever-so-slightly saying “bath to reform.” He was turning his vocal cords on a shade earlier than a native English speaker. VOT also explains something that has obsessed me since I first saw the movie Wait Until Dark at age ten and noticed that Audrey Hepburn says words like “can’t” and “call” a little bit like she’s saying gan’t and gall. I always assumed that this was some kind of finishing school affectation, since her overall accent is upper-class British. Then I learned that she was born and raised in Brussels. Her original language, Dutch, has a slightly faster voice-onset time for a after hard c than does English. So, when Hepburn says gan’t for can’t you are hearing not an affectation, but an inexpungable aspect of her earliest childhood—which is what you hear when most people speak, unless they’ve taken pains to remove such clues from their voice. That is, by changing their accent.

* * *

Even if some subtle markers like Voice Onset Time persist, the accent you learn in babyhood from your parents can be unlearned. In fact, children who are the offspring of foreign-accented parents don’t even have to undergo formal training to lose their parents’ pronunciation—it happens automatically if they start early enough. And most children do, when they leave the linguistic bath of home and go to school at age five. Henry, the eldest son of my Australian friends, Tony and Leslie, spoke with his parents’ strong thray-anitha-shrimp-ahn-thah-bahbie Aussie twang, until he started attending a New York City elementary school. By Easter, his “Strine” accent was gone and his speech was indistinguishable from that of a native New Yorker. I recall, with some guilt, how my own son came home from first grade at his Manhattan public school in tears because his classmates had mocked him for saying “Sorry” with the low-back “o” vowel he had imprinted from his immigrant Canadian parents. Soon, and with no conscious effort, he rid himself of our mortifying “o,” replacing it with a “proper” American version (which sounds, to my Canadian ears, like “Saaarry”)—just one example of how the primary influence on human behavior shifts from parents to peers when children enter the Darwinian arena of the schoolyard. The voice, like every other behavior, adapts for survival.

But only within a certain time window, or “critical period.”

Critical periods are developmental stages in childhood during which particular skills must be learned or they will fail to develop altogether, and after which they are locked in, more or less permanently. That speech and accent acquisition are subject to critical periods was elegantly demonstrated in studies of very young children who had suffered brain damage in an important language-processing region called Broca’s area, a half-dollar-sized patch of cells on the surface of the left hemisphere, near the temple. Activity in Broca’s area is how we construct our mental sentences before we say them: slotting the correct speech sounds into words, like a Scrabble player shifting letter tiles on her rack, and arranging those words into the right order. Broca’s area then passes this information to the part of the brain that initiates the movements of the lungs, larynx, tongue and lips that turn a thought into sound. People with damage to Broca’s area (through stroke or trauma) suffer from a speech disorder, called Broca’s aphasia, in which they know exactly what they want to say (their thinking is mostly unimpaired) but struggle to articulate it. Trying to say “Pass the salt, please,” they might laboriously stammer, “Suh-suh-salt… Po-po-pose… Pass… Puh-puh-please,” often mixing up word order and putting incorrect sounds into the words, like “galt” for salt, or “clease” for please. Psychologist Eric Lenneberg discovered that children between two and ten years old with injuries to Broca’s area manifest the aphasia. But unlike adult stroke victims, who rarely regain normal speech, the children soon recovered perfect fluency, because their young brains were so “plastic”; that is, they could rewire themselves, myelinating new circuitry, with ease. (So astonishingly plastic is the newborn brain that babies born entirely without a left hemisphere, and thus without a Broca’s area at all, briskly rewire the right hemisphere for speech and are able to talk normally.)41
