by David Byrne
The linguist Noam Chomsky proposed that language itself might be an evolutionary spandrel—that the ability to form sentences might not have evolved directly but might be the byproduct of some other, more pragmatic evolutionary development. In this view, many of the arts got a free ride along with the development of other, more prosaic qualities and cognitive abilities.
Dale Purves, a professor at Duke University, studied this question with his colleagues David Schwartz and Catherine Howe, and they think they might have some answers. First they describe the lay of the land: pretty much every culture uses notes selected from among the twelve that we typically use. From one A to another A an octave above it, there are usually twelve notes. This is not a scale, but there are twelve available notes, which on a piano would be all the black and white keys in one octave. (Scales are generally a smaller number of notes chosen from within those twelve.) There are billions of possible ways to divide the increments from A to A—yet twelve gives us a good start.
Traditional Chinese music and American folk music usually employ five notes selected from among those twelve to create their scales. Arabic music works within these parameters too. Western classical music uses seven of the twelve available notes (the eighth note of the Western scale is the octave). In 1921, the composer Arnold Schoenberg proposed a system that would “democratize” musical composition. In this twelve-tone music, no note is considered to be more important than any other. That does indeed seem like a fair and democratic approach, yet people often call music using that system dissonant, difficult, and abrasive. Dissonant sounds can be moving—either used for creepy effect or employed to evoke cosmic or dark forces as in the works of Messiaen (his Quartet for the End of Time) or Ligeti (his composition Atmosphères is used in the trippy stargate sequence of the movie 2001). But generally these twelve-tone acts of musical liberation were not all that popular, and neither was free jazz, the improvisational equivalent pioneered by Ornette Coleman and John Coltrane in his later years. This “liberation” became, for many composers, a dogma—just a new, fancier kind of prison.
Very few cultures use all twelve available notes. Most adhere to the usual harmonies and scales, but there are some notable exceptions. Javanese gamelan music, produced mainly by orchestras consisting of groups of gonglike instruments, often has scales of five notes, but the five notes are more or less evenly spread between the octave notes. The intervals between the notes are different from those of a five-note Chinese or folk-music scale. It is surmised that one reason for this is that gongs produce odd, inharmonic resonances and overtones, and to make those aspects of the notes sound pleasant when played together, the Javanese adjusted their scales to account for the unpleasantly interacting harmonics.
Harmonics are the incidental notes that most instruments produce above the principal (or “fundamental”) note being played. These “ghost” notes are quieter than the main tone, and their number and variety give each instrument its characteristic sound. The harmonics of a clarinet (whose vibrations result from a reed and a column of air) are different from those of a violin (whose vibrations result from a vibrating string). Hermann von Helmholtz, the nineteenth-century German physicist, proposed that it is qualities inherent in these harmonics and overtones that lead us to line up notes along common intervals in our scales. He noticed that when two notes aren’t quite “in tune” with each other, you can hear a beating, pulsing, or roughness if they are played at the same time. Play what is supposed to be the same note on two instruments and, if the pitches are ever so slightly different, you will hear a throbbing whose speed depends on how far apart the two pitches are (the beat rate is simply the difference between the two frequencies). An instrument that is out of tune produces beating tones when the octaves and harmonics don’t line up. Helmholtz maintained that we find this beating, which is a physical phenomenon and not just an aesthetic one, disturbing. The natural harmonics of primary notes create their own sets of beats, and only by choosing notes from the intervals that occur among the usual and familiar scales can we resolve and lessen this ugly effect. Like the ancients, he was claiming that we have an inherent attraction to mathematical proportions.
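For anyone who wants to see the arithmetic, here is a minimal sketch of that beating. The 440 and 442 Hz tones are purely illustrative choices, two strings that are supposed to be sounding the same A:

```python
# Two tones that are almost, but not quite, the same pitch.
# The frequencies are illustrative, not values from the text.
import numpy as np

sample_rate = 44100                       # samples per second
t = np.arange(0, 2.0, 1 / sample_rate)    # two seconds of time points

f1, f2 = 440.0, 442.0                     # two slightly mistuned strings
tone = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# The combined signal swells and fades at the difference frequency:
# |f2 - f1| = 2 beats per second. The envelope below makes that visible.
envelope = np.abs(2 * np.cos(np.pi * (f2 - f1) * t))

print(f"beat rate: {abs(f2 - f1):.1f} Hz")            # -> 2.0 Hz
print(f"peak combined amplitude: {tone.max():.2f}")   # close to 2, when the waves align
```

Tune the second string to exactly 440 Hz and the throbbing disappears, which is precisely what a player listens for when tuning by ear.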
When a scale is made up of fifths and fourths that resonate perfectly and mathematically (this is referred to as “just intonation”), all is well unless you want to change key, to modulate. If, for example, the key (or new scale) you want to move to in your tune begins with the fourth note of your original key—a typical choice for a contemporary pop tune—you will find that the notes of the new key don’t quite line up in a pleasant-sounding way anymore—not if you are using this heavenly and mathematical intonation. Some will sound fine, but others will sound markedly sour.
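To make the problem concrete, here is a rough sketch using one common just-intonation major scale (an assumption; there are several variants). Build the same scale pattern on the old fourth degree, and the sixth note of that new key misses the note the instrument already has by about a fifth of a semitone:

```python
# Why just intonation resists modulation: one note of the new key lands
# about 21.5 cents (a syntonic comma) away from the note already in use.
# 100 cents = one tempered semitone.
from fractions import Fraction
from math import log2

def cents(ratio):
    return 1200 * log2(float(ratio))

# In this just major scale, the 2nd degree is 9/8, the 4th is 4/3, the 6th is 5/3.
old_second = Fraction(9, 8)                      # the note the instrument already has
new_key = Fraction(4, 3)                         # modulate to the key built on the old 4th
needed_sixth = new_key * Fraction(5, 3) / 2      # 6th of the new key, folded down an octave

print(needed_sixth)                                                    # -> 10/9
print(f"off by {cents(old_second) - cents(needed_sixth):+.1f} cents")  # -> about +21.5
```

That 21-cent gap is the “markedly sour” note; retune it to suit the new key and you sour a note in the old one.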
Andreas Werckmeister proposed a workaround for this problem in the late 1600s. Church organs can’t be retuned, so they presented a real difficulty when it came to playing in different keys. He suggested tempering, or slightly adjusting the fifths, and thus all the other notes in a scale, so that one could shift to other keys and it wouldn’t sound bad. It was a compromise—the perfect mathematical harmonies based on physical vibrations were now being abandoned ever so slightly so that another kind of math, the math of counterpoint and the excitement of jumping around from key to key, could be given precedence. Werckmeister, like Johannes Kepler, Barbaro, and others at the time, believed in the idea of divine harmonic proportion described in Kepler’s Harmonices Mundi, even while—or so it seems to me—he was in some ways abandoning, or adjusting, God’s work.
Bach was a follower of Werckmeister’s innovations and used them to great effect, modulating all over the keyboard in many keys. His music is a veritable tech demo of what this new tuning system could do. We’ve gotten used to this tempered tuning despite its cosmic imperfections. When we hear music that is played in just intonation today, it sounds out of tune to us, though that could be because the players might insist on changing keys.
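Here is the trade-off in numbers, sketched with modern twelve-tone equal temperament, where every semitone is a ratio of the twelfth root of two. Werckmeister’s own temperaments were subtler, key-flavored compromises, but the basic move of nudging the fifths is the same:

```python
# Tempering in a nutshell: shave each fifth by a couple of cents so that
# twelve of them close the circle and every key becomes usable.
from math import log2

def cents(ratio):
    return 1200 * log2(ratio)

pure_fifth = 3 / 2                  # the "heavenly" just fifth
tempered_fifth = 2 ** (7 / 12)      # seven equal semitones

print(f"pure fifth:     {cents(pure_fifth):.1f} cents")      # ~702.0
print(f"tempered fifth: {cents(tempered_fifth):.1f} cents")  # exactly 700.0

# Stack twelve pure fifths and you overshoot seven octaves, which is why a
# justly tuned keyboard can never quite come back around to where it started.
print(f"twelve pure fifths minus seven octaves: "
      f"{cents(pure_fifth**12 / 2**7):+.1f} cents")          # ~ +23.5
```

Two cents per fifth is the sort of small sacrifice that lets a keyboard wander through every key without any of them sounding unusable.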
Purves’s group at Duke discovered that the sonic range that matters and interests us the most is essentially the same as the range of sounds we ourselves produce. Our ears and our brains have evolved to catch subtle nuances mainly within that range, and we hear less, or often nothing at all, outside of it. We can’t hear what bats hear, or the infrasonic sounds that whales use. For the most part, music also falls into the range of what we can hear. Though some of the harmonics that give voices and instruments their characteristic sounds are beyond our hearing range, the effects they produce are not. The part of our brain that analyzes sounds in those musical frequencies that overlap with the sounds we ourselves make is larger and more developed—just as the visual analysis of faces is a specialty of another highly developed part of the brain.
The Purves group also added to this the assumption that periodic sounds—sounds that repeat regularly—are generally indicative of living things, and are therefore more interesting to us. A sound that occurs over and over could be something to be wary of, or it could lead to a friend, or a source of food or water. We can see how these parameters and regions of interest narrow down toward an area of sounds similar to what we call music. Purves surmised that it would seem natural that human speech therefore influenced the evolution of the human auditory system as well as the part of the brain that processes those audio signals. Our vocalizations, and our ability to perceive their nuances and subtlety, co-evolved. It was further assumed that our musical preferences evolved along the way as well. Having thus stated what might seem obvious, the group began their examination to determine if there was indeed any biological rationale for musical scales.
The group recorded ten- to twenty-second sentences by six hundred speakers of English and other languages (Mandarin, notably) and broke those into 100,000 sound segments. Then they digitally eliminated from those recordings all the elements of speech that are unique to various cultures. They performed a kind of language and culture extraction—they sucked all of it right out, leaving only the sounds that are common to us all. It turns out that, sonically, much of the material that was irrelevant to their study was the consonants we use as part of our languages—the sounds we make with our lips, tongues, and teeth. This left only the vowel sounds, made with our vocal cords, as the pitched vocal sounds that are common among humanity. (Consonants, by contrast, get their character mostly from the mouth rather than from the pitch of the vocal cords.)
They eliminated all the S sounds, the percussive sounds from the P’s, and the clicks from the K’s. They proposed that, having stripped away enough of this extraneous information, they would be left with tones and notes common to everyone, and that each person’s utterances would now be some kind of proto-singing: the vocal melodies that are embedded in talking. These notes, the ones we sing when we talk, were then plotted on a graph representing how often each note occurred, and sure enough, the peaks—the loudest and most prominent notes—pretty much all fell along the twelve notes of the chromatic scale.
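To make the method a little more tangible, here is a toy reconstruction of that last plotting step, with made-up pitch values rather than the Duke group’s actual data or code: take a batch of vowel-pitch estimates, fold each one into a single octave, and count how often they land on each of the twelve chromatic notes:

```python
# A toy version of the histogram described above. The pitch list is
# hypothetical; the real study used segments from hundreds of speakers.
import numpy as np

vowel_pitches_hz = np.array([110.0, 147.2, 165.0, 123.5, 196.0,
                             220.1, 130.8, 175.0, 146.8, 164.8])

reference = 440.0                                    # anchor the grid to A
semitones = 12 * np.log2(vowel_pitches_hz / reference)
nearest = np.round(semitones)
pitch_class = np.mod(nearest, 12).astype(int)        # 0 = A, 1 = B-flat, ... 11 = A-flat
deviation_cents = (semitones - nearest) * 100        # how far each pitch misses its note

counts = np.bincount(pitch_class, minlength=12)
print("hits per chromatic note:", counts)
print(f"average miss: {np.abs(deviation_cents).mean():.1f} cents")
```

A plot of those counts, made from real speech, would be a rough analogue of the graph with peaks on the chromatic notes that the Duke group describes.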
In speech (and normal singing) these notes or tones are further modified by our tongues and palates to produce a variety of particular harmonics and overtones. A pinched sound, an open sound. The folds in the vocal cords produce characteristic overtones, too; these and the others are what help identify the sounds we make as recognizably human, as well as contributing to how each individual’s voice sounds. When the Duke group investigated what these overtones and harmonics were, they found that these additional pitches fell in line with what we think of as pleasing “musical” harmonies. “Seventy percent… were bang on musical intervals,” Purves noted.7 All the major harmonic intervals were represented: octaves, fifths, fourths, major thirds, and a major sixth. “There’s a biological basis for music, and that biological basis is the similarity between music and speech,” he said. “That’s the reason we like music. Music is far more complex than [the ratios of] Pythagoras. The reason doesn’t have to do with mathematics, it has to do with biology.”
I might temper this a little bit by saying that the harmonics our palates and vocal cords create might come into prominence because, like Pythagoras’s vibrating string, any sound-producing object tends to privilege that hierarchy of pitches. That math applies to our bodies and vocal cords as well as to strings, though Purves would seem to have a point when he says we have tuned our mental radios to the pitches and overtones that we produce in both speech and music.
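That hierarchy is easy to see in the harmonic series itself, which any simply vibrating body (a string, a column of air, the vocal cords) produces. The 100 Hz fundamental below is an arbitrary choice for illustration:

```python
# The first several harmonics of a vibrating body, and the familiar
# intervals they form with the fundamental.
from math import log2

fundamental = 100.0                       # arbitrary example frequency
names = {0: "unison", 12: "octave", 19: "octave plus a fifth",
         24: "two octaves", 28: "two octaves plus a major third",
         31: "two octaves plus a fifth"}

for n in range(1, 7):                     # harmonics 1 through 6
    freq = n * fundamental
    semis = round(12 * log2(freq / fundamental))
    print(f"harmonic {n}: {freq:6.1f} Hz  ({names.get(semis, f'{semis} semitones up')})")
```

The fourth and the major sixth that Purves mentions show up as the gaps between harmonics (the ratios 4/3 and 5/3), so nearly everything on his list is sitting in the first handful of overtones.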
MUSIC AND EMOTION
Purves took his interpretation of the data his team gathered one step further. In a 2009 study, they attempted to see if happy (excited, as they call it) speech results in vowels whose pitches tend to fall along major scales, while sad (subdued) speech produces notes that tend to fall along minor scales. Bold statement! I would have thought that such major/minor emotional connotations must be culturally determined, given the variety of music around the world. I remember during one tour, when I was playing music that incorporated a lot of Latin rhythms, some (mainly Anglo-Saxon) audiences and critics thought it was all happy music because of the lively rhythms. (There may also have been an insinuation that the music was therefore more lightweight, but we’ll leave that bias aside.) Many of the songs I was singing were in minor keys, and to me they had a slightly melancholy vibe—albeit offset by those lively syncopated rhythms. Did the “happiness” of the rhythms override the melancholy melodies for those particular listeners? Apparently so, as many of the lyrics of salsa and flamenco songs, for example, are tragic.
This wasn’t the first time this major/happy minor/sad correspondence had been proposed. According to the science writer Philip Ball, when it was pointed out to musicologist Deryck Cooke that Slavic and much Spanish music use minor keys for happy music, he claimed that their lives were so hard that they didn’t really know what happiness was anyway.
In 1999, musical psychologists Balkwill and Thompson conducted an experiment at York University that attempted to test how culturally specific these emotional cues might be. They asked Western listeners to evaluate Navajo and Hindustani music and say whether it was happy or sad—and the results were pretty accurate. However, as Ball points out, there were other clues, like tempo and timbre, that could have been giveaways. He also says that prior to the Renaissance in Europe there was no connection between sadness and minor keys—implying that cultural factors can override what might be somewhat weak, though real, biological correlations.
It does seem likely that we would have evolved to be able to encode emotional information into our speech in non-verbal ways. We can instantly tell from the tone of someone’s voice whether he or she is angry, happy, sad, or putting up a front. A lot of the information we get comes from emphasized pitches (which might imply minor or major scales), spoken “melodies,” and the harmonics and timbre of the voice. We get emotional clues from these qualities just as much as from the words spoken. That those vocal sounds might correspond to musical scales and intervals, and that we might have developed melodies that have roots in those speaking variations, doesn’t seem much of a leap.
YOU FEEL ME?
In a UCLA study, neurologists Istvan Molnar-Szakacs and Katie Overy watched brain scans to see which neurons fired while people and monkeys observed other people and monkeys perform specific actions or experience specific emotions. They determined that a set of neurons in the observer “mirrors” what they saw happening in the observed. If you are watching an athlete, for example, the neurons that are associated with the same muscles the athlete is using will fire. Our muscles don’t move, and sadly there’s no virtual workout or health benefit from watching other people exert themselves, but the neurons do act as if we are mimicking the observed. This mirror effect goes for emotional signals as well. When we see someone frown or smile, the neurons associated with those facial muscles will fire. But—and here’s the significant part—the emotional neurons associated with those feelings fire as well. Visual and auditory clues trigger empathetic neurons. Corny but true: if you smile you will make other people happy. We feel what the other is feeling—maybe not as strongly, or as profoundly—but empathy seems to be built into our neurology. It has been proposed that this shared representation (as neuroscientists call it) is essential for any type of communication. The ability to experience a shared representation is how we know what the other person is getting at, what they’re talking about. If we didn’t have this means of sharing common references, we wouldn’t be able to communicate.
It’s sort of stupidly obvious—of course we feel what others are feeling, at least to some extent. If we didn’t, then why would we ever cry at the movies or smile when we heard a love song? The border between what you feel and what I feel is porous. That we are social animals is deeply ingrained and makes us what we are. We think of ourselves as individuals, but to some extent we are not; our very cells are joined to the group by these evolved empathic reactions to others. This mirroring isn’t just emotional, it’s social and physical too. When someone gets hurt we “feel” their pain, though we don’t collapse in agony. And when a singer throws back his head and lets loose, we understand that as well. We have an interior image of what he is going through when his body assumes that shape.
We anthropomorphize abstract sounds, too. We can read emotions when we hear someone’s footsteps. Simple feelings—sadness, happiness, and anger—are pretty easily detected. Footsteps might seem an obvious example, but they show that we connect all sorts of sounds to our assumptions about what emotion, feeling, or sensation generated that sound.
The UCLA study proposed that our appreciation and feeling for music is deeply dependent on mirror neurons. When you watch, or even just hear, someone play an instrument, the neurons associated with the muscles required to play that instrument fire. Listening to a piano, we “feel” those hand and arm movements, and as any air guitarist will tell you, when you hear or see a scorching solo, you are “playing” it too. Do you have to know how to play the piano to be able to mirror a piano player? Dr. Edward W. Large at Florida Atlantic University scanned the brains of people with and without music experience as they listened to Chopin. As you might guess, the mirror neuron system lit up in the musicians who were tested, but, somewhat surprisingly, it flashed in non-musicians as well. So, playing air guitar isn’t as weird as it sometimes seems. The UCLA group contends that all of our means of communication—auditory, musical, linguistic, visual—have motor and muscular activities at their root. By reading and intuiting the intentions behind those motor activities, we connect with the underlying emotions. Our physical state and our emotional state are inseparable—by perceiving one, an observer can deduce the other.
People dance to music as well, and neurological mirroring might explain why hearing rhythmic music inspires us to move, and to move in very specific ways. Music, more than many of the arts, triggers a whole host of neurons. Multiple regions of the brain fire upon hearing music: muscular, auditory, visual, linguistic. That’s why some folks who have completely lost their language abilities can still articulate a text when it is sung. Oliver Sacks wrote about a brain-damaged man who discovered that he could sing his way through his mundane daily routines, and only by doing so could he remember how to complete simple tasks like getting dressed. Melodic Intonation Therapy is the name for a group of therapeutic techniques that were based on this discovery.
Mirror neurons are also predictive. When we observe an action, a posture, a gesture, or a facial expression, we have a good idea, based on our past experience, of what is coming next. Some people on the Asperger’s spectrum might not intuit all those meanings as easily as others, and I’m sure I’m not alone in having been accused of missing what friends thought were obvious cues or signals. But most folks catch at least a large percentage of them. Maybe our innate love of narrative has some predictive, neurological basis; we have developed the ability to feel where a story might be going. Ditto with a melody. We might sense the emotionally resonant rise and fall of a melody, a repetition, a musical build, and we have expectations, based on experience, about where those actions are leading—expectations that will be confirmed or slightly redirected depending on the composer or performer. As cognitive scientist Daniel Levitin points out, too much confirmation—when something happens exactly as it did before—causes us to get bored and to tune out. Little variations keep us alert, as well as serving to draw attention to musical moments that are critical to the narrative.