Harnessed: How Language and Music Mimicked Nature and Transformed Ape to Man


by Mark Changizi


  In the Beginning

  The Big Bang is the ultimate event, and even it illustrates the typical physical structure of events: it started with a sudden explosion, one whose ringing is still “heard” today as the background microwave radiation permeating all space. Slides didn’t make an appearance in our universe until long after the Bang. As we will see in this section, hits, slides, and rings tend to inhabit different parts of events, with hits and rings—bangs—favoring the early parts.

  To get a feeling for where hits, slides, and rings occur in events, let’s take a look at a simpler event than the one that created the universe. Take a pen and throw it onto a table. What happened? The first thing that happened is that the pen hit the table; the audible event starts with a hit. Might this be a general feature of solid-object physical events? There are fundamental reasons for thinking so, something we discussed in the earlier section, “Nature’s Other Phoneme.” We concluded that whereas hits can occur without a preceding slide, slides do not tend to occur without a preceding hit. Another reason why slides do not tend to start events is that friction turns kinetic energy into heat, decreasing the chance for the slide to initiate much of an event at all. So, while hits can happen at any part of an event, they are most likely to occur at the start. And while slides can also happen anywhere in an event, they are less likely to occur near the start. Note that I am not concluding that slides are more common than hits at the nonstarts. Hits are more common than slides, no matter where one looks within solid-object physical events. I’m only saying that hits are more common at event starts than they are at nonstarts, and that slides are less common at event starts than they are at nonstarts.

  Is this regularity about the kinds of interaction at the starts and nonstarts of events found in spoken language? Yes. Words of the form bas are more common than words of the form sab (where, as earlier, b stands for a plosive, s for a fricative, and a for any number of consecutive sonorants). Figure 9 shows the probability that a non-sonorant is a plosive (rather than a fricative) as one moves from the start of a word to non-sonorants further into the word. The data come from 18 widely varying languages, listed in the legend. One can see that the probability that the non-sonorant phoneme is a plosive begins high at the start of words, after which it falls, matching the pattern expected from physics. And, as anticipated, one can also see that the probability of plosives after the start is still higher than the probability of a fricative.

  Figure 9. This shows how plosives are more probable at the start of words, and fall in probability after the start. The y-axis shows the plosive-to-fricative ratio, and the x-axis the ith non-sonorant in a word. The dotted line is for words with two non-sonorants, and the solid line for words with three non-sonorants. The main points are (i) that plosives are always more probable than fricatives, as seen here because the plosive-to-fricative probability ratios are always greater than 1, and (ii) that the ratio falls after the start of the word, meaning fricatives are disproportionately rare at word starts. These data come from common words (typically about a thousand) from each of the following languages: Japanese, Zulu, Malagasy, Somali, Fijian, Lango, Inuktitut, Bosnian, Spanish, Turkish, English, German, Bengali, Yucatec, Wolof, Tamil, Taino, Haya.

  We just concluded that hits are disproportionately common at the starts of events in nature, and that this feature is also found in language. But we ignored rings. Where in events do rings tend to reside? In the previous section (“Nature’s Syllables”) we discussed the fact that rings do not start events, a phenomenon also reflected in language. How about after the start of a word? There would appear to be a simple answer: rings always occur after physical interactions, and so rings should appear at all spots within events, following each hit or slide.

  But as we will see next, reality is more subtle.

  The First Was a Doozy

  While it is true that all physical interactions cause ringing, the ringing need not be audible, a point that already came up in the section called “Two-Hit Wonder.” In this light, we need to ask, where in events are the rings most audible? Consider the generic pen-on-table event again. The beginning of that event—the audible portion of it, starting when the pen hit the table—is where the greatest energy tends to be, and the ring sound after the first hit will therefore tend to be the loudest. If the pen bounces and hits the table again, the ring sound will be significantly lower in magnitude, and it will be lower still for any further bounces. Because energy tends to get dissipated during the course of an event, rings have a tendency to be louder earlier in the event than later in the event. This is a tendency, but it is not always the case. If energy gets added during the event, ring magnitude can increase. For example, if your pen bounces a couple of times on the table, but then bounces off the table onto the floor, then the floor hit may well be louder than the first table hit (gravity is the energy-adding culprit). Nevertheless, in the generic or typical case, energy will dissipate over the course of a physical event, and thus ringing magnitude will tend to be reduced as an event unfolds. Therefore, the audibility of a ring tends to be higher near the start of an event; or, correspondingly, the probability is higher later in an event that a ring might not be audible.
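
  The dissipation argument above can be sketched numerically. This is a minimal illustration, not a model from the book: it assumes each bounce keeps a fixed fraction of the impact speed (a coefficient of restitution), so the impact energy shrinks by the square of that fraction at every hit, and it treats impact energy as a rough proxy for how loud the ring is. The function names and the numbers in the usage note are hypothetical.

```python
def impact_energies(initial_energy, restitution, bounces):
    """Impact energy at each successive hit of a bouncing object.

    Assumption: every bounce keeps `restitution` of the impact speed,
    so the energy carried into the next hit is scaled by restitution**2.
    """
    energies = []
    energy = initial_energy
    for _ in range(bounces):
        energies.append(energy)
        energy *= restitution ** 2
    return energies


def energy_after_fall(energy, mass, height, g=9.81):
    """Energy at impact after falling an extra `height` metres.

    Gravity adds mass * g * height, which is how a later hit (the floor)
    can end up more energetic than an earlier one (the table).
    """
    return energy + mass * g * height
```

  For example, impact_energies(1.0, 0.5, 3) gives a strictly falling sequence of energies for three table hits, while energy_after_fall applied to the last of them with a 10-gram pen and a 0.75-metre drop shows how the floor hit can regain energy.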

  If language is instilled with physics, we would accordingly expect that sonorant phonemes are more likely to follow a plosive or fricative near the start of a word, and are more likely to go missing near the end of a word. This is, in fact, the case. Figure 10 shows how the probability of a sonorant following a non-sonorant falls as one moves further into a word, using the same data set mentioned earlier. For example, words like “pact” are not uncommon in English, but words like “ctap” do not exist, and are rare in languages generally.

  Figure 10. This shows that sonorant phonemes are more probable near the starts of words, namely just after the first non-sonorant (usually a plosive). The square data are for words having two non-sonorants, and the triangle data for words having three non-sonorants.

  We have begun to get a grip on how hits, slides, and rings occur within events, but we have only considered their probability as a function of how far into the event they occur. In real events there will be complex dependencies, so that if, say, a slide occurs, it changes the probability of another slide occurring next. In the next section we’ll ask, more generally, which combinations of hits and slides are common and which are rare, and then check for the same patterns in language.

  Nature’s Words

  Rube Goldberg machines excel at producing very long events, all part of a single causal chain. Like most events, Rube Goldberg events are built mostly out of hits, slides, and rings. Again letting b, s, and a stand for hits, slides, and rings, Rube Goldberg events sound something like basabababasababababababasabababasa, although the chains are very often much longer than even this. If events were typically like Rube Goldberg events, then even if spoken words have many of the auditory features found in events, words would be much too short to be event-like. Events are, however, not typically Rube Goldberg-like. Events are, instead, much more typically like a pen thrown on a table, the generic event we discussed in the previous section. Pen-on-table events may consist of a hit, hit, and slide. Or possibly just a hit and a slide. Or even just a lone hit. Most events have just several physical interactions or fewer, much nearer in length to spoken words than to Rube Goldberg events.

  This is what nature-harnessing expects. Spoken words across human languages are not only built out of sounds like those in solid-object physical events, but words tend to have the size of typical physical events. Words tend to sound like events with up to several interaction sounds—plosives or fricatives—not, say, ten. And although words with a single interaction sound are allowed, two or three interaction sounds are more common, again like solid-object physical events.

  Words are not only approximately the size of solid-object physical events—i.e., having several interaction sounds—words also take the amount of time for a typical event. This is something I have thus far ignored. But notice that plosives, fricatives, and sonorants do not just have similar acoustic characteristics to hits, slides, and rings; they also occur over periods of time similar to those typical of hits, slides, and rings. For example, although I described both hits and plosives as nearly instantaneous explosions, the notion of “instantaneous” depends on the time scale relevant to the listener—what’s instantaneous to a human may not be instantaneous to a fly. Hits and plosives are both instantaneous explosions as heard by human ears. This is why plosives sound hitlike; for example, if a hitlike sound were stretched out it would, instead, sound more slidelike (something we discussed in the earlier section called “Hesitant Hits”). Similarly, fricatives and sonorants tend to occur over time scales similar to the slides and rings of physical events. Typical syllables of human speech—e.g., of the form ba or sa—tend to have a duration approximately on the order of tenths of seconds, roughly the same time scale as is common for physical events involving macroscopic objects. In fact, you’ll notice in Figure 4 earlier that the physical and linguistic analogs (e.g., a hit and “k”) are on the same scale for the time (x) axis.

  Words tend to be built out of the constituents of natural solid-object physical events, and to have approximately the size and temporal duration of such events. But are words actually structured like solid-object physical events? Are the natural-sounding phonemes and syllables put together into natural-sounding words? In particular, I’m interested in asking whether the sequences of physical interactions that occur in events—the hits and slides—are similar to the sequences of plosives and fricatives in words. My students and I analyzed the “event structure” of common words across 18 languages, and for each language we measured the distribution of six event types: hit (b), slide (s), hit-hit (bb), hit-slide (bs), slide-hit (sb), and slide-slide (ss). For example, “tea” is a b, “far” is an s, and “faker” is an sb.
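
  The labeling scheme just described can be sketched in a few lines of Python. This is only an illustration under a loud assumption: it classifies letters rather than phonemes, using a crude spelling-based stand-in for plosives and fricatives, so digraphs such as "th" or "sh" will be mislabeled. The analysis described in the text worked from the sounds of words, not their spelling, and the letter sets and function names here are hypothetical.

```python
from collections import Counter

# Assumption: rough letter-level stand-ins for phoneme classes.
PLOSIVE_LETTERS = set("pbtdkgcq")   # treated as hits, written "b"
FRICATIVE_LETTERS = set("fvszh")    # treated as slides, written "s"


def event_structure(word):
    """Return the hit/slide skeleton of a word, e.g. 'faker' -> 'sb'.

    Every other letter is treated as a sonorant (a ring) and dropped,
    since the event types in the text ignore sonorants.
    """
    skeleton = []
    for letter in word.lower():
        if letter in PLOSIVE_LETTERS:
            skeleton.append("b")
        elif letter in FRICATIVE_LETTERS:
            skeleton.append("s")
    return "".join(skeleton)


def structure_counts(words):
    """Tally how often each skeleton occurs in a list of words."""
    return Counter(event_structure(word) for word in words)
```

  Applied to the examples in the text, event_structure labels "tea" as b, "far" as s, and "faker" as sb. Restricting the tally from structure_counts to the six types b, s, bb, bs, sb, and ss would give a distribution of the kind plotted in Figure 11, although a serious version would need phonemic transcriptions for each language.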

  Figure 11. The frequency of the structure types found in words across 18 widely diverse languages (listed in the legend of Figure 9). (Standard error bars shown. See Appendix for details.)

  To estimate how common these simple event types are in nature, students Elizabeth Counterman, Kyle McDonald, and Romann Weber counted the kinds of events occurring in a wide variety of videos. In deciding upon the kinds of videos to sample, we were not especially interested in having videos of, say, the savanna. Recall our discussion in the previous chapter, where we observed that there are “hard cores” of nature likely found in most or all habitats with solid objects crashing about. In choosing twenty videos from which to enumerate solid-object physical events, we simply aimed for a variety of scenarios in which solid-object physical events occur, including cooking, children playing, family gatherings, assembly instructions, and acrobatics. Each student acquired data on the events occurring, and did so using only the visual modality (that is, the videos were on mute); this helped to deal with a worry that our auditory systems are biased by speech so that we hear speechlike structure in events (akin to seeing faces in clouds). The three observers identified an average of 650 events across the 20 videos. Figure 12 shows the average results for the videos as a dotted line, overlaid on the language data from Figure 11. One can see the close similarity in the plots. (Notice that a simple model assuming hits are more common than slides does not explain why bs occurs more often than sb in the language data.)

  Figure 12. The relative frequency of simple event types in videos and in language. One can see their considerable similarity. (Standard error bars shown. See Appendix for details.)

  Again, we find the signature of solid-object physical events—of nature—in spoken language! Our final story in this chapter on speech concerns the sounds of speech above the level of words: the structure of whole phrases and sentences.

  Unresolved Questions

  Earlier in the chapter I remarked on how audition is nature’s more terse modality, only speaking up when there’s an event. In real life, though, there can often be “event overload.” I’m sitting at an airport right now, and I just counted 30 distinct sound events occurring around me over the last 30 seconds. How can we possibly pick out the sounds that matter to us amongst all the noise? There are, in fact, auditory cues that can tell an observer whether an event is relevant to him or her. In particular, these cues can tell the observer that “an event you should pay attention to is coming.”

  The most obvious such auditory cue is loudness. As a sequence of events nears me—be it footsteps, the whir of a whiffle ball, or the siren of a police car—it gets louder. Loudness is also worthy of attention because louder events can sometimes be the more energetic events. The ecological importance of loudness may underlie the role of emphasis in language, the way that more important words or sentences are sometimes spoken more loudly. That louder speech is more important speech is one of those things that is so obvious it is difficult to notice. But its analog in vision is not true—brighter parts of a scene are not the more important parts. Brightness in a scene is usually just a matter of where the sun is, and where it glares off objects. The importance of loudness modulations in speech needs explaining, and the explanation is found in the structure of nature.

  In addition to loudness, events in nature have another sound quality that is even more informative: pitch (the musical, note-like quality of sound). The pitch of an event depends not on how close it is to the observer, but on the rate at which it is getting closer to the observer. To understand why, let’s imagine standing next to a passing train, the standard example used to explain the Doppler effect. The main observation is that the pitch of the train’s whistle starts high and changes to low as it passes. More specifically, note that when the train is far away and approaching, its whistle is at a fixed high pitch, that is, a pitch that is not changing. (It is actually falling, but negligibly and imperceptibly.) The pitch only begins falling audibly when the train is very close to passing you. And shortly after the train has passed you, the pitch has dropped to nearly its low point, so that from then on the pitch stays effectively constant and low. This drop in pitch would apply in any scenario where sequences of events are passing us by. It also occurs any time we are moving past noisy objects. Our auditory systems can sense pitch changes on the order of half a percent of the sound frequency, sufficient for sensing (if not consciously) the pitch changes due to our walking by a source of sound.
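
  The passing-train description can be checked with the standard Doppler formula for a moving source and a stationary listener. The sketch below is an illustration under stated assumptions, not something from the book: the train moves at constant speed along a straight track, the listener stands still at a fixed sideways distance from the track, the speed of sound is taken as 343 metres per second, and the frequency is computed from the train's position when the sound is emitted (the travel time of the sound is ignored). The function and parameter names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed (air at about 20 C)


def observed_frequency(source_frequency, speed, position, offset):
    """Frequency heard from a source moving along a straight track.

    position: distance of the source along the track relative to the
              point nearest the listener (negative = still approaching).
    offset:   sideways distance of the listener from the track.
    """
    distance = math.hypot(position, offset)
    if distance == 0:
        approach_speed = 0.0
    else:
        # Component of the source's velocity pointing at the listener.
        approach_speed = -speed * position / distance
    return source_frequency * SPEED_OF_SOUND / (SPEED_OF_SOUND - approach_speed)
```

  With a sideways offset of a few metres, the value returned is nearly constant and high while the position is large and negative, passes through the true source frequency at position zero, and settles to a nearly constant low value afterwards, which is the falling signature described above. Setting offset to zero gives the standing-on-the-tracks case: the approach speed stays equal to the full speed until the moment of passing, so the pitch does not change on the way in.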

  The important conclusion of these observations is that a typical sequence of events will tend to have this signature falling pitch (unless headed directly toward you). One might speculate that this is why language has a tendency to signal the approaching end of a sentence with a falling intonation—a drop in pitch. That’s what events typically sound like in nature.

  Sequences of events do not always have pitches that fall, however. Pitches can sometimes rise, but special circumstances are required. First, let’s consider what happens if you stand on the railroad tracks rather than beside them. Now the pitch of the train stays the same, right up to the moment that it hits you. Of course, at the instant it hits you, the sound you would be hearing if you were conscious abruptly drops to a lower pitch (because it passes you in a single brain-crushingly short instant), and stays at that pitch as the train moves away. A constant pitch accompanied by increasing loudness is the signature of an impending collision. That same loudness increase, but with a pitch decrease, signals a near miss.

  What could make a pitch increase? Considering the train again, imagine first standing beside the tracks as it approaches, but then walking onto the tracks before it gets there. Because you have moved to a position more directly in the train’s path of motion, the frequency your ears receive from the train will increase as you walk onto the tracks. Alternatively, the pitch would also increase if you stayed off to the side, but the train jumped the tracks and headed toward you at the last moment. A pitch increase is the signature of a sequence of events that is changing its direction in your direction. This is true not only when an approaching sequence of events veers toward you, but also when a receding sequence of events veers so as to begin turning around, perhaps to come back and get you after a miss. An increase in the pitch is, in a sense, more important than loudness. An event might be loud and getting louder, but if its pitch is decreasing, it is not going to hit you. But if an event is not so loud, but has a pitch that is increasing, that means it is aiming itself more toward you (or you are aiming more toward it).
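
  The cue combinations described in the last few paragraphs can be collected into a small lookup. This is only a restatement of the text as code, with hypothetical names and labels; it assumes each trend has already been judged as "rising", "steady", or "falling", which is itself the hard perceptual part.

```python
def interpret_cues(loudness_trend, pitch_trend):
    """Map loudness and pitch trends to the reading given in the text.

    Each trend is one of "rising", "steady", or "falling".
    """
    if pitch_trend == "rising":
        # Rising pitch outranks loudness: the source is turning toward you.
        return "veering toward you"
    if loudness_trend == "rising" and pitch_trend == "steady":
        return "impending collision"
    if loudness_trend == "rising" and pitch_trend == "falling":
        return "near miss"
    return "no cue described in the text"
```

  Checking the pitch first mirrors the point that a rising pitch matters even when the event is not especially loud.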

  A rising pitch suggests, then, that the sequence of events is not finished. Events are coming your way. Or, if the sequence of events is moving away from you, then a rising pitch means it is beginning to turn around. This unresolved nature of rising pitches may be the reason why rising pitches in many languages tend to indicate a question. The spoken sentence, “Is that the elephant that stepped on your car?” is a request for further speech. And what better way to sound unresolved than to mimic the sound of nature’s unresolved events?

 
