How Language Began


by Daniel L. Everett


  This perceptual and articulatory organisation of the syllable brings duality of patterning to language naturally. By organising sounds so that they can be more easily heard, languages in effect get this patterning for free. Each margin and nucleus of a syllable is part of the horizontal organisation of the syllable, while the sounds that can go into the margin or nucleus are the fillers. And that means that the syllable in a sense could have been the key to grammar and more complex languages. Again, the syllable is based upon a simple idea: ‘chunk sounds so as to make them easier to hear and to remember’. Homo erectus would probably have had syllables, because they follow naturally from the limits of short-term memory in conjunction with the arrangements of sounds that are easiest to hear. If so, then this means that erectus would have had grammar practically handed to them on a platter as soon as they used syllables. It is, of course, possible that syllables came later in language evolution than, say, words and sentences, but any type of sound organisation, whether of the phonemes of Homo sapiens or of other sounds used by erectus or neanderthalensis, would provide a more powerful form for organising a language and moving it beyond mere symbols to some form of grammar. Thus early speech would have been a stimulus to syntagmatic and paradigmatic organisation in syntax, morphology and elsewhere in language. In fact, some other animals, such as cotton-top tamarins, are also claimed to have syllables. Whatever tamarins can do, the bet is that erectus could do better. If tamarins had better-equipped brains, then they’d be on their way to human language.
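  To make the slot-and-filler idea concrete, here is a minimal sketch in Python (my illustration, not the book’s; the sound inventories are invented). It shows how a handful of meaningless sounds, slotted into the syllable’s margin and nucleus positions, yields a far larger set of potential meaning-bearing forms – duality of patterning in miniature:

```python
from itertools import product

# Invented toy inventories: the fillers for each slot of the syllable.
onsets = ["p", "t", "k", "s", ""]   # left margin ("" = no onset)
nuclei = ["a", "i", "u"]            # nucleus: the loudest, vowel slot
codas  = ["n", "s", ""]             # right margin ("" = no coda)

# Horizontal organisation (margin-nucleus-margin) crossed with the
# fillers for each slot: eight distinct sounds yield 45 syllables.
syllables = ["".join(parts) for parts in product(onsets, nuclei, codas)]
print(len(syllables))   # 45
print(syllables[:5])    # ['pan', 'pas', 'pa', 'pin', 'pis']
```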

  If this is on the right track, duality of patterning, along with gestures and intonation, is among the foundational organisational principles of language. However, once these elements are in place, it would be expected that languages would discover the utility of hierarchy, which computer scientists and psychologists have argued is broadly useful in the transmission and storage of complex information.

  Phonology is, like all other forms of human behaviour, constrained by the memory–expression tension: the more units a language has, the less ambiguously it can express messages, but the more there is to learn and memorise. So if a language had three hundred speech sounds, it could produce words that were less ambiguous than a language with only five speech sounds. But this has a cost – the language with more speech sounds is harder to learn. Phonology organises sounds to make them easier to perceive, adding a few local cultural modifications preferred by a particular community (as in English ‘strength’ rather than ‘tsrength’). This gets to the co-evolution of the articulatory and auditory apparatuses. The relationship between humans’ ears and their mouths is what accounts for the sounds of all human languages. It is what makes human speech sounds different from, say, Martian speech sounds.
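  The trade-off can be put in rough numbers. A back-of-the-envelope sketch (mine, not the book’s, and it ignores phonotactic restrictions): a larger inventory keeps even very short words distinct, but the inventory itself is a heavier learning burden:

```python
# Upper bound on distinct word shapes: inventory_size ** word_length.
for inventory_size in (5, 300):
    for word_length in (1, 2, 3):
        print(f"{inventory_size} sounds, {word_length}-sound words: "
              f"{inventory_size ** word_length} possible words")

# 5 sounds allow only 125 distinct three-sound words; 300 sounds allow
# 27,000,000: far less ambiguity, but vastly more to learn and keep apart.
```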

  The articulatory apparatus of humans is, of course, also interesting because no single part of it – other than its shape – is specialised for speech. As we have seen, the human vocal apparatus has three basic components – moving parts (the articulators), stationary parts (the points of articulation), and airflow generators. It is worth underscoring again the fact that the evolution of the vocal apparatus for speech is likely to have followed the beginning of language. Though language can exist without well-developed speech abilities (many modern languages can be whistled, hummed, or signed), there can be no speech without language. Neanderthalensis did not have speech capabilities like those of sapiens. But they most certainly could have had a working language without a sapiens vocal apparatus. The inability of neanderthalensis to produce /i/, /a/ and /u/ (at least according to Philip Lieberman) would be a handicap for speech, but these ‘cardinal’ or ‘quantal’ vowels are neither necessary nor sufficient for language (not necessary because of signed languages, not sufficient because parrots can produce them).

  Speech is enhanced, as has been mentioned, when the auditory system co-evolves with the articulatory system. This just means that human ears and mouths evolve together. Humans therefore get better at hearing the sounds their mouths most easily make and at making the sounds their ears most easily perceive.

  Individual speech sounds, phones, are produced by the articulators – tongue and lips for the most part – meeting or approximating points of articulation – the alveolar ridge, teeth, palate, lips and so on. Some of these sounds are louder because they offer minimal impedance to the flow of air out of the mouth (and, for many, out of the nose). These are vowels. No articulator makes direct contact with a point of articulation in the production of vowels. Other phones completely or partially impede the flow of air out of the mouth. These are consonants. With both consonants and vowels, the stream of sounds produced by any speaker can be organised so as to maximise both information rate (consonants generally carry more information than vowels, since there are more of them) and perceptual clarity (consonants are easier to perceive in certain positions of the speech stream, such as immediately preceding and following vowels and at the beginnings and ends of words). Since speech is not digital in its production but rather a continuous stream of articulatory movements, vowels and consonants ‘assimilate’ to one another; that is, they become more alike in some contexts, though not always the same contexts in every language. If a native speaker of English utters the word ‘clock’, the ‘k’ at the end is pronounced further back in the mouth than it is when they pronounce the word ‘click’. This is because the vowel ‘o’ is further back in the mouth and the vowel ‘i’ is further to the front. In these cases, the vowel ‘pulls’ the consonant towards its place of articulation. Other modifications also enhance the perception of speech sounds. One example is aspiration – a puff of air made when producing a sound. Another is voicing – the vibration of the vocal cords while producing a sound. Syllable structure is another: sounds are pronounced differently in different positions of a syllable, as in the pronunciation of ‘l’ at the end of a syllable, as in the word ‘bull’, vs ‘l’ at the beginning of a syllable, as in ‘leaf’. To see aspiration literally, hold a piece of paper about one inch in front of your mouth and pronounce the word ‘paper’. The paper will move. Now do the same while pronouncing the word ‘spa’. If you are a native speaker of English, the paper will not move on the ‘p’ of ‘spa’.

  These enhancements are often ignored by native speakers when they produce speech because such enhancements are simply ‘add-ons’ and not part of the target sound. This is why native speakers of English do not normally hear a difference between the [p] of ‘spa’ and the [pʰ] of ‘paper’, where the raised ‘h’ following a consonant indicates aspiration. But to the linguist these sounds are quite distinct. Speakers are unaware of such enhancements and can usually learn to hear them only with special effort. The study of the physical properties of sounds, regardless of speakers’ perceptions and organisation of them, is phonetics. The study of the emic knowledge of speakers – which enhancements native speakers ignore and which sounds they target – is phonology.
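  To make the conditioning explicit, here is a minimal Python sketch (my illustration, not the book’s) of the English aspiration pattern just described. It is deliberately simplified; real aspiration is also sensitive to stress and syllable position:

```python
def realise_initial_p(word: str) -> str:
    """Toy allophone rule for English /p/: unaspirated [p] after
    's' (as in 'spa'), aspirated [ph] word-initially (as in
    'paper'). Native speakers apply this without noticing."""
    if word.startswith("sp"):
        return "[p]"    # no puff of air: the paper does not move
    if word.startswith("p"):
        return "[ph]"   # aspirated: the puff of air moves the paper
    return "(rule does not apply)"

print(realise_initial_p("paper"))  # [ph]
print(realise_initial_p("spa"))    # [p]
```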

  Continuing on with our study of phonology, there is a long tradition that breaks basic sounds, vowels and consonants, down further into phonetic features, such as [+/− voiced] (‘voiced vs non-voiced’) or [+/− advanced tongue root], as in the contrast between the English vowel /i/ of ‘beet’ and the vowel /ɪ/ of ‘bit’, and so on. But no harm is done to the exposition of language evolution if such finer details are ignored.

  Moving up the phonological hierarchy we arrive once again at the syllable, which introduces duality of patterning into speech sound organisation. To elaborate slightly further on what was said earlier about the syllable, consider the syllables in Figure 27.

  As per the earlier discussion of sonority, it is expected that the syllable [sat] will be well formed, ceteris paribus, while the syllable [lbad] will not be, because in the latter the sounds are harder to perceive.
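  That prediction can be made mechanical. Here is a sketch (my illustration; the rankings follow one common textbook sonority scale, and real languages tolerate exceptions such as English ‘s’+stop clusters, as in ‘strength’):

```python
# One common textbook sonority scale; exact rankings vary by analysis.
SONORITY = {
    "p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,  # stops (quietest)
    "f": 2, "v": 2, "s": 2, "z": 2,                  # fricatives
    "m": 3, "n": 3,                                  # nasals
    "l": 4, "r": 4,                                  # liquids
    "a": 5, "e": 5, "i": 5, "o": 5, "u": 5,          # vowels (loudest)
}

def well_formed(syllable: str) -> bool:
    """Sonority should rise to the nucleus (the vowel peak) and fall
    after it; syllables that dip before the peak are harder to hear."""
    values = [SONORITY[sound] for sound in syllable]
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling

print(well_formed("sat"))   # True: 2 -> 5 -> 1 rises, then falls
print(well_formed("lbad"))  # False: 4 -> 1 -> 5 -> 1 dips before the peak
```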

  The syllable is thus a hierarchical, non-recursive structuring of speech sounds. It functions to enhance the perceptibility of phones and often works in languages as the basic rhythmic unit. Once again, one can imagine that, given their extremely useful contributions to speech perception, syllables began to appear early on in the linking of sounds to meaning in language. They would have been a very useful and easy addition to speech, dramatically improving the perceptibility of speech sounds. The natural limitations of human auditory and articulatory systems would have exerted pressure for speakers to hear and produce syllabic organisation early on.

  Figure 27: Syllables and sonority

  However, once introduced, syllables, segments and other units of the phonological hierarchy would have undergone culture-based elaborations. Changes, in other words, are sometimes made to satisfy local preferences, without regard for ease of perception or production. These elaborations are useful for group identification, as well as for the perception of sounds in certain positions in words. So changes are sometimes motivated by ease of hearing or pronunciation and sometimes by cultural reasons – to make sounds that identify one group as the source of those sounds, because speakers of one culture prefer some sounds to others, some enhancements to others and so on. A particular language’s inventory of sounds might also be culturally limited. This all means that, over the history of a language, a set of cultural preferences emerges that selects, from among all the sounds humans can produce and perceive, those that a particular culture at a particular time chooses to use. After this selection, the preferred sounds and patterns will continue to change over time, subject to new articulatory, auditory or cultural pressures, or via contact with other languages.

  Other units of the phonological hierarchy include phonological phrases – groupings of syllables into phonological words, or of units larger than words. These phrases and words are also forms of chunking that aid working memory and facilitate faster interpretation of the information communicated. This segmentation is aided by gestures and intonation, which offer backup help to speech for perception and working memory. In this way, the grouping of smaller linguistic units (such as sounds) into larger linguistic units (such as syllables, words and phrases) facilitates communication. Phrases and words are themselves grouped into larger groupings that some linguists have referred to as ‘contours’ or ‘breath groups’ – groupings of sounds marked by breathing and intonation. We have mentioned that pitch, loudness and the lengthening or shortening of some words or phrases can be used to distinguish, say, new information from old information, such as the topic we are currently discussing (old information) and a comment about that topic (new information). These can also be used to signal emphasis and other nuances that the speaker would like the hearer to pick up on about what is being communicated. All of these uses of phonology emerge gradually as humans move from indexes to grammar. And every step of the way they would in all likelihood have been accompanied by gesture.

  From these baby steps an entire ‘phonological hierarchy’ is constructed. This hierarchy entails that most elements of the sound structure of a given language are composed of smaller elements. In other words, each unit of sound is built up from smaller units via natural processes that make it easier to hear and produce the sounds of a given language. The standard linguistic view of the phonological hierarchy is given in Figure 28.

  Our sound structures are also constrained by two other sets of factors. The first is the environment. Sound structures can be significantly constrained by the environmental conditions in which the language arose – average temperatures, humidity, atmospheric pressure and so on. Linguists missed these connections for most of the history of the discipline, though more recent research has now established them clearly. Thus, to understand the evolution of a specific language, one must know something about both its original culture and its ecological circumstances. No language is an island.

  Several researchers summarise these generalisations with the proposal that the first utterances of humans were ‘holophrastic’. That is, the first attempts at communication were unstructured utterances that were neither words nor sentences, simply interjections or exclamations. If an erectus used the same expression over and over to refer to a sabre-toothed cat, say, how might that symbol be decomposed into smaller pieces? Gestures, with functions that overlap intonation in some ways, contribute to decomposing a larger unit into smaller constituents, either by reinforcing already highlighted portions or by indicating that other portions of the utterance are of secondary importance, but still more important than portions of tertiary importance, and so on. To see how this might work, imagine that an erectus woman saw a large cat run by her and exclaimed, ‘Shamalamadingdong!’ One of the syllables or portions of that utterance might be louder or higher pitched than the others. If she were emotional, that would come through in gestures and pitches that would intentionally or inadvertently highlight different portions of the utterance – perhaps ‘SHAMAlamadingDONG!’ or ‘ShamaLAMAdingdong’ or ‘ShamalamaDINGdong’ or ‘SHAMAlamaDINGdong’, and so forth. If her gestures, loudness and pitch (high, low or medium) lined up with the same syllables, then those syllables would perhaps begin to be recognised as parts of a word or sentence that began without any parts.

  Figure 28: Phonological hierarchy

  Prosody (pitch, loudness, length), gestures and other markers of salience (body positioning, eyebrow raising, etc.) have the joint effect of beginning to decompose the utterance, breaking it down into parts according to their pitch or accompanying gestures. Once utterances are decomposed, and only then, they can be (re)composed, or synthesised, to build additional utterances. And this leads to another necessary property of human language: semantic compositionality. This is crucial for all languages. It is the ability to encode or decode the meaning of a whole utterance from the individual meanings of its parts.
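  A minimal sketch of what compositionality buys (the lexicon and glosses here are invented, purely for illustration): once an utterance can be decomposed into recognised parts, the meaning of a novel whole can be computed rather than memorised:

```python
# Toy lexicon: each part carries its own meaning (glosses in caps).
LEXICON = {"big": "BIG", "cat": "CAT", "runs": "RUN", "sleeps": "SLEEP"}

def interpret(utterance: str) -> list[str]:
    """Compute the meaning of the whole from the meanings of its
    parts plus their order: semantic compositionality in miniature."""
    return [LEXICON[word] for word in utterance.split()]

# Never-before-heard combinations are still interpretable:
print(interpret("big cat sleeps"))  # ['BIG', 'CAT', 'SLEEP']
```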

  Thus from natural processes linking sound and meaning in utterances, there is an easy path via gestures, intonation, duration and amplitude to decomposing an initially unstructured whole into parts and from there recomposing parts into wholes. And this is the birth of all grammars. No special genes required.

  Kenneth Pike placed morphology and syntax together in another hierarchy, worth repeating here (though I use my own, slightly adapted version). This he called the ‘morphosyntactic hierarchy’ – the building up of conversations from smaller and smaller parts.

  How innovations, linguistic or otherwise, spread and become part of a language is a puzzle known as the ‘actuation problem’. Just as with the spread of new words or expressions or jokes today, several possible enabling factors might be involved in the origin and spread of linguistic innovations. Speakers might like the sounds of certain components of novel erectus utterances more than others, or an accompanying pitch and/or gesture might have highlighted one portion of the utterance to the exclusion of other parts. As highlighting is also picked up by others and begins to circulate, for whatever reasons, the highlighted, more salient portion becomes more important in the transmission and perception of the utterance being ‘actuated’.

  It is likely that the first utterance was made to communicate to someone else. Of course, there are no witnesses. Nevertheless, the prior and subsequent history of language’s progression strongly supports this. Language is about communication. The possibly clearer thinking that comes when we can think in speech instead of merely, say, in pictures is a by-product of language. It is not itself what language is about.

  Figure 29: Morphosyntactic hierarchy

  Just as there is no need to appeal to genes for syntax, the evidence suggests that neither are sound structures innate, aside from the innate link between the vocal apparatus and auditory perception – between the sounds people can best produce and the sounds they can best hear. The simplest hypothesis is that the co-evolution of the vocal apparatus, the hearing apparatus and linguistic organisational principles led to the existence of a well-organised sound-based system of forms for representing meanings as part of signs. There are external, functional and ecological constraints on the evolution of sound systems.

  Syntax develops with duality of patterning and additions to it that are based on cultural communicational objectives and conventions, along with different grammatical strategies. This means that one can add recursion if it is a culturally beneficial strategy. One can have relative clauses or not. One can have conjoined noun phrases or not. Some examples of different grammar strategies in English include sentences like:

  John and Mary went to town (a complex, conjoined noun phrase) vs John went to town. Mary went to town (two simple sentences).

  The man is tall. The man is here (two simple sentences) vs The man who is tall is here (a complex sentence with a relative clause).

  Morphology is the scientific term for how words are constructed. Different languages use different word-building strategies, though the set of workable strategies is small. Thus in English a verb has at most five different forms (setting aside the exceptional verb ‘be’): sing, sang, sung, singing, sings. Some verbs have even fewer forms: hit, hitting, hits. There are really only a few choices for building morphological (word) structures. Words can be simple or composed of parts (called morphemes). If a language’s words are simple, without internal divisions, it is an ‘isolating’ language. Chinese is one illustration. In Chinese there is usually no past-tense form of a verb. A separate word (or just the context) indicates past tense. So where we would say in English ‘I ran’ (past tense) vs ‘I run’ (present tense), in Chinese you might say ‘I now run’ (three words) vs ‘I yesterday run’ (three words).
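  A small sketch of the contrast (invented toy data and romanised glosses, not real Chinese): in an isolating language the verb’s shape never changes, so tense rides on a separate word:

```python
# English-style: tense is built into the shape of the word itself.
english = {"present": "run", "past": "ran"}

# Isolating (Chinese-style): the verb is invariant; a time word
# (or context alone) does the work that English tense forms do.
def isolating_clause(time_word: str) -> list[str]:
    return ["I", time_word, "run"]   # the verb never inflects

print(isolating_clause("now"))        # 'I now run'       ~ 'I run'
print(isolating_clause("yesterday"))  # 'I yesterday run' ~ 'I ran'
```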

  In another language, such as Portuguese, the strategy for building words is different. Like many Romance languages (those descended from Latin), Portuguese words can combine several meanings. A simple example can be constructed from falo, which means ‘I speak’ in Portuguese.

  The ‘o’ ending of the verb means several things at once. It indicates first person singular, ‘I’. But it also means ‘present tense’. And it also means ‘indicative mood’ (which in turn means, very roughly, ‘this really is happening’). The ‘o’ of falo also indicates that the verb belongs to the -ar set of verbs (falar ‘to speak’, quebrar ‘to break’, olhar ‘to look’ and many, many others). Portuguese and the other languages descended from Latin, such as Spanish, Romanian and Italian, are known as Romance languages. Languages like these are referred to linguistically as ‘inflectional’ languages.
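  The bundling can be made explicit with a small sketch (mine, for illustration) that unpacks the single ending of falo into the several meanings it carries simultaneously:

```python
def gloss_falo() -> dict[str, str]:
    """Unpack falo 'I speak': one ending, '-o', simultaneously
    signals person, number, tense, mood and conjugation class."""
    return {
        "stem": "fal-",                 # from falar, 'to speak'
        "person/number": "first singular ('I')",
        "tense": "present",
        "mood": "indicative ('this really is happening')",
        "class": "-ar conjugation (falar, quebrar, olhar, ...)",
    }

for feature, value in gloss_falo().items():
    print(f"{feature}: {value}")
```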

 
