An Appetite for Wonder
Top: Cricket commentary: Ted Burk and I recording behaviour with microphone and Dawkins Organ. Middle: The Animal Behaviour Research Group after the move from Bevington Road. Marian is far left. I am slightly right of centre. Bottom: A PDP-8 computer like the one that fed my addiction in 13 Bevington Road.
Top: Danny Lehrman (standing) and Niko Tinbergen (right) settling their differences. Bottom: Niko in his element again: will the ash fall from his cigarette before he finishes the shot?
Professor Pringle and (left to right) his colleagues, E. B. Ford, Niko Tinbergen, William Holmes, Peter Brunet, David Nichols.
Deep thought. Above: Bill Hamilton and Robert Trivers wrestling with a problem during Bill’s visit to Harvard; middle left: the endlessly invigorating John Maynard Smith in his beloved garden. Middle right: The Selfish Gene with the original Desmond Morris cover. Bottom left: with the tall, thoughtful, Lincolnesque George Williams. Bottom right: ‘I must have that book!’ Michael Rodgers, K-selected science publisher.
THE GRAMMAR OF BEHAVIOUR
The Oxford Animal Behaviour Research Group under Tinbergen had long maintained cordial relations with the corresponding sub-department at Cambridge, housed in the neighbouring village of Madingley. ‘Madingley’ was founded in 1950 by W. H. Thorpe – a distinguished scientist whose gently austere, almost ecclesiastical personality is best summed up by Mike Cullen’s jest that it was entirely appropriate that when Thorpe needed a notation for recording birdsong, he transcribed it for the organ. Madingley celebrated its quarter-century in 1975 with a conference in Cambridge organized by Patrick Bateson and Robert Hinde, the leading figures of the Madingley group after Thorpe’s retirement, both of whom later became heads of Cambridge colleges. Many of the speakers at the Madingley conference were past or present members of that group, but they invited some outsiders too, and David McFarland and I were honoured to be the Oxford contingent.
Nowadays, on the rare occasions when I agree to speak at such a conference, I confess that I usually find myself dusting off a previous talk and updating it. Younger and more vigorous in 1974, I took the risk of pushing the boat out and undertaking to write something entirely new for Madingley’s jubilee conference and the book that came out of it. The topic I chose, ‘hierarchical organization’, had a track record in the history of ethology. It was the main theme of one of the boldest – and most criticized – chapters in Tinbergen’s magnum opus The Study of Instinct, the chapter entitled ‘An attempt at a synthesis’. I took a different approach – or, rather, several different approaches, and I too attempted a synthesis.
The essence of hierarchical organization, as I interpreted it, is the idea of ‘nested embedment’. I can explain this by contrast with what it is not, and this is where I echo the discussion above about grammar. You might attempt to describe the stream of events – the stream of things an animal does, say – as a Markov Chain. What’s that? I won’t attempt a formal, mathematical definition such as was offered by the Russian mathematician Andrey Markov. An informal, verbal definition is this. A Markov Chain of animal behaviour is a series in which what an animal does now is determined by what it did previously, back a fixed number of steps but no further. In a first-order Markov Chain, what the animal does next can be predicted statistically from its immediately preceding action, and not from anything earlier. Looking at the last but one action (last but two action etc.) gives you no additional predictive power. In a second-order Markov Chain, you improve your ability to predict if you look at the previous two actions, but no further back than that. And so on.
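For readers who like to see such a definition in action, here is a minimal Python sketch of fitting a Markov Chain of a chosen order to a record of behaviour. It simply tallies which act follows each run of preceding acts and turns the tallies into probabilities. The behaviour codes and the function names `fit_markov` and `predict_next` are hypothetical, chosen for illustration; real ethological analyses would add significance tests that are left out here.

```python
from collections import Counter, defaultdict


def fit_markov(sequence, order=1):
    """Tally which act follows each run of `order` preceding acts."""
    table = defaultdict(Counter)
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])
        table[context][sequence[i]] += 1
    return table


def predict_next(table, recent, order=1):
    """Estimated probabilities of the next act, given the last `order` acts.

    Under a Markov Chain of this order, anything earlier than those acts is
    assumed to add no predictive power, so it is simply ignored.
    """
    counts = table.get(tuple(recent[-order:]), Counter())
    total = sum(counts.values())
    return {act: n / total for act, n in counts.items()} if total else {}


# Hypothetical record of acts, coded A, B, C for illustration only.
record = ["A", "B", "A", "B", "A", "C"]
first = fit_markov(record, order=1)
second = fit_markov(record, order=2)
print(predict_next(first, ["A"], order=1))
print(predict_next(second, ["B", "A"], order=2))
```

In this toy record, a first-order model says A is followed by B two times in three; a second-order model looks at the last two acts instead. Comparing how much the predictions improve as the order goes up is one way of asking how far back the animal’s ‘memory’ reaches.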
Hierarchically organized behaviour would be very different. Markov Chain analysis, of any order, wouldn’t work. The predictability of behaviour wouldn’t decay smoothly as you look into the future, but would jump up and down in an interesting way – as with the blowfly grooming, but more interestingly than that. In an ideal case, behaviour would be organized in discrete chunks. And chunks within chunks. And chunks within chunks within chunks. That’s what’s meant by nested embedment. The clearest model for nested embedment is syntax, the grammar of human language. Think back to the program that I wrote to generate random grammatical sentences, and the example sentence that I quoted:
The adjective noun of the adjective noun which adverbly adverbly verbed in noun of the noun which verbed adverbly verbed.
The core sentence is ‘The adjective noun verbed.’ You can read it on its own and it is grammatically correct, without the various embedded relative clauses or prepositional clauses in the middle. We can build up the embedment as follows. The important point is that the build-up can occur inside the core sentence, or inside already embedded parts of the sentence. Read each of the following to yourself:
The adjective noun of the adjective noun verbed.
The adjective noun of the adjective noun which adverbly adverbly verbed verbed.
The adjective noun of the adjective noun which adverbly adverbly verbed in noun verbed.
The adjective noun of the adjective noun which adverbly adverbly verbed in noun of the noun verbed.
The adjective noun of the adjective noun which adverbly adverbly verbed in noun of the noun which verbed adverbly verbed.
Every member of the sequence above is grammatically correct. You can delete any embedded bit (together with whatever is embedded inside it) and it might change the meaning, but it doesn’t stop the sentence from being grammatically correct.
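The nesting in the example sentence can be made explicit with a small Python sketch. This is not the random-sentence program mentioned above; it merely stores that one sentence as chunks within chunks and reads it out with embedded chunks included only down to a chosen depth. The `Chunk` class and the `render` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Chunk:
    """An embedded chunk: some words, possibly with further chunks inside."""
    parts: List[Union[str, "Chunk"]]


def render(parts, max_depth, depth=0):
    """Return the words, keeping embedded chunks only down to max_depth."""
    words = []
    for part in parts:
        if isinstance(part, str):
            words.append(part)
        elif depth < max_depth:
            # Descend into the chunk; dropping it also drops everything inside.
            words.extend(render(part.parts, max_depth, depth + 1))
    return words


# The example sentence: each embedded chunk sits inside the previous one.
SENTENCE = [
    "The adjective noun",
    Chunk(["of the adjective noun",
           Chunk(["which adverbly adverbly verbed",
                  Chunk(["in noun",
                         Chunk(["of the noun",
                                Chunk(["which verbed adverbly"])])])])]),
    "verbed.",
]

for depth in range(6):
    print(" ".join(render(SENTENCE, depth)))
```

Depth 0 gives the core sentence, and every deeper reading is still a complete sentence, because the main verb ‘verbed.’ belongs to the outermost level and is never cut off. Chopping the same words off left to right, as in the series that follows, gives no such guarantee.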
If, on the contrary, you were to build up the sentence by adding the bits progressively from left to right, none of the series would be grammatical until you hit the end of the whole sentence.
The adjective noun [not a sentence]
The adjective noun of the adjective noun [not a sentence]
The adjective noun of the adjective noun which adverbly adverbly verbed [not a sentence]
The adjective noun of the adjective noun which adverbly adverbly verbed in noun [not a sentence]
The adjective noun of the adjective noun which adverbly adverbly verbed in noun of the noun which verbed adverbly verbed [finally, we have a sentence].
Only in the very last case does the sentence achieve closure and become grammatical. What I wanted to know was whether animal behaviour is organized as a Markov Chain, or whether it is organized in a nested embedded way, perhaps like syntax or perhaps in some other way hierarchically embedded. You can see that there were inklings of the idea lurking behind the research that Marian and I did on drinking in chicks and especially self-grooming in flies. Now, in my Madingley paper, I wanted to look more generally at the question of hierarchical organization, from a theoretical point of view as well as by looking at real studies of animal behaviour.
After defining various kinds of hierarchy in a convenient notation of mathematical logic, I considered possible evolutionary advantages of hierarchical organization. To illustrate what I called the ‘evolutionary rate advantage’ I borrowed from the Nobel Prize-winning economist Herbert Simon a parable of two watchmakers called Tempus and Hora. Their watches kept equally good time, but Tempus took much, much longer to complete a watch. Both kinds of watch had 1,000 components. Hora, the more efficient watchmaker, worked in a hierarchical, modular way. He put his components into 100 sub-assemblies of ten components each. These in turn were assembled into ten larger units, which were finally put together to complete the watch. Tempus, on the other hand, tried to put all 1,000 components together in a single large assembly operation. If he dropped one component, or if the telephone interrupted him, the whole caboodle fell to bits and he had to start again. He very rarely completed a watch, while Hora, with his hierarchical modular technique, was churning them out. The principle will be familiar to all computer programmers, and it surely applies to evolution and to the building of biological systems.
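Simon’s parable invites a little arithmetic. The sketch below assumes, purely for illustration, that each act of adding a component (or a finished sub-assembly) is interrupted with probability p, and that an interruption scatters whatever assembly is in progress so that it must be restarted. Neither the value p = 0.01 nor the function names come from the text. Under those assumptions, finishing an assembly of n items is the same as waiting for n uninterrupted additions in a row.

```python
def expected_steps(n_items: int, p: float) -> float:
    """Expected additions needed to finish one assembly of n_items, if each
    addition is interrupted with probability p and an interruption forces a
    restart (the waiting time for n_items successes in a row)."""
    if p == 0:
        return float(n_items)
    q = 1.0 - p  # chance that a single addition goes through
    return (1.0 - q ** n_items) / (p * q ** n_items)


def tempus(p: float) -> float:
    # One flat assembly of all 1,000 components.
    return expected_steps(1000, p)


def hora(p: float) -> float:
    # 100 sub-assemblies of 10 components, then 10 units of 10 sub-assemblies,
    # then 1 final assembly of 10 units: 111 assemblies of 10 items each.
    return (100 + 10 + 1) * expected_steps(10, p)


if __name__ == "__main__":
    p = 0.01
    print(f"Tempus: {tempus(p):,.0f} additions per watch")
    print(f"Hora:   {hora(p):,.0f} additions per watch")
    print(f"Ratio:  {tempus(p) / hora(p):,.0f}")
```

With these assumed numbers, Tempus needs roughly two thousand times as many additions as Hora per finished watch. The exact ratio depends on the assumptions, but the moral does not: an interruption costs Hora at most one small sub-assembly, whereas it costs Tempus everything.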
I also extolled another advantage of hierarchical organization, the ‘local administration advantage’. If you are trying to control an empire from London, or in earlier times from Rome, you cannot micro-manage what happens in remote parts of the empire, because communication channels – in both directions – are too slow. Instead, you appoint local governors, give them broad policy directives, and leave them to take day-to-day decisions on their own. The same necessarily applies to a robot vehicle on Mars. Radio signals take several minutes to travel the distance. If the vehicle encounters a local difficulty, say a boulder, it sends the information back to Earth, and it takes four minutes to get there. ‘Turn left to avoid boulder,’ flashes back the urgent reply, and that again takes four minutes to reach Mars. Meanwhile, the wretched vehicle has long since ploughed into the boulder. Obviously the solution is to delegate local control to an on-board computer, and give the local computer only general policy instructions like: ‘Explore the crater to the north-west, taking care to avoid boulders whenever you encounter them.’ By the same token, if there are several vehicles exploring different parts of Mars, it makes sense for Earth to send general policy instructions to one senior computer on Mars, which sends more detailed instructions coordinating the activities of all its subordinate vehicles, each with its own on-board computer to take fine-grained local decisions. Armies and business corporations use similar hierarchical chains of command, and once again biological systems do the same.
Especially pleasing in this connection are the giant dinosaurs whose very long spinal cord imposed an inconvenient distance between the brain in the head and the seat of much of the action, the giant hind legs. Natural selection solved the problem with a second ‘brain’ (enlarged ganglion) in the pelvis:
Behold the mighty dinosaur,
Famous in prehistoric lore,
Not only for his power and strength
But for his intellectual length.
You will observe by these remains
The creature had two sets of brains–
One in his head (the usual place),
The other at his spinal base.
Thus he could reason ‘A Priori’
As well as ‘A Posteriori’.
No problem bothered him a bit
He made both head and tail of it.
So wise was he, so wise and solemn,
Each thought filled just one spinal column.
If one brain found the pressure strong
It passed a few ideas along.
If something slipped his forward mind
’Twas rescued by the one behind
And if in error he was caught
He had a saving afterthought.
As he thought twice before he spoke
He had no judgment to revoke.
Thus he could think without congestion
Upon both sides of every question.
Oh, gaze upon this model beast,
Defunct ten million years at least.
Bert Leston Taylor (1866–1921)
‘Thus he could reason “A Priori” / As well as “A Posteriori”’ – I wish I’d written that. You’d have to look far before you found another poem with quite so many flashes of clever wit in almost every line.
Having established the advantages of hierarchical organization more generally, I moved on to see whether there was evidence of it in specific cases of animal behaviour. I began by re-analysing the data Marian and I had recorded from blowflies, and then turned to other data from the animal behaviour literature, which I ferreted out in the library. Among others, I included a large study on the behaviour of damsel fish, another on face-grooming behaviour by mice and another on the courtship behaviour of guppies.
I wanted to devise mathematical techniques for detecting hierarchical embedment, in an attempt at objectivity, unbiased by my own preconceptions. Here’s just one of several computer-based methods I thought up. This one I dubbed Mutual Replaceability Cluster Analysis. My method started by counting frequencies of transitions between behaviour patterns, but then analysed the data in a special hierarchical way. I fed into the computer a table showing how many times each behaviour pattern in the animal’s repertoire was followed by each other one. Then the computer systematically examined the data to see if it could find pairs of behaviour patterns that were mutually replaceable. Mutually replaceable means that you could stick either of them in the place of the other and the overall pattern of transition frequencies would remain the same (or near enough the same, according to some previously defined criterion). Once a mutually replaceable pair had been identified, both members of the pair were renamed with a joint name, and the table of transitions contracted because it now had one fewer row and one fewer column. Then the contracted table was fed back into the cluster-analysis program, and the whole thing was repeated as many times as necessary to use up the whole list of behaviour patterns. As each pair of behaviour patterns was swallowed up in a cluster, or as each already swallowed cluster was swallowed up in a bigger cluster, the program moved up one node in a hierarchical tree. Above, for instance, is my Mutual Replaceability tree for the behaviour patterns of guppies, using data from a group of Dutch workers led by Professor G. P. Baerends (who, incidentally, had been Niko Tinbergen’s first graduate student and later became one of the leading figures in European ethology).
The upper diagram shows the transition frequencies of guppy behaviour patterns, as measured by the Dutch scientists. Each circle is labelled with the code name of a behaviour pattern, and the thickness of the lines shows the frequency of transition from one to the other (black lines move from left to right, grey lines from right to left). The lower diagram shows the results of feeding the same data into my Mutual Replaceability Cluster Analysis program. The numbers represent the numerical index of mutual replaceability that I used to compare with the criterion for deciding to unite two entities (actually a rank correlation coefficient, if you happen to be interested). I got similar hierarchical trees for the damsel fish, the mice, Marian’s and my blowflies etc.
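The clustering loop described above can be sketched in Python. This is not the original program. The way each behaviour’s ‘transition profile’ is built here (its counts to and from every other behaviour, leaving out the pair being compared), the use of Spearman’s rank correlation as the replaceability index, and the names `replaceability`, `merge`, `cluster` and `criterion` are all assumptions made for illustration.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    if len(x) < 2:
        return 0.0
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def replaceability(table, i, j):
    """Compare what follows and precedes i with what follows and precedes j,
    ignoring transitions among i and j themselves (an assumed profile)."""
    others = [k for k in range(len(table)) if k not in (i, j)]
    prof_i = [table[i][k] for k in others] + [table[k][i] for k in others]
    prof_j = [table[j][k] for k in others] + [table[k][j] for k in others]
    return spearman(prof_i, prof_j)


def merge(table, names, i, j):
    """Unite i and j under a joint name: one fewer row and one fewer column."""
    groups = [[k] for k in range(len(names)) if k not in (i, j)] + [[i, j]]
    new_table = [[sum(table[a][b] for a in g for b in h) for h in groups]
                 for g in groups]
    new_names = [names[g[0]] for g in groups[:-1]] + [f"({names[i]} {names[j]})"]
    return new_table, new_names


def cluster(table, names, criterion=-1.0):
    """Repeatedly unite the most replaceable pair (ties go to the later pair),
    recording each union as one node of the tree, until a single cluster
    remains or no pair reaches the hypothetical criterion."""
    history = []
    while len(names) > 1:
        score, i, j = max((replaceability(table, a, b), a, b)
                          for a, b in combinations(range(len(names)), 2))
        if score < criterion:
            break
        history.append((names[i], names[j], score))
        table, names = merge(table, names, i, j)
    return history


if __name__ == "__main__":
    # A made-up table of transition counts: row = first act, column = next act.
    acts = ["A", "B", "C", "D"]
    counts = [[0, 1, 5, 0],
              [1, 0, 5, 0],
              [0, 0, 0, 6],
              [3, 3, 0, 0]]
    for left, right, index in cluster(counts, acts):
        print(f"unite {left} + {right} (index {index:.2f})")
```

In the made-up table, A and B lead to and from the other acts in exactly the same proportions, so they are united first with an index of 1. Raising `criterion` stops the tree growing once the remaining clusters are no longer convincingly interchangeable.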
Yet another way of thinking about hierarchy, which I used in my Madingley paper, is the hierarchy of goals. A goal is not necessarily a consciously held goal in the animal’s brain (although it might be). I simply meant a condition which brings behaviour to an end. For example, complicated sequences of prey-catching behaviour in a cheetah would be brought to a close by the ‘goal state’ of a successful kill. But goals can be hierarchically embedded within each other, and that is a fruitful way to look at it. I made a distinction between ‘action rules’ and ‘stopping rules’. An action rule tells the animal (or computer in the case of a computer simulation) exactly what to do and when to do it, including lots of conditional instructions (IF . . . THEN . . . ELSE etc). A stopping rule tells the animal (or computer simulation): ‘Behave at random (or try out lots of possibilities) and don’t stop until the following goal state is achieved’ – say, a full stomach.
A pure action-rule program for a complicated task like hunting by a cheetah would become impossibly elaborate. Much better to use stopping rules. But not just one big stopping rule – behave at random until the goal state of full stomach is achieved. Any cheetah living by that rule would die of old age before achieving a square meal! Instead, the sensible way for natural selection to have programmed the behaviour would be with hierarchically embedded stopping rules. The global goal (continue until stomach is full) would ‘call up’ subsidiary goals such as ‘walk around until gazelle sighted’. The goal state ‘gazelle sighted’ would terminate that particular stopping rule and initiate the next one: ‘drop down and crawl slowly towards gazelle’. That would be terminated by the goal state ‘gazelle now within striking distance’. And so on. Each of these subsidiary stopping rules would call up its own, internally embedded stopping rules, each with its own goal state. At much lower levels, even individual muscle contractions often conform to the design that engineers call ‘servo-control’. The nervous system specifies a target state for a muscle, which contracts until the target state (‘stopping rule’) is achieved.
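The hierarchy of stopping rules can be sketched as nested loops, each of which keeps acting until its own goal state holds. The simulation below is only a toy: the probabilities, the distances and the names `stopping_rule`, `walk_around`, `crawl`, `hunt_once` and `feed` are invented for illustration, and the final strike is simplified to a single chance event rather than a further stopping rule.

```python
import random


def stopping_rule(goal_reached, act, state, max_steps=100_000):
    """Repeat `act` until `goal_reached(state)` is true: a stopping rule."""
    steps = 0
    while not goal_reached(state):
        act(state)
        steps += 1
        if steps > max_steps:
            raise RuntimeError("goal state never reached")
    return state


def walk_around(state, rng):
    # Assumed: each step of wandering has a 1-in-5 chance of sighting a gazelle.
    if rng.random() < 0.2:
        state["gazelle_sighted"] = True
        state["distance"] = rng.randint(50, 200)  # invented units


def crawl(state, rng):
    # Drop down and creep a little closer.
    state["distance"] -= rng.randint(1, 10)


def hunt_once(state, rng):
    """One cycle of the global rule, built from embedded stopping rules."""
    stopping_rule(lambda s: s["gazelle_sighted"],
                  lambda s: walk_around(s, rng), state)
    stopping_rule(lambda s: s["distance"] <= 5,
                  lambda s: crawl(s, rng), state)
    # Simplification: the strike is one attempt with an assumed 50% success.
    if rng.random() < 0.5:
        state["meals"] += 1
    state["gazelle_sighted"] = False  # caught or fled, either way start again


def feed(meals_to_fill=3, seed=1):
    """Global stopping rule: keep hunting until the stomach is 'full'."""
    rng = random.Random(seed)
    state = {"meals": 0, "gazelle_sighted": False, "distance": None}
    stopping_rule(lambda s: s["meals"] >= meals_to_fill,
                  lambda s: hunt_once(s, rng), state)
    return state


print(feed())
```

Note that no rule here says exactly what to do at every moment. Each level only says what state to reach, and hands the details to the level below, which is the point of the hierarchy of goals.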
But I earlier introduced the idea of hierarchical embedment by using the analogy of human grammar. My Madingley paper finally returned to this fascinating topic, and asked whether there was any evidence that animal behaviour had something equivalent to grammatical structure. If it did, this would be extremely interesting, because it might give us some inklings of the evolutionary antecedents of human language. When true language, with true hierarchical syntax, finally evolved in humans, dare we speculate that it was able to build on a ready-made foundation of pre-existing neural structures that were put in place for different reasons, nothing to do with language, long ago?
The earliest attempt to look at this question was made by my Oxford colleague John Marshall, a linguist. He used courtship behaviour of male pigeons, taking data from the published ethological literature. There were seven ‘words’ of the pigeon lexicon: things like Bow (to the female), Copulate, etc. Marshall used his skills as a linguist to postulate a ‘phrase structure grammar’, just as Chomsky had before him for human language. For my Madingley paper, I translated Marshall’s grammar into the (now largely obsolete) computer language that I favoured at the time, Algol-60. Readers familiar with computer programming will note that, once again, the program is heavily recursive – procedures call themselves, the very essence of hierarchical embedment as I have already explained. In the program, ‘p’ was replaced by ‘If some probability condition, such as 0.3, is met . . .’
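Marshall’s actual pigeon grammar appears in the diagram and is not reproduced here. The sketch below is instead a hypothetical two-rule phrase structure grammar written in Python, using ‘Bow’ and ‘Copulate’ from the text plus an invented placeholder act, ‘Act3’. It shows the two features just described: a procedure that calls itself, and a probability p (0.3, as in the example condition) deciding whether to recurse. The function names are illustrative only.

```python
import random


def display(rng, p):
    """Display -> 'Bow' Display 'Act3'  (with probability p)
               |  'Bow'                 (otherwise)
    The procedure calls itself, so displays nest inside displays."""
    if rng.random() < p:
        return ["Bow"] + display(rng, p) + ["Act3"]
    return ["Bow"]


def courtship(rng, p=0.3):
    """Courtship -> Display 'Copulate'"""
    return display(rng, p) + ["Copulate"]


rng = random.Random(0)
for _ in range(5):
    print(" ".join(courtship(rng)))
```

Every sequence this toy grammar produces has exactly one more Bow than Act3, because each recursive call opens a Bow that is closed by a matching Act3 on the way back out. A Markov Chain of any fixed order cannot guarantee that kind of matching for arbitrarily deep nesting, which is why a phrase structure grammar is a different hypothesis about behaviour rather than a notational variant.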
At the top of the following diagram is Marshall’s ‘phrase structure grammar’ for pigeon courtship behaviour. In the middle is my Algol-60 translation. And at the bottom are several sequences of ‘behaviour’ generated by my program.