The Story of Psychology
Julesz is proud that his discovery led Hubel and Wiesel and others to turn their attention from form perception to the investigation of binocular vision, but modestly adds:
I never regarded my role of introducing random-dot stereograms into psychology as a great intellectual achievement, despite its many consequences for brain research. It was just a lucky coincidence, a clash between two cultures, an association between two foreign languages (that of the psychologist and the engineer) in the head of a bilingual.67
Yet another theory about depth perception was proposed several decades ago—one that was neither specifically neural nor specifically cognitive. Not that its proponent tactfully combined the two; on the contrary, he virtually ignored the neural theory and dismissed the cognitive theories as unnecessary and based on wrong assumptions.
Only a thoroughgoing maverick would reject a century’s worth of depth-perception research and claim to have found a totally different and correct approach. Only a true nonconformist would assert that we perceive depth neither by neural detection nor by inference from cues but “directly” and automatically. Only a brash individualist would present a radical epistemology in which the physics of light is said to give us an accurate, literal experience of depth, so that we need not interpret what we see because we see what is as it is.
Such a one was the late James J. Gibson (1904–1979), whose admirers considered him “the most important student of visual perception of the twentieth century” and “the most original theoretician in the world in the psychology of perception,” but whose theory is considered by the majority of perception specialists “extremely implausible” (one reviewer even called it too “silly” to merit discussion) and has few advocates.68
Born in a river town in Ohio and reared in various parts of the Midwest, Gibson was the son of a railroad surveyor.69 He went to Princeton University but felt out of place in a social world that revolved around clubs, and preferred to associate with what he called “the eccentrics.” For a while he vacillated between philosophy and acting (he was wavy-haired, square-jawed, and good-looking enough for leading roles), but in his senior year he took a course in psychology and at once heard the call. In 1928, he received a faculty appointment at Smith, where, for some years, he pursued relatively traditional perception research. Then, during World War II, he was asked by the Army Air Corps’s Aviation Psychology Program to develop depth-perception tests for determining who had the visual aptitudes needed for flying, particularly for making successful take-offs and landings.
He considered the classical cues to depth perception, including shadows and perspective, of little worth. In his opinion they were based on paintings and parlor stereoscopes rather than on three-dimensional reality, and on static images rather than on movement. What seemed to him much more useful and realistic were two other kinds of cues: texture gradient, like the uniformly changing roughness of the runway as seen by a pilot during the final leg of an approach; and motion perspective, or the flow of changing relationships among objects as one moves through the environment, including all that a pilot sees during take-offs and landings.70 These cues soon became, and are today, accepted components of the cue-based theory of depth perception.
Gibson’s Air Corps work held the germ of his later view. The crucial mechanism in depth perception (in all perception, according to Gibson) is not the retinal image, with all its cues, but the changing flow of relationships among objects and their surfaces in the environment that the perceiver moves through. During the 1950s and 1960s, he did a considerable amount of research at Cornell that tested his belief in texture gradients. In some experiments he placed diffusing milk-glass between an observer and textured surfaces; in others he dilated the observer’s eyes to prevent sharp focus on texture; in still others he cut Ping-Pong balls in half and made goggles of them so that what his subjects saw was foglike, without surfaces or volume.71 From these and other experiments, plus a careful consideration of his research on air-crew testing and training, Gibson came to reject texture gradients and to stress movement by the observer through the environment as the key to depth perception. However large or small the movement, it results in changes in the optic array—the structured pattern of light reaching the eye from the environment—such as is suggested in this drawing:
FIGURE 35
How optic array conveys depth
The optic array, rich in information as seen from any point, becomes infinitely richer with movement by the observer. Even minor movements of the head change the array, transforming what is seen of an object and the relationships among objects, and yielding optic flow of one kind or another. Gibson came to believe that optic array and flow convey depth and distance directly, without the need of mental calculation or inference from cues.72
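The claim that even a small head movement changes the optic array in a depth-dependent way can be illustrated with a little geometry. The following is a minimal sketch, not Gibson's own formulation: the scene, the distances, and the size of the head movement are all assumed values, and the function name is hypothetical. It computes how far the visual direction of an object shifts when the observer moves sideways, showing that near objects shift much more than far ones.

```python
import math

def angular_shift_deg(lateral_move_m, distance_m):
    """Change in visual direction (degrees) of a point that starts straight
    ahead at distance_m when the observer moves sideways by lateral_move_m."""
    return math.degrees(math.atan2(lateral_move_m, distance_m))

# Hypothetical scene: three objects straight ahead at different distances (meters).
scene = {"fence post": 2.0, "tree": 10.0, "hill": 500.0}
head_move_m = 0.1  # a small sideways head movement (assumed value)

shifts = {name: angular_shift_deg(head_move_m, d) for name, d in scene.items()}
for name, shift in shifts.items():
    print(f"{name:10s} at {scene[name]:6.1f} m shifts by {shift:.3f} degrees")
```

The ordering of the shifts, largest for the nearest object and smallest for the farthest, is the kind of depth information that is present in the changing array itself.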
This is how Gibson explained depth perception in his sweeping “ecological” theory of “direct perception.” The pity is that he felt obliged to throw out the baby with the bathwater. For it is possible to acknowledge both the neural and cognitive views of depth perception as correctly explaining different aspects of the phenomenon and the Gibsonian view as supplementary to them. But it wasn’t possible for James J. Gibson.
His name and theory have faded from view, but the cues he was so enamored of have remained accepted components of contemporary accounts of depth perception.
Two Ways of Looking at Vision
“Visual perception,” Bela Julesz said fifteen years ago, “is in the same state as physics was prior to Galileo or biochemistry was prior to the discovery of the double helix by Watson and Crick.”73 Since then, a good deal more has been learned, and yet it remains true that each of the two major approaches—the neural and the cognitive—explains only some of the phenomena; there is not yet a comprehensive and unifying theory of visual perception. Perhaps some great organizing concept remains undiscovered, or perhaps visual perception is so complex that no one theory can embrace all of its concepts and that the two different approaches deal with events occurring at radically different levels of complexity.
We have seen something of each of these approaches. Here, to round out the picture, are brief sketches of how each explains visual perception in general.
The neural approach answers questions that preoccupied nineteenth-century physiologists: How can sensory nerves, though alike in structure, transmit different sensations to the brain? And how does the brain turn those incoming impulses into vision?
The answer, worked out in great detail over recent years,74 is that the nerve impulses themselves do not differ; rather, receptors that respond to specific stimuli send their messages separately to the striate or primary area of the visual cortex. The process begins on the retina, where rods are sensitive to low levels of illumination, cones to more intense levels; cones are of three types, each responsive to different wavelengths of visible light, and some, as we have already heard, sensitive to special shapes and motions.
From the rods and cones, the same kinds of nerve impulses travel along parallel pathways but end up in different areas of the brain: more than 90 percent of them in particular parts of the primary visual cortex and the remainder, somewhat under 10 percent, in subcortical structures. Thus the messages delivered to the brain have been analytically separated into color, shape, movement, and depth, and delivered to specialized receptive areas. By means of staining techniques that trace the neuron pathways in laboratory monkeys from retina to visual cortex, researchers have been able to identify more than thirty such distinct cortical visual areas.
What happens then? The brain puts it all together: Using single-cell recordings and two kinds of brain scans (PET, positron emission tomography, and fMRI, functional magnetic resonance imaging), perception researchers have puzzled out the extremely intricate architecture of the primary visual cortex and its wiring scheme (far too complex to take up here), which integrates the individual impulses and blends the information from the two eyes. The result is that the image cast on the retina winds up as the excitation of groups of complex neurons, but the pattern of these excitations in no way resembles the image on the retina or the scene outside the eye. Rather, as already mentioned, it is analogous to writing about a scene, which conveys what it consists of but does not in the least look like it.
It is not an image but a coded representation of the image, somewhat as the patterns of magnetism on a tape recording are not sounds but a coded representation of sounds. The representation, however, is not yet a perception; the primary visual cortex is in no sense the end of the visual path. It is just one stage in the processing of the information it handles.
From the striate region the partly assembled and integrated information is sent to other areas of the visual cortex and to higher areas of brain cortex beyond it. There, the information is finally seen by the mind and recognized as something familiar or something not seen before. How that takes place is still moot, according to most neuroscientists. A few, however, boldly guess that somewhere at the higher brain levels are cells that contain “traces” of previously seen objects in the form of synaptic connections or molecular deposits, and these cells respond when an incoming message matches the trace. The response to a match is an awareness (“I know that face”); a nonmatch produces no response, which is also an awareness (“I don’t know that face”).75
The neural approach tells us much about the workings of visual perception at the micro level but little at the macro level, much about the machinery of vision but little about its owner and operator, much about neuronal responses but little about the experience of perception. As one cognitive theorist put it, “Trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers.”76
The cognitive approach deals with the mental processes at work in such perceptual phenomena as shape constancy, feature identification, form recognition, cue-derived depth perception, recognition of figures when much of the information is missing, and so on.
The mental processes that yield these results are made up of billions of neuronal events, but cognitive theorists say that it takes macrotheories, not microtheories, to explain these processes. A physicist studying how and when a wave changes form and breaks as it nears the shore cannot derive the laws of wave mechanics from the interactions of trillions of water molecules, not even with a number-crunching mainframe computer. Those laws express mass effects that exist at a wholly different level of organization. The sounds made by a person talking to us are made up of vibrations of the molecules of atmospheric gases, but the meaning of the words cannot be explained in those terms.
So too with the mental processes of visual perception; they are organized mass effects of neural phenomena expressed by mental, not neurophysiological, laws. We have already seen evidence of this, but there is one particularly intriguing and historic example worth discussing. What happens, and at what level, when we call up an image from memory and see it in the mind’s eye? Experiments by cognitive theorists show that this can be explained only in high-level cognitive terms. The most elegant and impressive of such experiments are those of Roger Shepard (now emeritus) of Stanford University on “mental rotation.” Shepard asked subjects to say whether the objects in each of these three pairs are identical:
FIGURE 36
Mental rotation: Which pairs are identical?
Most people recognize, after studying them for a little while, that the objects in A are identical, as are those in B; those in C are not. When asked how they reached their conclusions, they say that they rotated the objects in their minds much as if they were rotating real objects in the real world. Shepard demonstrated how closely this procedure mirrors real rotation in another experiment, in which viewers saw a given shape at varying degrees of angular difference. This set, for example, shows a single shape in a series of positions:
FIGURE 37
Mental rotation: The greater the distance, the longer it takes.
When subjects were shown pairs of these figures, the time it took them to identify them as the same was proportional to the angular difference in the positions of the figures; that is, the more one figure had to be rotated to match the other, the longer it took for identification.77
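The relationship just described amounts to a straight-line model: a fixed base time plus a constant amount of time per degree of rotation. The sketch below illustrates that idea only; the base time and rotation rate are assumed values chosen for illustration, not figures taken from the text, and the function name is hypothetical.

```python
def predicted_response_time(angle_deg, base_time_s=1.0, rotation_rate_deg_per_s=60.0):
    """Predicted time (seconds) to judge two figures the same when one must be
    mentally rotated by angle_deg to match the other.

    base_time_s and rotation_rate_deg_per_s are illustrative assumptions."""
    return base_time_s + angle_deg / rotation_rate_deg_per_s

# Equal steps in angular difference add equal steps in predicted time.
for angle in (0, 60, 120, 180):
    print(f"{angle:3d} degrees -> {predicted_response_time(angle):.2f} s")
```

Under these assumed values, each additional 60 degrees of angular difference adds the same increment of time, which is the pattern of results the mental-rotation account predicts.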
This is only one of many perceptual phenomena that involve higher mental processes operating on internalized symbols of the external world. For some years a number of perception researchers have been trying to formulate a comprehensive cognitive theory of what those processes are and how they produce those perceptions.
There are two schools of thought about how to do this. One uses concepts and procedures drawn from artificial intelligence (AI), a branch of computer science. The basic assumption of AI is that human mental activities can be simulated by step-by-step computer programs—and take place in that same step-by-step programmed way.78 Partly in the effort to make computers recognize what they are looking at, and partly to gain a better understanding of human perception, AI experts have written a number of form-recognition programs. To achieve elementary form recognition—to recognize triangles, squares, and other regular polygons, for instance—a program might follow a series of if-then steps. If there is a straight line, then follow it and measure it to its end; if another line continues from there, then call that point a corner and measure the angle by which it changes direction; if that other line is straight, follow it until… and so on, until the number of sides and angles has been counted and matched against a list of polygons and their characteristics.
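The if-then procedure described above can be sketched in a few lines. This is a simplified illustration under a strong assumption: the outline has already been extracted as an ordered list of (x, y) points around a closed figure, which a real machine-vision program would first have to find in an image. The function names, the tolerances, and the lookup table are hypothetical and do not reproduce any particular AI program.

```python
import math

# Lookup list: number of sides -> (general name, name when all sides and angles are equal)
POLYGONS = {
    3: ("triangle", "equilateral triangle"),
    4: ("quadrilateral", "square"),
    5: ("pentagon", "regular pentagon"),
    6: ("hexagon", "regular hexagon"),
}

def turn_angle_deg(prev_pt, corner_pt, next_pt):
    """Angle (degrees) by which the outline changes direction at corner_pt."""
    heading_in = math.atan2(corner_pt[1] - prev_pt[1], corner_pt[0] - prev_pt[0])
    heading_out = math.atan2(next_pt[1] - corner_pt[1], next_pt[0] - corner_pt[0])
    turn = math.degrees(heading_out - heading_in)
    return (turn + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)

def classify_outline(points, angle_tol_deg=1.0, length_tol=1e-6):
    """Follow the outline, call each change of direction a corner, measure the
    sides between corners, then match the counts against the lookup list."""
    n = len(points)
    corners, turns = [], []
    for i in range(n):
        turn = turn_angle_deg(points[i - 1], points[i], points[(i + 1) % n])
        if abs(turn) > angle_tol_deg:  # if the line changes direction, it is a corner
            corners.append(points[i])
            turns.append(turn)
    if len(corners) not in POLYGONS:
        return "unknown shape"
    sides = [math.dist(corners[i - 1], corners[i]) for i in range(len(corners))]
    equal_sides = max(sides) - min(sides) <= length_tol
    equal_turns = max(turns) - min(turns) <= angle_tol_deg
    general, regular = POLYGONS[len(corners)]
    return regular if equal_sides and equal_turns else general

# The point (1, 0) lies on a straight side, so it is not counted as a corner.
print(classify_outline([(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]))
print(classify_outline([(0, 0), (4, 0), (0, 3)]))
```

Even this toy version shows why the approach is brittle: it handles only clean, closed outlines made of straight lines, and says nothing about what the shape is a picture of.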
The chief argument in favor of the AI approach to visual perception is that there is no projector or screen in the brain and no homunculus looking at pictures; hence the mind must be dealing not with images but with coded data that it processes step by step, as a computer program does.
Fifteen years ago the chief argument against the AI idea was that no existing program of machine vision had more than a minuscule capacity, compared with that of human beings, to recognize flat shapes, let alone three-dimensional ones, or to know where they are within the environment, or to recognize the probable physical qualities of the rocks, chairs, water, bread, or people it was seeing. But since then there have been extraordinary developments in machine vision. Formerly limited to two-dimensional representation, it is now capable of 3-D, and methods of identifying shapes and distances have greatly improved. Robots guided by machine vision now run operations in a great many factories; AI systems using machine vision have guided driverless automobiles across the desert, avoiding obstacles and ravines; security systems can now match a seen face to a photograph of that face, and so on.
Having said all that, it remains true that machine vision has only a very limited capacity, compared with that of human beings, to recognize all sorts of objects for what they are; it doesn’t understand, it doesn’t know, it doesn’t feel. Basically, that’s because it isn’t hooked up to the immense information base of the human mind: its vast store of mental and emotional responses built in by evolution, its immense accumulation of learned meanings of perceptions, its huge compilation of interconnected information about the world. As remarkable as the achievements of the designers of machine vision are, their work has led to a greater understanding of how to make machine vision work but not to a deeper understanding of how human vision works.
The other school of thought about how cognitive perceptual processes work has long relied and continues to rely on laboratory studies of human thinking rather than machine simulations of thinking. This view, going far beyond the Helmholtz tradition that perception is the result of unconscious inference from incomplete information, includes conscious thought processes of other kinds. Its leading exponent in recent years was Irvin Rock (1922–1995) of the University of California at Berkeley. His book, The Logic of Perception, was described in the Annual Review of Psychology as “the most inclusive and empirically plausible explanation of perceptual effects that seem to require intelligent activity on the part of the perceiver.”79
Rock, though an outstanding perception psychologist, was far from outstanding in his early undergraduate years; in fact, in an intellectual family he was the black sheep. But during World War II his unit was dive-bombed by enemy planes, he felt sure he would be killed, and “I vowed to myself,” he said, “that if I survived I would try to do more with my life than I had until then.”80 After the war he became a top-notch student. He began graduate school in physics but switched to psychology when he realized that there was greater opportunity in that young field for a significant contribution to knowledge.
At the New School for Social Research, Rock fell under the spell of the Gestaltists who were there and became an ardent one himself. Certain basic Gestalt laws of organization and relational thinking are still part of his theory. But those laws describe essentially automatic processes, and Rock came to believe that many perceptual phenomena could be accounted for only by mental processes of a thoughtlike character.81
This idea first occurred to him when he conducted the 1957 experiment, described above, in which he tilted a square so that it looked like a diamond, then tilted the perceiver. Since the perceiver still saw the square as a diamond, Rock reasoned that the perceiver must have used visual and visceral cues to interpret what was seen. Rock spent many years devising and conducting other experiments to test the hypothesis that, more often than not, perception requires higher-level processes than those taking place in the visual cortex. These studies led him, finally, to the thesis that “perception is intelligent in that it is based on operations similar to those that characterize thought.”82
And indeed, Rock has said, perception may have led to thought; it may be the evolutionary link between low-level sensory processes in primitive organisms and high-level cognitive processes in more complex forms of life. If what the eye sees, he argues, is an ambiguous and distortion-prone representation of reality, some mechanism had to evolve to yield reliable and faithful knowledge of that reality. In his words, “Intelligent operations may have evolved in the service of perception.”83