This hypothesis was widely accepted in the 1970s and the early 1980s. But some leading investigators began to doubt that the icon, observed only under unnatural laboratory conditions, exists in normal perception; if it does not, the saccadic-iconic hypothesis of movement perception collapses. As Ralph Haber saw it:
Such presentations have no counterpart in nature, unless you are trying to read during a lightning storm. There are no natural occasions in which the retina is statically stimulated for less than about a quarter of a second, preceded and followed by blankness… There is never a fixed snapshot-like retinal image, frozen in time, but rather a continuously changing one… The icon was born in the laboratory, and it has life only there and nowhere else.47
The screen of the eye is not a photographic emulsion, and moving images on it are not captured, unblurred, in the form of stills. Rather, the retina is a tissue made up of millions of receptors, each of which, when stimulated, fires many times per second. As an image moves across the retina, a continuous flow of impulses from a series of receptors proceeds to the visual cortex. There is no blur, because the system generates not a series of stills but an unbroken stream of changing information.
And indeed, a dramatic discovery about movement perception, made four decades ago, was that some neurons in the retina and in the visual cortex fire in response to movement but that many others do not; the detection of movement begins at the single-cell level. This ancient evolutionary development helps prey avoid being eaten, and also helps predators locate and seize their prey. A frog will efficiently snap at any small moving object but starve to death if presented only with dead flies or worms, which it does not perceive as food;48 many other simple predators show similar behavior. The frog’s retina and brain apparently have neurons that respond to movement (and size), a capacity that has more survival value than other aspects of vision.
In the 1960s and 1970s, Hubel and Wiesel, among others, demonstrated the existence of movement detectors. Recording the activity of single cells in cats and monkeys with the microelectrode technique, they showed that in both the retina and the visual cortex certain cells, and only those, respond strongly to movement. Some, in fact, fire only in response to movement in a particular direction, others only to movement in the opposite direction.49
Other investigators confirmed this by entirely different methods. In 1963 Robert Sekuler and a colleague projected an image of a grating moving upward, established the threshold (minimum speed) at which each human subject could see it moving, and then had each one look steadily at the moving image. After several minutes, subjects could no longer see the movement when the grating was crawling at the original threshold speed, although they could still do so when the speed was doubled, and could see downward motion at the slower speed. The results indicated that there were upward-motion detectors, which had become fatigued, and downward-motion detectors, which had not. Comparable results were obtained in reverse when the subjects watched a downward-moving grating for several minutes.50
Most of us have experienced movement-detector fatigue without knowing its neural basis. If we look steadily for some time at a waterfall (or another continuously moving stream, like an assembly line) and then look away, we see illusory movement in the opposite direction. “This is called the waterfall effect,” note Gazzaniga and Heatherton, “because if you stare at a waterfall and then turn away, the rocks and trees will seem to move upward for a moment.”51 The cells in the visual cortex that fire at a high rate in response to movement in one direction become temporarily fatigued and cease firing, while those that fire in response to movement in the opposite direction continue to do so at their normal low level, temporarily producing a sense of movement in their preferred direction.
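The opponent-channel logic behind this can be captured in a few lines of code. What follows is my own toy illustration, not a model drawn from the research described here; the firing rates and the fatigue factor are arbitrary stand-ins:

```python
# Toy opponent-channel model of the waterfall effect (an illustration only).
# Perceived direction is read off the balance between an "up" detector and a
# "down" detector; prolonged stimulation fatigues whichever channel was driven.

BASELINE = 5.0   # spontaneous firing rate of each channel (arbitrary units)
DRIVEN = 50.0    # firing rate while a channel's preferred motion is present
FATIGUE = 0.5    # fraction of resting responsiveness left after fatigue

def perceived_motion(up_rate: float, down_rate: float) -> str:
    if up_rate > down_rate:
        return "upward motion"
    if down_rate > up_rate:
        return "downward motion"
    return "no motion"

# While watching the waterfall, the "down" channel is strongly driven.
print(perceived_motion(BASELINE, DRIVEN))              # -> downward motion

# Just after looking away, no motion is present, but the fatigued "down"
# channel fires below its resting level, so the balance tips the other way.
print(perceived_motion(BASELINE, BASELINE * FATIGUE))  # -> upward motion
```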
None of this, however, explains two other mysteries of motion perception. If we move our eyes or head to follow a flying bird or other moving object, we perceive movement even though the image remains at the center of the retina. Conversely, if we move our eyes across a still scene, images sweep across the retina, yet we see the world as motionless.
There must, then, be some other source of information that confirms or corrects the information coming from the retina. Two possibilities have long been put forward: either the brain’s commands to the eyes and head, issued to keep the image of a moving object centered on the retina, or the eye and head movements themselves, are relayed to the visual cortex and there interpreted as the object’s movement. Similarly, when we scan a still background, either the brain’s commands or the eye and head movements send signals to the visual cortex that enable it to recognize the moving retinal image as that of an unmoving scene.52
The matter has not been resolved; laboratory experiments with animals provide some evidence for each theory. By one means or another, eye and head movements provide part of the information essential to movement perception. Studies of afterimages prove the point. If subjects stare at a bright light for a little while, then look away toward a relatively dark area, they see an afterimage of the light. If they move their eyes, the afterimage moves in the same direction, although the source of the afterimage, the fatigued area of the retina, does not move. This means that the visual cortex, receiving messages that the eyes are moving but that the image is not moving across the retina, interprets them to mean that the eyes are tracking a moving image.53
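The logic common to both theories, that the visual cortex compares the motion on the retina against the motion produced by the eyes themselves, can be put schematically. The sketch below is my own illustration of that comparison, with made-up numbers, not a model from the studies cited here:

```python
# Schematic of the retinal-motion vs. eye-motion comparison (illustration only).

def perceived_object_motion(retinal_motion: float, eye_motion: float) -> float:
    """Degrees per second; rightward is positive. An eye moving right sweeps
    the image of a still world leftward across the retina, so adding the
    eye's own motion back recovers the object's motion in the world."""
    return retinal_motion + eye_motion

# Scanning a still scene: eyes sweep right at 10 deg/s, image sweeps left at 10 deg/s.
print(perceived_object_motion(-10.0, 10.0))  # 0.0 -- the world looks still

# Tracking a flying bird: eyes move right at 10 deg/s, image stays on the fovea.
print(perceived_object_motion(0.0, 10.0))    # 10.0 -- the bird is seen to move

# An afterimage is fixed on the retina (retinal motion 0), so when the eyes
# move, it appears to move with them, just as the experiments show.
```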
Another possible explanation is the frame-of-reference effect. If you are looking at a background such as the tennis court across the net, and your opponent starts to move cross-court, and you turn your head to keep your eyes on him, the result is a sweep of changing images across your retinas. But you know that the other player moved, not the tennis court, which had been established in your brain as the frame of reference.54
Much recent neuroscientific research has investigated the neural pathways involved in various forms of disorders of motion perception. Thus far, it has neither confirmed nor amended the above hypotheses, but may well be on the brink of doing so.
Seeing Depth
In nature, unlike the laboratory, neither form nor movement exists apart from three-dimensionality; to understand form and motion perception in everyday life, it is essential to understand depth perception.55 Psychologists have always considered this a central puzzle about perception; a bibliography of all their writings on depth perception would fill more than a volume.56
The basic question has always been both obvious and simple: How do we see the world as three-dimensional when our source of information, the image on the retina, is essentially two-dimensional? Why do we not see the world as flat, like a color photograph in which distance and the three-dimensional qualities of every object are merely suggested by size, perspective, shading, and other cues?
Such cues are, in fact, the answer offered by a group of theories. These take many forms but all hold that depth perception is not automatic and innate. Some say that it comes about as a result of experiences that lead us to associate depth with the cues; others that it is the product of learned mental processes by means of which we infer depth from the cues.
The argument that depth perception is the product of our associating cues with our experiences of depth began with Locke and Berkeley. From their time to the present, psychologists in the associationist-behaviorist tradition have maintained that, unconsciously or consciously, we link the cues in the two-dimensional retinal image with our experiences of how far away the objects are that produce those cues.
The alternative notion, that we perceive depth as a result of a kind of logical reasoning about what we see, was first voiced in 1843 by J. S. Mill, who said of perception that what we observe is one-tenth observation and nine-tenths inference. Later in the century, Helmholtz argued, in more detail, that we unconsciously infer three-dimensional reality from the two-dimensional retinal data. From then until now, a number of cognitively oriented psychologists have held that perception, including that of depth, is partially or even largely a product of higher mental functions—“thoughtlike processes,” Irvin Rock terms them—of which inference from cues is only one.57
Whichever view one prefers, the cues to depth are familiar enough in everyday life, and their role in perception has been demonstrated in many hundreds of experiments. Here are the principal cues and a few representative experiments:58
—Apparent size: The farther away any object is, the smaller it seems, but if we already know how big it is—a person, for instance—we judge how far away it is from its apparent size, even if it is on a featureless plain that offers no other cue. In a 1951 experiment, one researcher made up playing cards ranging from half the normal size to twice the normal size and showed them to subjects under laboratory conditions in which there were no cues to distance. The subjects thought the double-size cards were close to them and the half-size cards far from them. All were at the same distance.59 Everyone, moreover, has experienced the moon illusion—the full moon looks remarkably larger when it is on the horizon than when overhead. Of the explanations currently offered, the most persuasive is that when the moon is close to objects on the horizon, they affect our judgment of its size; when it is overhead, away from all such cues, we judge it differently.
—Perspective: Parallel lines running away from the viewer, such as railroad tracks or the edges of walls, converge with distance. How powerfully we are influenced—or, one should say, informed—by this cue was shown earlier in Figure 13 on page 331: The perspective gradient enables us to perceive the farther figure as roughly the same size as the nearer one, although in fact, as shown, the image of the former is only a third the size of the latter.
—Interposition: When an object is partly concealed by another we realize that the concealed one is farther from us than the concealing one. In looking out over a cityscape, we easily sense the distance of a remote tall building from the fact that closer ones obscure its lower floors; at sea, on the other hand, the distance of a floating object is much harder to judge.
—Texture gradient: The texture of a surface—a grassy field, a cement sidewalk—is constant, but the increasingly finer grain of the texture at greater distances makes it an important cue to the distance of anything on that surface.
—Aerial perspective: Faraway buildings or hills are pale and hazy compared with nearby ones, owing to the greater amount of atmosphere between them and us.
—Motion parallax: The changing relationship of things to each other as we move is an important source of depth information, particularly when nearby objects are seen in relation to distant ones.
—Convergence and accommodation: When we look at something very close to us, our eyes angle inward and the muscles around each lens strive to keep it in focus. When we look at something far away, our eyes are parallel and the lenses relaxed. The concomitant visceral sensations are important cues to the distance of objects ten feet or less from us.
—Binocular disparity: When we look at something relatively close to us, its image falls on the fovea—center of the retina—of each eye, and the images of other objects equally far away fall on corresponding parts of both retinas. The images of objects either nearer or farther away, however, fall on different parts of the two retinas, as this diagram indicates:
FIGURE 33
How binocular disparity conveys depth
The disparity between the retinal images is interpreted by the brain to indicate which object is farther from us. Binocular disparity is most effective from close up to somewhere between eight hundred and nineteen hundred feet.60 Some perception theorists regard it as the most important of all cues to depth.
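A rough calculation suggests why disparity gives out at roughly that range. The sketch below assumes an interocular separation of about 2.5 inches and a practical stereoacuity limit on the order of tens of arc-seconds; both are standard approximations of my own choosing, not values from the studies cited here:

```python
import math

# Back-of-the-envelope sketch of why binocular disparity fades with distance.
EYE_SEPARATION_FT = 2.5 / 12            # about 0.21 ft between the pupils
ARCSEC_PER_RADIAN = 180 * 3600 / math.pi

def disparity_arcsec(distance_ft: float) -> float:
    """Angular disparity, relative to an object at infinity, for a target
    at the given distance (small-angle approximation)."""
    return (EYE_SEPARATION_FT / distance_ft) * ARCSEC_PER_RADIAN

for d in (10, 100, 800, 1900):
    print(f"{d:>5} ft: {disparity_arcsec(d):7.1f} arcsec")

# Disparity falls off with distance: thousands of arc-seconds at 10 ft, but
# only about 20-55 arc-seconds by 800-1900 ft, near the limit of usable
# stereoacuity, which is consistent with the range quoted above.
```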
All the foregoing redundant cues to depth can be explained in terms of innate mechanisms or of learned behavior. But the innate aspect of depth perception is supported by other and more convincing evidence.
A historic series of experiments indicating that depth perception is instinctive was performed at Cornell in the late 1950s and early 1960s by Eleanor Gibson, whose work on high-speed reading of pronounceable and unpronounceable words we saw earlier, and a colleague, Richard Walk. Gibson, who had a lifelong aversion to cliffs, and Walk, who during World War II had trained paratroopers to jump off a high platform, jointly conceived of and created a “visual cliff” to determine whether rats learn depth perception or are born with it. The visual cliff was a thick sheet of glass with tile-patterned wallpaper on the underside of half of it and the same paper under the other half but several feet below. The question was whether creatures that had had no experience of depth—that had never tumbled off a high place of any sort—would automatically shun what looked like a drop-off.
The researchers reared chicks, rats, and other animals in the dark, depriving them of any experience of depth, then placed them on a board that crossed the glass between the shallow side and the seemingly deep one. The results were dramatic. The animals, though they had never experienced depth, almost always avoided the deep side and stepped off the board onto the shallow side.
Gibson and Walk then tried human infants. As Gibson recalled later:
We couldn’t very well rear the infants in the dark, and we had to wait until they could locomote on their own to use avoidance of the edge as our indicator of depth discrimination, but infants of crawling age did avoid the “deep” side. They may have learned something in the months before they could crawl; but whatever it was, it could not have been externally reinforced, since the parents never reported that the babies had fallen from a height.61
The mother of each infant would stand at one side or the other of the apparatus and beckon to her child. In nearly all instances, the infant crawled readily toward her when she was on the shallow side, but only three out of twenty-seven ventured onto the deep side when their mothers were there.62
Later laboratory work by others, however, weakens the Gibson-Walk conclusion somewhat, suggesting that the fear of heights in human infants is learned through locomotor experience in general.63 But impressive evidence that depth perception is built into the nervous system came in 1960 from an unlikely source, AT&T’s Bell Laboratories, and an unlikely researcher, a young electrical engineer who was a specialist in TV signal transmission. Bela Julesz, born and educated in Hungary, came to the United States after the abortive revolution of 1956, and was hired by Bell Labs in Murray Hill, New Jersey, to develop ways to narrow the bandwidths used by TV signals. But Julesz was drawn to more interesting questions and from 1959, with Bell Labs’ acquiescence, devoted himself to research on human vision. Though he never acquired a degree in psychology, he became a widely known, award-winning perception psychologist, the head of visual perception research at Bell Labs, a MacArthur Fellow, and, in 1989, director of the Laboratory of Vision Research at Rutgers University.64
Julesz had barely begun vision research when he came up with the idea that made him instantly famous in psychological circles. He had been surprised to find, in reading about stereoscopic depth perception, general acceptance of stereopsis as the result of the brain’s matching cues to form and depth in each eye’s image. This was thought to lead to fusion of the images and depth perception. Julesz, who had had some experience in Hungary as a radar engineer, felt sure that this was wrong.
After all, in order to break camouflage in aerial reconnaissance, one would view aerial images (taken from two somewhat different positions) through a stereoscope, and the camouflaged target would jump out in vivid depth. Of course, in real life, there is no ideal camouflage, and after a stereoscopic viewing one can detect with a single eye a few faint cues that might discriminate a target from its surroundings. So I used one of the first big computers, an IBM704 that had just arrived at Bell Labs, to create ideally camouflaged stereoscopic images.65
These consisted of randomly created patterns of black and white dots, as in this pair:
FIGURE 34
When these patterns are stereoscopically merged, the center floats upward.
There are no cues to depth in these two patterns when each is looked at alone. But although they are largely identical, a small square area in the center has been slightly shifted to one side by the computer so that when each image is seen by one eye and the patterns merged, that area produces a binocular disparity—and seems to float above the rest of the background. (To see this remarkable effect, hold a 4″×6″ card or a sheet of paper vertically in front of and perpendicular to the page so that each eye sees only one image. Focus on one corner of the pattern, and in a little while the two images will migrate toward each other and fuse. At that point the center square will appear to hover an inch or so above the page.)
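For readers who want to see the construction explicitly, the recipe Julesz describes is easy to reproduce. The following is a minimal sketch in Python, not his original IBM 704 program; the pattern size, square size, and shift are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the "random" pattern is repeatable

N, SQ, SHIFT = 100, 40, 2        # pattern side, square side, disparity (in dots)
row0 = col0 = (N - SQ) // 2      # top-left corner of the central square

# Left-eye image: each dot independently black (0) or white (1).
left = rng.integers(0, 2, size=(N, N))

# Right-eye image: an exact copy, except that the dots of the central square
# are shifted SHIFT columns to one side.
right = left.copy()
right[row0:row0 + SQ, col0 - SHIFT:col0 - SHIFT + SQ] = left[row0:row0 + SQ, col0:col0 + SQ]

# The strip uncovered by the shift is refilled with fresh random dots, so
# that neither image, viewed alone, contains any cue to the square.
right[row0:row0 + SQ, col0 + SQ - SHIFT:col0 + SQ] = rng.integers(0, 2, size=(SQ, SHIFT))

# Viewed separately, `left` and `right` are indistinguishable noise; fused
# stereoscopically, the brain correlates corresponding dots and the shifted
# square stands out in depth from the background.
```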
The random-dot stereogram is far more than an amusing trick. It proves that stereoscopic vision does not depend on cues in each retinal image to create the experience of three-dimensionality, and that, on the contrary, the brain fuses the meaningless images and thereby reveals the hidden cues to three-dimensionality. This is not a cognitive process, not a matter of learning to interpret cues to depth, but an innate neurological process taking place in a particular layer of the visual cortex. That is where a highly organized mass of interacting cells performs a correlation of the dots in the patterns, yielding fusion and the perception of the three-dimensional effect.66 (Stereopsis is not the only way we achieve depth perception. Julesz’s work does not rule out others, including those which involve learning.)