Saccades can be recorded using a technique called eye tracking. A modern eye tracker consists of a camera that films the eye and, based on the position of the pupil, determines exactly where the subject is fixing his gaze.
In the image registered by an eye tracker in Figure 2.2, the area around the center of the cross corresponds approximately to the image projected onto the fovea of the retina, and thus to what we can see sharply. The greater resolution in the fovea is the result of a high density of photoreceptors located there. The visual information gathered from outside the roughly 1.5-degree visual angle that projects onto the fovea is much more diffuse. To generate the impression that we see more than what falls within the fovea “in focus,” our eyes scan the field of view at a rate of about three saccades per second.
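As a quick geometric aside (a back-of-the-envelope check of my own, using the small-angle geometry implied by the figures above, not a number from the text), the patch of sharp vision at a viewing distance d spans about

diameter ≈ 2 × d × tan(1.5°/2) ≈ 0.026 × d,

so it is about 0.3 inches across at a reading distance of 12 inches, and roughly the size of a small coin at arm’s length.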
This presents a fascinating puzzle. To understand why, let us carry out a simple experiment: let us close our eyes, open them for one second, and then close them again . . .
Figure 2.2
A movable eye tracker consists of a camera mounted on a pair of goggles. Based on the location of the pupil, it computes where the subject is looking (the cross in the bottom image) within the field of view registered by a second camera.7
In that blink of an eye, we performed about three saccades, meaning we saw in detail only about the area of three small coins. The rest was just a blur to the eye, even though we feel as though we were able to see everything in front of us clearly. This is one of the wonders of the brain, one of the many mysteries that keep us neuroscientists awake at night. When we look at a face, for example, we think that we see all of its features in focus. In truth, however, our eyes simply stop at a few specific points, while the brain “fills in” the rest of the information. This effect was described in the 1960s by a distinguished Russian psychologist named Alfred Yarbus.8 As Figure 2.3 illustrates, Yarbus showed that when we scan a face, we tend to focus on the eyes and the lines of the nose and mouth, which are precisely the most salient features of a person’s appearance.
Figure 2.3
Trace of visual fixations upon viewing a female face (example drawn from Yarbus’s book) and a self-portrait by Van Gogh9
Yarbus also showed that what we see is greatly influenced by the task we are performing, which determines our focus of attention, based on conscious and unconscious factors. The unconscious factors relate to the saliency of the information; in other words, how much it stands out from its surroundings. For example, a person wearing an orange T-shirt will clearly stand out in a group of people dressed in gray; a moving car will be more salient than cars parked on the street. The conscious factors, on the other hand, relate to what interests us as we scan a scene. If, in the crowd at the end of a soccer game, I am looking for my brother—who is wearing the jersey of his favorite team—my attention will be focused not on the passing cars or surrounding buildings, but on the people, especially those wearing that jersey. If instead my brother and I have agreed to meet at his car or at a nearby café, my gaze will concentrate on the parked cars or the local storefronts.
The pattern of visual fixations illustrated in the figure tempts me to digress a bit. Van Gogh’s self-portrait at the Musée d’Orsay is one of the most stunning paintings by one of my favorite artists. Art is tremendously subjective and can elicit varying emotions in its beholders.10 In my case, as in that of the subject whose eye movements we show in Figure 2.3, I cannot help but stare enthralled at Van Gogh’s eyes in the portrait.
Why does art affect us so? Why can we be moved to tears by a painting whose subject we would barely notice if we saw it in a photograph? There are of course many aspects that distinguish a work of art from reality itself, but I would like to dwell on a particular aspect that is germane to what we are discussing in this chapter. When we look at a photograph, the resolution is uniform throughout the image. An image rendered at 300 ppi has that resolution at its center and at its edges, even though the former may be depicting a person’s features and the latter some irrelevant details of a background wall. When we see a photo, we choose where to look, either consciously or unconsciously, and in principle can observe each area of the image with the same resolution of detail. In a painting, on the other hand, the artist may paint one area in great detail and barely sketch another, alter contrast and color composition, or play with the texture of the canvas to shift the center of attention. In other words, the artist influences our natural patterns of visual exploration and decides for us what we should observe in detail and what we should ignore. In so doing, artists load scenes with subjectivity and share with us their specific vision and sensibility, something that goes well beyond the faithful reproduction provided by a standard photograph. Again, this is just one aspect among many that combine to infuse a work of art with emotional meaning. To illustrate this digression, consider the example of a painting by Mariano Molina, a great artist and a friend.11 In his Center of Gaze, Mariano manages to focus the viewer’s eyes on a specific place in the canvas. The center of gaze is the zone where the painting is “in focus,” where it has the greatest detail; it is precisely the area to which most visual fixations drift, as we corroborated using an eye tracker. This center of attention somehow entraps the eye movements and breathes into the canvas a sense of movement, a dynamic conceived in Mariano’s brain that was not present in the original photo that inspired the composition.
Figure 2.4
Visual fixation pattern of a subject observing Center of Gaze (Mariano Molina; acrylic on canvas)
Let us now return to the core topic of this chapter: How much do we see? To summarize our previous discussion: we found a three-order-of-magnitude difference (from gigabytes to megabytes) between the information present in our field of view and the information that the eye transmits to the brain. However, this difference disappeared once we accounted for the fact that we observe in detail only what is located in the fovea, at the center of our field of view. It is worthwhile to continue for a bit with these estimates because they illustrate and clarify some fundamental principles of how the brain works. Let us again consider Steve Jobs’s statement as he launched the iPhone 4: the resolution of the eye at a distance of 12 inches is around 300 ppi. To compute the amount of information that the eye receives from its surroundings, we now know that we can in principle disregard the rest of the field of view and concentrate on the area covered by the fovea, which at a distance of 12 inches corresponds to a circle with a diameter of 0.3 inches. Thus the information that arrives through the fovea is π × 0.15² (the area of the fovea, in square inches) × 300² ≈ 6,361 pixels. We can again convert this number to bytes: recalling that one pixel has three bytes of color information, and assuming that we gather information at a rate of 30 frames per second (as does a standard video camera), we find that the information collected through the fovea is about 0.5 MB per second. This value is of the same order of magnitude as the 1 MB per second estimated by the University of Pennsylvania researchers to be the amount of information that the eye transmits to the brain. If we take into account the fact that the eye also receives information (albeit at a lower resolution) from the area around the fovea, the two estimates become even closer.
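For readers who want to check the arithmetic, the whole estimate fits in a few lines of code. The following is a minimal sketch in Python; every constant in it is one of the assumptions quoted above (300 ppi at 12 inches, a 0.3-inch foveal circle, three bytes per pixel, 30 frames per second), not a measured quantity.

```python
import math

# Back-of-the-envelope estimate of the data rate through the fovea.
PPI = 300               # claimed resolution of the eye at 12 inches
FOVEA_DIAMETER = 0.3    # inches covered by the ~1.5-degree fovea at 12 inches
BYTES_PER_PIXEL = 3     # one byte each for red, green, and blue
FRAMES_PER_SECOND = 30  # frame rate of a standard video camera

area = math.pi * (FOVEA_DIAMETER / 2) ** 2  # pi * 0.15^2, in square inches
pixels = area * PPI ** 2                    # ~6,361 pixels on the fovea
bytes_per_second = pixels * BYTES_PER_PIXEL * FRAMES_PER_SECOND

print(f"pixels on the fovea: {pixels:,.0f}")
print(f"foveal data rate: {bytes_per_second / 1e6:.2f} MB/s")  # ~0.57 MB/s
```

Up to rounding, this reproduces the roughly 0.5 MB per second quoted above.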
We have made great progress in our understanding of the way the brain processes visual information. However, we have left out a crucial detail. So far we have described the encoding and transmission of pixels in the field of view, but, as we will see in the next chapter, this is far from how human sight actually works.
Chapter 3
DOES THE EYE REALLY SEE?
In which we describe the processing of information in the retina, the difference between sensation and perception, the use of unconscious inferences, cases of blind people who regained sight as adults, and the relation between perception and memory
As in a camera, the image that passes through the pupil is focused by a lens on the back of the eyeball. This is where the retina is located, and where the comparison to a camera ceases to be apt.
In the human retina, visual information is initially captured by two kinds of photoreceptors: rods and cones. Rods, of which a single human eye contains some 120 million, are what allow us to see in the dark. Extremely sensitive to light, they are concentrated on the periphery of the retina, outside the fovea. They cannot resolve color (which is why we cannot see color in the dark), and are inactive by daylight. Cones are much less numerous, on the order of six million, and are located mainly in the fovea. They are sensitive to red, green, and blue, allowing us to see clearly and in color at the center of our visual field. The information collected by the rods and cones is sent through the bipolar, horizontal, and amacrine neurons to the retinal ganglion neurons (the million cells which, as we saw in the previous chapter, transmit visual information to the brain). Now, why do we have so many neurons, and of so many different kinds? Why do we have 126 million photoreceptors, if the information they collect is funneled into a mere one million retinal ganglion neurons? Moreover, as we just learned, the image on the fovea has a resolution of about 6,000 pixels—it seems absurd that we have six million cones for the task of resolving such meager information.
The answer is that the retina does not process or transmit visual information in the form of a simple re-creation of the pixels that make up the image. Instead, it transmits information that will give rise to a representation of the image, generated not by the eye, but by the brain. As strange as it may sound, the eye does not see; the brain does. Why, then, are there so many neurons in the retina? Because the retina begins the processes that enable us to extract meaning from what we see.
One of the fundamental principles of visual processing in the retina was discovered by Stephen Kuffler in the 1950s. By recording the activity of the ganglion neurons of cats in response to small beams of light, Kuffler observed that each of these neurons responds to stimuli within a restricted region of the visual field, known as its “receptive field.” One group of these neurons (called on-center) tended to fire rapidly in response to stimuli located at the center of their receptive field but dampened their activity if the stimuli fell on the field’s periphery. Other ganglion neurons (called off-center) had the opposite behavior, responding to stimuli in the periphery and inhibiting their firing if the center was stimulated. This is what is known as center-surround organization.1
The great advantage of this is that, rather than simply reflecting the presence or absence of light via a sort of pixel bitmap, center-surround organization—created by the distribution of the different types of neurons in the retina and the connections between them—allows the detection of contrasts and edges. The brain thus receives information about lighting changes, differences between the center and the periphery of the receptive fields of these neurons. This is a very smart way to transmit information, and to focus on relevant aspects while discarding the rest. For example, when I look at the wall of my living room, I do not need to encode information about each individual pixel of that featureless expanse. It would be absurd to devote resources to such irrelevance. In fact, I only faintly perceive the gradual changes in the wall’s color as it is more illuminated closer to the window. On the other hand, I can very clearly perceive the contrast caused by the presence of a painting on this wall, as well as the more sophisticated contrasts that define the different forms within that painting. This is precisely what center-surround organization enables us to do. To illustrate this idea, let us look at Figure 3.2. The gradient in background color makes the bar in the middle appear lighter toward the left and darker toward the right, even though it is actually the same color throughout. This effect is due to the fact that the retina does not perceive absolute color but, rather, color contrast.2
Figure 3.1: Center-surround organization of the retinal ganglion neurons
On-center neurons respond to stimulation of the center but inhibit their firing when the periphery is stimulated. Off-center neurons, on the other hand, activate when the periphery is stimulated and inhibit their activity when the center is stimulated. When both the center and the periphery are stimulated, the effects cancel each other, and activity is unchanged for both types of neurons. In each example, vertical arrows mark the instant when the stimulus is applied.
Figure 3.2: Illusion of contrast
Due to contrast with the background, the right side of the bar seems darker than the left, even though the bar has the same color throughout.
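For readers with a computational bent, center-surround behavior is easy to demonstrate in code. A standard way of modeling it (a common modeling choice, not the book’s own formulation) is a difference-of-Gaussians filter: a narrow “center” Gaussian minus a wider “surround” Gaussian. The sketch below, in Python with NumPy and SciPy, applies such a filter to a toy image of a uniform “wall” with a darker “painting” on it.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(image, center_sigma=1.0, surround_sigma=3.0):
    """Difference-of-Gaussians model of an on-center receptive field:
    a narrow excitatory center minus a broad inhibitory surround."""
    center = gaussian_filter(image, center_sigma)
    surround = gaussian_filter(image, surround_sigma)
    return center - surround

# A toy scene: a uniformly gray "wall" with a darker "painting" on it.
image = np.full((64, 64), 0.5)
image[24:40, 24:40] = 0.2

response = center_surround(image)

# The output is essentially zero over the featureless wall and nonzero
# only at the painting's edges: contrast is transmitted, absolute
# light level is not.
print(f"response on the blank wall: {abs(response[5, 5]):.4f}")    # ~0
print(f"response at an edge:        {abs(response[24, 32]):.4f}")  # > 0
```

Flipping the sign of the difference gives the off-center case; in both, uniform stimulation of center and surround cancels out, just as in Figure 3.1.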
In the previous chapter, we saw that one way to select visual information is via saccades, directing the focus of our sight (and the multimillion-neuron machinery of the fovea) toward whatever catches our attention. Now we see that within the fovea itself there is a second information-selection mechanism based on the retina’s center-surround organization. These two mechanisms lay bare one of the primary principles underlying vision. Sight does not function like a camera. On the contrary, the brain selects a tiny amount of information and processes it redundantly and in parallel in order to extract meaning. This process continues in the cerebral cortex, where, in the primary visual area (or V1) alone, there are a few hundred neurons for each neuron transmitting information from the retina.3 Unlike a camera, which stores every part of an image at the same resolution, sight is highly directed. It is focused on capturing relevant information to convey meaning, not fidelity. After all, I am not interested in discerning the exact details of thousands of yellow hairs contrasting with black ones; I just want to know it is a tiger and flee quickly. The processing of visual information in the brain is thus far more sophisticated and complex than what a computer does to an image; it is nothing less than the result of millions of years of evolution.
The processes that underlie the way we select information as we fix our gaze on something that strikes us, the ways in which our neurons encode contrast and ignore homogeneity, have only been elucidated in recent decades. Yet the general theory of how we construct reality based upon the information that we receive through our eyes, and the distinction between sensation (the physical stimulus impinging on the sensory organ) and perception (the interpretation of that stimulus), are much older. More than two millennia ago Aristotle postulated that, starting with the information received through the senses, the mind generates images that are the basis of thought. In On the Soul, Aristotle lays out a brilliant vision of the processing of sensory information that is worth quoting:
Thinking is different from perceiving and is held to be in part imagination, in part judgment … But what we imagine is sometimes false though our contemporaneous judgment about it is true; e.g., we imagine the sun to be a foot in diameter though we are convinced that it is larger than the inhabited part of the earth . . . To the thinking soul images serve as if they were contents of perception (and when it asserts or denies them to be good or bad it avoids or pursues them). That is why the soul never thinks without an image.
—ARISTOTLE, ON THE SOUL, 427B, 428B,
431A (TRANSLATED BY J. A. SMITH)
These images—or ghosts, as they were called by Thomas Aquinas, who revisited Aristotle’s ideas in the Middle Ages—are our interpretation of reality, an interpretation that generates concepts from abstractions by eliminating details and extracting meaning. Similar distinctions between sensation and perception were made by the Egyptian astronomer Ptolemy and by Alhazen (or Ibn al-Haytham), a medieval Islamic scientist considered by many to be the father of modern optics. Moreover, the difference between external reality and the perception we have of it is the quintessence of idealism and the foundation of modern philosophy, which begins with Descartes’s search for absolute truth by way of doubting his perception of reality, continues with the overvaluing of subjective perception by the British empiricists (Locke, Berkeley, and Hume), and lies at the heart of Kant’s transcendental idealism, which argues that we can only know the representations that we make of things but never “Das Ding an sich”—the thing in itself.4
I cannot move on without mentioning Hermann von Helmholtz,5 who in the late nineteenth century—long before there was a well-developed neuroscience to back him up—described in detail the way the brain extracts meaning from the meager information provided by the senses. In particular, Helmholtz observed that the information garnered by the eyes is very scant and that, based on past experiences, the brain makes unconscious inferences in order to assign a meaning to what we see. Like Aristotle, Aquinas, and especially the empiricists, Helmholtz argued that we do not see copies of reality, of external objects, but signs, constructions fabricated in our brains. These signs need not resemble reality; it suffices that they be reproducible. In other words, it is not necessary for the representation I make of an object to be similar to the object itself; it is enough if I get the same representation every time I see the object. Helmholtz writes:
The objects in the space around us appear to possess the qualities of our sensations. They appear to be red or green, cold or warm, to have an odor or a taste, and so on. Yet these qualities of sensations belong only to our nervous system and do not extend at all into the space around us. Even when we know this, however, the illusion does not cease . . .
—HERMANN VON HELMHOLTZ,
THE FACTS OF PERCEPTION, 1878
The value that Helmholtz attributes to the knowledge obtained from unconscious inferences is related to the view of the British empiricists, for whom the mind is a tabula rasa, a blank slate on which we etch our knowledge based on our experience and the perception of our senses. Helmholtz illustrates this idea with the extremely ambiguous sensation we have of an object when we touch it with our fingers. Imagine, for example, holding a pen with our eyes closed. The perception of holding a single pen is beyond question, but the tactile sensation of each finger is vague and ambiguous—in fact, it is the same sensation we would have if we were holding several pens at the same time. We form the perception of touching a single pen not just by combining the tactile sensations of the fingers, but also by making unconscious inferences based on our previous experience, taking into account, for example, the relative positions of the fingers.