Neurons for Objects



Any object projects a very different image on the retina when lit from two different angles or when presented either on the left or on the right of visual fixation. This variability poses a severe challenge to the neural circuits that construct the categories with which we recognize the world. When we need to identify an object, it is irrelevant whether it is near or far, to our left or to our right, upright or horizontal, in light or in shade. We can easily recognize it regardless of illumination, inclination, distance, and location. In the course of evolution, the brain was undoubtedly subjected to intense selective pressures that obliged it to arrive at an invariant perception of the world. We now know that invariance is an essential characteristic of the inferior temporal lobe. Monkeys with ventral temporal lesions can no longer recognize objects invariantly. Unlike undamaged monkeys, they fail to extend learning to new conditions of lighting, size, or location of a learned shape. Brain lesion experiments thus show that the inferior temporal cortex is a key player in the collection of invariant visual information, an innate competence that is probably related to our ability to recognize that “RADIO,” “radio,” and even “Radio” represent the same word.

Recently, recordings of single neurons have begun to reveal a fine-grained neuronal code for visual objects in macaque monkeys. As early as the late 1960s, David Hubel and Torsten Wiesel recorded neuronal activity in the primary visual area of the cat, and found that neurons discharged in response to simple bars of light, paving the way to work that was to be awarded the Nobel Prize in 1981. In the 1970s and 1980s, several pioneering neuroscientists (Robert Desimone, Charles Gross, David Perrett, Keiji Tanaka) followed in their footsteps and moved their electrodes forward in the macaque brain. They found that to elicit a discharge in the inferior temporal neurons, they had to present monkeys with more complex stimuli than simple lines. They put together a great variety of images, shapes, faces, and objects and presented them visually, one at a time, while an electrode recorded the neuronal discharges.

The selectivity of the neurons’ discharges immediately struck them. Frequently a neuron responded to a given face alone, or to only one particular object out of dozens of others. This selectivity was all the more remarkable because it was accompanied by a great capacity for constancy in the face of massive changes in the details of the input image. In the primary visual cortex, neural responses are conditioned by a very narrow input window on the retina (the “receptive field”). Neurons in the inferior temporal cortex, however, are completely different. Their receptive fields are vast and they frequently respond to objects appearing almost anywhere in the visual scene. At any given point in this vast reception zone, each neuron maintains a preference for a favorite specific object. This bias is true even when the image slips by several degrees, when its size is multiplied or divided by two, or when the light and resulting shadows change.

What happens when the object turns around? Our visual neurons display significant difficulties with invariance for rotation. In the earliest stages of visual recognition, the consecutive positions of a rotating face are not coded by the same neurons. The right, front, and left profiles activate neighboring cortical patches that partially overlap with one another. When an object rotates on the retina, most of the inferior temporal neurons respond only to a specific view. As the object rotates away from this preferred view, the neurons tolerate about 40 degrees of rotation and then cease to respond. A few neurons, however, are more abstract and react to an object regardless of its position in space. Simply put, these invariant neurons seem to pool input from many view-specific neurons, each roughly adjusted to a given angle. They essentially detect the presence of a certain object by pooling across all of the possible viewpoints from which it might be seen.
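The pooling idea in the preceding paragraph can be sketched in a few lines of Python. This is only a toy illustration, not a model taken from the book: the cosine-shaped tuning curve, the eight preferred views spaced 45 degrees apart, the use of a maximum as the pooling rule, and all function names are assumptions made for the example. Only the tolerance of about 40 degrees comes from the text.

```python
import math

# Assumed for illustration: eight view-specific units, one every 45 degrees.
PREFERRED_VIEWS = [0, 45, 90, 135, 180, 225, 270, 315]


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def view_tuned_response(view_deg, preferred_deg, tolerance_deg=40.0):
    """Hypothetical view-specific unit.

    It fires maximally at its preferred view and falls silent once the
    object has rotated tolerance_deg away (about 40 degrees in the text).
    The cosine falloff in between is an assumed shape.
    """
    d = angular_distance(view_deg, preferred_deg)
    if d >= tolerance_deg:
        return 0.0
    return math.cos(math.pi / 2 * d / tolerance_deg)


def view_invariant_response(view_deg):
    """Hypothetical invariant unit: pools (here, takes the max) over all views."""
    return max(view_tuned_response(view_deg, p) for p in PREFERRED_VIEWS)


if __name__ == "__main__":
    for angle in (0, 20, 60, 200):
        single = view_tuned_response(angle, 0)
        pooled = view_invariant_response(angle)
        print(f"{angle:3d} deg  single unit: {single:.2f}  pooled unit: {pooled:.2f}")
```

In this sketch a single view-specific unit stops responding once the object turns far enough from its preferred angle, while the pooled unit keeps responding at every orientation because some view-specific unit is always within tolerance.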

In brief, the visual invariance problem appears to be solved through a series of successive processing stages, each implemented within the inferior temporal cortex. At the top level of this visual hierarchy, activity in groups of neurons remains constant even when an object moves around, recedes, turns, or casts new shadows. This mechanism predates the acquisition of reading by millions of years—but its existence plays a key role in our ability to recognize words anywhere on a page and in any font and size.

Grandmother Cells

The physiological observation that a single neuron can respond to one image out of a thousand is stupefying. Is our cortex really covered by millions of ultra-specialized neurons? The physiologist Horace Barlow once proposed, tongue in cheek, that the brain contains “grandmother cells,” or cells that only respond to a single familiar person. Although Barlow’s statement had all the appearances of a joke, he was in fact right, or at least close to the truth. The brains of monkeys, like those of humans, contain neurons that are so specialized they appear to be dedicated to a single person, image, or concept. For instance, a neuron that responded exclusively to Hollywood superstar Jennifer Aniston was once recorded in the anterior temporal region of a human epilepsy patient. It did not appear to matter whether the stimulus was a color photograph, a close-up of her face, a caricature, or even her name in writing: only Jennifer seemed to excite this neuron!

The concept of a grandmother neuron, however, must be mitigated by several observations. Even when this amazing selectivity is uncovered in a single neuron, it must result from computation by a much larger network. The experiments to which I refer involved the insertion of an electrode at random in the visual brain. If one can find a specialized neuron in this haphazard way, there are no doubt millions of others waiting to be discovered. Their specificity, moreover, necessarily results from the collective operation of many cells. In the final analysis, the selective response by a single cell is like the tip of an iceberg: we can only see it because of the underlying mass of cells that conspire to create a hierarchy of detectors. For all we know, a single neuron, on its own, can only perform a relatively elementary computation on its input. On the output side, furthermore, a single neuron alone does not have much clout: only a coalition of a few hundred cells can influence other groups of cells. Each visual event or face we recognize must therefore be encoded by several clusters of selective cells, or what is known as a “sparse” coding scheme.

Picturing the entire process that leads from the retina, where millions of photoreceptors only react to blotches of light, to the neurons that detect the presence of Jennifer Aniston, is a mind-boggling feat. The detailed neuronal organization of visual recognition is only just beginning to be uncovered. Anatomically, we know that the macaque monkey’s inferior temporal cortex is organized like a pyramid. The visual image enters at the base of the pyramid, and myriad consecutive connections convey it from the primary visual cortex, at the back of the head, to the front end of the temporal pole. This anatomical progression is accompanied by an increase in the complexity of the images that make a neuron fire. At each stage, the recombination of responses by neurons from the lower level allows new neurons to respond to increasingly complex portions of the image. Our visual system is very precisely wired to reassemble the giant jigsaw puzzle created by the retina when it explodes incoming images into a million pixels.

If we could climb the neuronal pyramid step by step, synapse by synapse, and make recordings from single neurons encountered along the way, from the primary visual cortex to the inferior temporal lobe, we would see three types of changes:

  • First, the preferred images that make the neuron fire would become increasingly complex. A small, inclined bar is enough to bring on a significant discharge in the primary visual cortex. More complex curves, shapes, fragments of objects, or even entire objects or faces are, however, needed to trigger neurons at the higher levels.
  • Second, neurons would begin to respond to increasingly broader portions of the retina. Each neuron is defined in terms of its receptive field, or the place on the retina to which it responds. The receptive fields broaden by a factor of two or three at each step. This means that the part of the retina to which the preferred object must be presented for the neuron to fire doubles or triples in diameter at each step.
  • Finally, an increasing degree of invariance is present. Early on, neurons are sensitive to changes in location, size, or lighting of the incoming picture. In higher-level areas, as we move up the hierarchy, neurons tolerate increasingly significant shifts and distortions of the input image.
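The second change in the list above is a simple geometric progression, and a few lines of Python make its consequences concrete. The starting diameter of 1 degree and the count of four steps are illustrative assumptions, not figures from the text; only the growth factor of two or three per step is taken from it. The function name is likewise made up for this sketch.

```python
def receptive_field_diameters(start_deg, factor, steps):
    """Receptive field diameter at each level of the hierarchy.

    start_deg: diameter at the lowest level, in degrees (assumed value).
    factor:    growth per step; the text gives two or three.
    steps:     number of steps up the hierarchy (assumed value).
    """
    diameters = [start_deg]
    for _ in range(steps):
        diameters.append(diameters[-1] * factor)
    return diameters


if __name__ == "__main__":
    for factor in (2, 3):
        sizes = receptive_field_diameters(start_deg=1.0, factor=factor, steps=4)
        print(f"factor {factor}: {sizes}")
```

Under these assumptions, doubling at each of four steps widens the field sixteenfold, and tripling widens it eighty-one-fold, which gives a feel for how quickly a narrow window can grow to cover much of the visual scene.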

Functional brain imaging on human volunteers shows that hierarchical organization and increasing invariance also hold for our visual cortex. In humans as in other primates, the concept of neuronal hierarchy provides a simple though still hypothetical solution to the issue of visual invariance. When our cortex is called on to identify an object, it must learn what it looks like from different angles. Learning mechanisms allocate a set of neurons to each view of the object. They then wire these view-specific neurons together so that they collectively excite the same neurons at the next level further up the pyramid. The net result is an invariant neural circuit that tolerates considerable changes in viewing position. This simple idea can be replicated at each step. The neurons responsible for recognizing Jennifer Aniston’s profile collect input from the neurons at lower levels that identify a fragment of her face. These neurons are able to recognize an eye or a nose because the preceding level has already detected the patterns of light and dark that are compatible with the presence of these features at a given location on the retina.

In brief, the keys to the primate’s visual system are the notions of hierarchy and parallel functioning. The mental image that is first split on the retina into myriad “pixels” is progressively recomposed by a pyramid of neurons, all operating simultaneously. This approach might at first seem inefficient, because millions of neurons must be dedicated to each of the possible fragments that make up a single visual scene. The burden on the nervous system, however, is relatively modest because the scene can be distributed across a gigantic array of simple parallel processors. Much as a colony of ants is more “intelligent” than a single ant, collective action by millions of neurons accomplishes operations far more complex than what could be achieved by a single neuron. The vast number of computing units, in fact, leads to considerable savings in processing time. Single neurons are slow computers. They receive and transmit information in about ten milliseconds, which is a million times slower than the speed of an electronic microprocessor. Yet by combining the activity of millions of neurons, our visual system becomes the most efficient of computers: it only takes one-sixth of a second to spot a face, regardless of its identity or location.
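A back-of-the-envelope calculation shows why the figures in this paragraph point toward parallelism. The Python sketch below uses only the numbers quoted in the text; the variable names are invented for the example, and reading the ratio as a cap on the number of strictly sequential neuronal steps is an interpretation, not a claim made in the book.

```python
# Figures quoted in the text.
NEURON_STEP_S = 0.010        # a neuron receives and transmits in about ten milliseconds
FACE_DETECTION_S = 1 / 6     # time needed to spot a face
SLOWDOWN_FACTOR = 1_000_000  # "a million times slower" than a microprocessor

# If every stage had to wait for the previous one, this is roughly how many
# stages would fit into the time it takes to spot a face.
max_serial_steps = FACE_DETECTION_S / NEURON_STEP_S

# Step time of the microprocessor implied by the comparison in the text.
microprocessor_step_s = NEURON_STEP_S / SLOWDOWN_FACTOR

if __name__ == "__main__":
    print(f"sequential neuronal steps available: about {max_serial_steps:.1f}")
    print(f"implied microprocessor step: {microprocessor_step_s * 1e9:.0f} ns")
```

On these numbers only about sixteen or seventeen sequential steps fit into one-sixth of a second, so most of the work has to be spread across many neurons operating at the same time rather than queued one step after another.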

The architecture of the brain has inspired many programmers. Several computer models of the visual hierarchy I have described are now available. The best are now close to human performance, both in terms of speed and the extent of image distortion they can tolerate. Thanks to these artificial neuronal networks, automatic face recognition no longer needs to be viewed as something out of science fiction. It is part of real life: even the simplest digital camera now includes face and smile recognition.

Excerpted from ‘Reading in the Brain’ by Stanislas Dehaene, pages 125-133.
