Hierarchical Control

Hierarchy is a key characteristic of the structure of the visual system, reflecting a divide-and-conquer strategy [Felleman et al. 1991]. In Marr's model [Marr 1982], visual perception is described as information processing that proceeds from the lowest level (features of proximal stimuli) to the highest, symbolic level used for object recognition (object-centered representations).

Advances in neurophysiology and psychology have since established that: the visual cortex is highly modular, consisting of distinct cortical areas that are densely interconnected and organized into processing pathways in both a hierarchical and a parallel fashion; the connections among these areas are not all feedforward; the cortex is layered, and the layers are distinguished by the types of neurons they contain and by the inputs they receive and the outputs they send; and many visual areas are organized into columns, perpendicular to the cortical surface, whose neurons are selective for similar visual features [Felleman et al. 1991, Amit 1995].

In general, the control strategy for information processing in a multilayer machine vision system is one of the following three:

(1) a bottom-up strategy, which is driven mainly by the visual input data and processes information from the input level up to the higher levels;

(2) a top-down strategy, which is mainly goal-driven or intention-driven and proceeds from the higher levels down to the lower levels;

(3) an integrated strategy, which combines the bottom-up and top-down strategies.
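
The difference between the three strategies can be illustrated with a toy sketch. It is not taken from any of the cited models: the feature names, the numeric scores, and the rule of multiplying evidence by a goal weight in the integrated case are all assumptions chosen only to make the contrast visible.

```python
from typing import Dict

# Hypothetical scores: how strongly the input supports each interpretation
# (data side) and how relevant each interpretation is to the current goal
# (goal side). Names and numbers are illustrative assumptions.
Scores = Dict[str, float]


def bottom_up(evidence: Scores) -> str:
    """Data-driven: pick whatever the visual input supports most strongly."""
    return max(evidence, key=evidence.get)


def top_down(goal_weights: Scores) -> str:
    """Goal-driven: pick whatever the current goal ranks highest."""
    return max(goal_weights, key=goal_weights.get)


def integrated(evidence: Scores, goal_weights: Scores) -> str:
    """Combine both: weight each piece of evidence by its goal relevance.

    Multiplication is only one possible combination rule (an assumption).
    """
    combined = {k: evidence[k] * goal_weights.get(k, 0.0) for k in evidence}
    return max(combined, key=combined.get)


evidence = {"edge": 0.9, "face": 0.4, "text": 0.2}
goal_weights = {"edge": 0.1, "face": 0.8, "text": 0.9}

print(bottom_up(evidence))                 # edge
print(top_down(goal_weights))              # text
print(integrated(evidence, goal_weights))  # face
```

With these made-up numbers each strategy selects a different interpretation, which is the point of the example: the data alone favor the strongest stimulus, the goal alone ignores the input, and the integrated rule settles on a candidate that is reasonably supported by both.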

Gärdenfors' theory (1994) proposes three cognitive levels of information representation:

(1) the subsymbolic level, in which the information is strictly related to sensory data;

(2) the linguistic level, in which the information is expressed in a symbolic language;

(3) the intermediate, prelinguistic conceptual level, in which the information is characterized in terms of a metric space defined by a number of cognitive dimensions, independent of any specific language.
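
As a rough illustration of how the three levels relate, the sketch below maps raw samples (subsymbolic) to a point in a small metric space (conceptual) and then to a symbol (linguistic). The two dimensions (mean brightness and contrast), the prototype points, and the nearest-prototype labeling rule are assumptions made for this example, not details given in the text above.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def to_conceptual(samples: Sequence[float]) -> Point:
    """Subsymbolic -> conceptual.

    Summarize raw sensor samples as a point in a 2-D metric space whose
    (hypothetical) cognitive dimensions are mean brightness and contrast.
    """
    mean = sum(samples) / len(samples)
    contrast = max(samples) - min(samples)
    return (mean, contrast)


def to_symbol(point: Point, prototypes: Dict[str, Point]) -> str:
    """Conceptual -> linguistic.

    Label the point with the name of the nearest prototype under the
    Euclidean metric (an assumed categorization rule).
    """
    return min(prototypes, key=lambda name: math.dist(point, prototypes[name]))


# Hypothetical prototypes living in the same 2-D space.
prototypes = {
    "dark_flat": (0.1, 0.1),
    "bright_textured": (0.8, 0.7),
}

samples = [0.1, 0.2, 0.1, 0.2]   # subsymbolic: raw sensor values
point = to_conceptual(samples)    # conceptual: point in the metric space
label = to_symbol(point, prototypes)  # linguistic: a symbol
print(point, label)
```

The conceptual level is language-independent here in the sense that `point` is just a position in the space; only the final step attaches a name to it.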

The following is an example model proposed by Chella et al. [Chella 1997].

 

In the above architecture, Block A receives input from a camera and outputs 2½-D maps of the image. These maps are passed to Block B, which builds a scene description as a combination of 3D geometric primitives. Block C implements the mapping between the conceptual level and the symbolic level. Block D implements the linguistic model of the focus-of-attention mechanism, while Block E implements its associative model.
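
The data flow through Blocks A, B, and C can be sketched as a simple function pipeline. This is only a structural illustration under strong assumptions: the block bodies are placeholders, the `Primitive` type and its fields are hypothetical, and the attention blocks D and E are left out because the description above does not say how they are wired to the other blocks.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]


@dataclass
class Primitive:
    kind: str    # hypothetical shape label, e.g. "box"
    size: float  # hypothetical single size parameter


def block_a(image: Image) -> Image:
    """Camera image -> 2.5-D map (placeholder: inverted intensity as depth)."""
    return [[1.0 - v for v in row] for row in image]


def block_b(depth_map: Image) -> List[Primitive]:
    """2.5-D map -> scene description built from 3D primitives.

    Placeholder: one primitive per row, sized by the row's mean depth.
    """
    return [Primitive("box", sum(row) / len(row)) for row in depth_map]


def block_c(scene: List[Primitive]) -> List[str]:
    """Conceptual -> symbolic: emit one symbolic assertion per primitive."""
    return [f"{p.kind}(obj{i})" for i, p in enumerate(scene)]


def perceive(image: Image) -> List[str]:
    """Chain the three blocks in the order given in the text: A, then B, then C."""
    return block_c(block_b(block_a(image)))


print(perceive([[0.0, 1.0], [0.5, 0.5]]))
```

Each block consumes exactly the representation produced by the previous one, which mirrors the progression from sensory data to geometric description to symbols.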

 

References:

D.J. Amit, The Hebbian paradigm reintegrated: local reverberations as internal representations, Behav. Brain Sci., 1995, 18:617-657.

A. Chella, et al., A cognitive architecture for artificial vision, Artif. Intell., 1997, 89:73-111.

D.J. Felleman, et al., Distributed hierarchical processing in the primate cerebral cortex, Cerebral Cortex, 1991, 1:1-47.

D. Marr, Vision, San Francisco: W.H. Freeman, 1982.