Visual Attention

Attention has a complex meaning in psychology. In its early history, it was described as closely related to our subjective awareness of the world around us. As James put it [James 1890]:

"Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence. ..."

In recent years the concept of attention has been refined and is now defined in contrast to purely automatic processing, which occurs without attention. In cognitive neuroscience, attention is now commonly viewed as a neural system for selecting information, similar in many ways to the visual, auditory, and motor systems [Posner 1994]. Evidence from neuroscience suggests that the attention system can be divided into separate subsystems that perform independent but interrelated functions and interact with other domain-specific systems. Attention is carried out by a network of anatomical areas; it is therefore neither the property of a single center nor a function of the brain as a whole [Posner 1990].

The attention mechanism of the human visual system has also been applied to machine vision, where it allows a system to sample data nonuniformly and use its computational resources efficiently [Ballard 1991]. Fundamental work on the critical role of attention in vision was carried out by Yarbus (1967), Neisser (1967), Richards and Kaufman (1969), and many other researchers; further work on attention can be found in [Posner 1980], [Kosslyn 1980], and [Treisman 1980]. The visual attention mechanism may have at least the following basic components [Tsotsos et al. 1995]:

(1) the selection of a region of interest in the visual field;
(2) the selection of feature dimensions and values of interest;
(3) the control of information flow through the network of neurons that constitutes the visual system; and
(4) the shifting from one selected region to the next over time.
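As a rough illustration of components (1), (2) and (4), the sketch below treats the visual field as a set of precomputed feature maps. Everything in it is an assumption made for illustration, not the selective-tuning model of Tsotsos et al.: the function `attend_sequence` and its parameters are hypothetical, feature selection is modeled as a weighted sum of the maps, region selection as the location of the maximum weighted response, and shifting as suppression of a disc around each attended location. Component (3) is not modeled.

```python
import numpy as np


def attend_sequence(feature_maps, weights, n_shifts=3, radius=2):
    """Toy illustration of components (1), (2) and (4); hypothetical, not a
    published model.

    feature_maps: dict mapping a feature name to a 2-D array; all arrays are
        assumed to have the same shape (one value per location).
    weights: dict mapping a feature name to its weight; this is how the
        feature dimensions of interest are selected (component 2).
    Returns the attended (row, col) locations in the order they are visited.
    """
    shape = np.shape(next(iter(feature_maps.values())))
    # (2) Feature selection: weighted sum of the feature maps.
    salience = np.zeros(shape, dtype=float)
    for name, fmap in feature_maps.items():
        salience += weights.get(name, 0.0) * np.asarray(fmap, dtype=float)

    rows, cols = np.indices(shape)
    visited = []
    for _ in range(n_shifts):
        # (1) Region selection: the location with the largest weighted response.
        r, c = np.unravel_index(np.argmax(salience), shape)
        visited.append((int(r), int(c)))
        # (4) Shifting: suppress a disc around the attended location so the
        # next iteration selects a different region (an assumed mechanism).
        salience[(rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2] = -np.inf
    return visited


color = np.zeros((10, 10))
color[2, 3] = 1.0
color[7, 8] = 0.6
orientation = np.zeros((10, 10))
orientation[5, 5] = 0.9
maps = {"color": color, "orientation": orientation}
print(attend_sequence(maps, {"color": 1.0, "orientation": 0.5}))
# [(2, 3), (7, 8), (5, 5)]
print(attend_sequence(maps, {"color": 0.3, "orientation": 1.0}))
# [(5, 5), (2, 3), (7, 8)]
```

Changing only the feature weights reorders the scan path, which is the sense in which the choice of feature dimensions (2) steers region selection (1) and shifting (4) in this toy.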

The attention system can be classified in many ways, depending on which of its aspects is considered. From the stimulus's point of view, attention may be engaged exogenously or endogenously: the exogenous components are mainly determined by external stimulus characteristics, whereas the endogenous components depend mostly on the subject's intentions and actions. From the subject's point of view, the subject can actually move the gaze fixation point to the location being attended to (overt attention), or shift attentional processing to a new location in the visual field without any fixation shift or motor action (covert attention). As described by Treisman and Paterson (1984), the attractors of covert visual attention are those parts of an image that differ from all other parts in a single aspect. For overt attention, the important features are an object's shape, its degree of symmetry, and the spatial distribution of objects in the scene.

In terms of the route of information processing and attentional control, attention operates in two modes: bottom-up or stimulus-driven, as in exogenous attention, and top-down or goal-directed, as in endogenous attention. Combining several neurological models of attention, Perry and Hodges [Perry and Hodges 1999] divide attention into three broad categories:

(1) Selective attention and shifting
Its defining characteristic is focusing on a single relevant stimulus or process at a time while ignoring irrelevant or distracting stimuli;

(2) Sustained attention
Its defining characteristic is the ability to maintain focused attention over extended periods of time; and

(3) Divided attention
Its defining characteristic is the sharing of attention by focusing on more than one relevant stimulus or process at a time.

However, all of the above classifications are intertwined with the distinction between active (voluntary) and passive (involuntary) attention, a division proposed by James (1890). Findings from neuroscience indicate that separate attentional resources exist for different stages of processing and for distinct parallel neural pathways [Mishkin 1983, Posner 1994, Michie 1999].
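The bottom-up/top-down distinction, together with the observation that an item differing from all others in a single aspect attracts attention, can be sketched as a priority score for each item in a display. This is a hypothetical toy, not a model from the works cited here: the function `priority`, the use of mean absolute difference as bottom-up contrast, the negative L1 distance to a target as the top-down term, and their additive combination are all assumptions.

```python
import numpy as np


def priority(features, target=None, w_bottom_up=1.0, w_top_down=1.0):
    """Hypothetical priority score for each item in a display.

    features: array-like of shape (n_items, n_features), one row per item.
    target: optional feature values the observer is looking for; when given,
        a top-down (goal-directed) term is added to the bottom-up term.
    """
    f = np.asarray(features, dtype=float)
    n = f.shape[0]
    if n < 2:
        raise ValueError("need at least two items to compute contrast")
    # Bottom-up (stimulus-driven): for every item and feature, the mean
    # absolute difference from the other items. An item that differs from
    # all others in a single feature gets a high value on that feature.
    diffs = np.abs(f[:, None, :] - f[None, :, :])  # (n, n, n_features)
    contrast = diffs.sum(axis=1) / (n - 1)         # (n, n_features)
    bottom_up = contrast.max(axis=1)               # strongest single feature
    if target is None:
        return w_bottom_up * bottom_up
    # Top-down (goal-directed): negative L1 distance to the target features.
    top_down = -np.abs(f - np.asarray(target, dtype=float)).sum(axis=1)
    return w_bottom_up * bottom_up + w_top_down * top_down


# Columns: color (1 = red, 0 = green), orientation (1 = vertical, 0 = horizontal).
popout = [[0, 1], [0, 1], [0, 1], [1, 1]]   # one red item among green ones
print(int(np.argmax(priority(popout))))      # 3

mixed = [[1, 0], [0, 1], [1, 1], [0, 1]]
print(int(np.argmax(priority(mixed))))                 # 0: the only horizontal item
print(int(np.argmax(priority(mixed, target=[1, 1]))))  # 2: the red vertical target
```

In the second display, stimulus-driven contrast alone favors the only horizontal item, while adding the goal-directed term moves the highest priority to the red vertical target.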

The SCAN model was proposed by Postma et al. [Postma 1997].

SCAN consists of three main components: an input image, a gating network, and a classifier network. Given an expectation pattern E, the best-matching part of the input image is selected as the attended pattern and routed by the gating network to the output, which serves as the input to the classifier network. In the model diagram, the shaded area represents the attentional beam.
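The selection that SCAN performs, picking the part of the input that best matches the expectation pattern E and passing only that part on, can be approximated by exhaustive template matching. The sketch below is a stand-in under stated assumptions, not the SCAN gating network itself: the function name `attend_best_match` is hypothetical, the match score is assumed to be the sum of squared differences, and routing to the classifier is reduced to returning the winning window, which corresponds loosely to the attentional beam.

```python
import numpy as np


def attend_best_match(image, expectation):
    """Hypothetical stand-in for SCAN's selection step (not its gating network).

    Slides a window the size of the expectation pattern E over the image and
    returns the top-left corner of the window with the smallest sum of squared
    differences (SSD) to E, together with the window itself. The returned
    window plays the role of the attended pattern passed on to the classifier.
    """
    img = np.asarray(image, dtype=float)
    E = np.asarray(expectation, dtype=float)
    h, w = E.shape
    if img.shape[0] < h or img.shape[1] < w:
        raise ValueError("expectation pattern is larger than the image")
    best_score, best_pos = np.inf, (0, 0)
    for r in range(img.shape[0] - h + 1):
        for c in range(img.shape[1] - w + 1):
            score = np.sum((img[r:r + h, c:c + w] - E) ** 2)
            if score < best_score:  # keep the first best match on ties
                best_score, best_pos = score, (r, c)
    r, c = best_pos
    return best_pos, img[r:r + h, c:c + w]


E = np.array([[1.0, 0.0],
              [0.0, 1.0]])
image = np.zeros((8, 8))
image[4:6, 5:7] = E          # hide the expected pattern at (4, 5)
pos, window = attend_best_match(image, E)
print(pos)                   # (4, 5)
```

An exhaustive search is used only for clarity; it evaluates every window position, so its cost grows with both image size and pattern size.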

Another model, SLAM, was proposed by Phaf et al. [Phaf 1990].

SLAM uses two main procedures to select visual stimuli: within-module competition and precueing of behaviourally relevant attributes.
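Within-module competition can be illustrated with a winner-take-all network in which each unit is inhibited by the activity of its rivals until only one remains active. The sketch below swaps in the MAXNET update rule rather than SLAM's own equations, and treats a precue as an additive bias on the input of the cued units; the function `winner_take_all`, the inhibition constant `eps`, and `cue_strength` are assumptions.

```python
import numpy as np


def winner_take_all(inputs, precue=None, cue_strength=0.5, eps=None, max_iter=1000):
    """Hypothetical within-module competition (MAXNET-style), not SLAM's equations.

    inputs: initial, non-negative activations of the competing units.
    precue: optional 0/1 mask marking behaviourally relevant units; each cued
        unit gets an additive bias of `cue_strength` (an assumed form of precueing).
    Returns the index of the single surviving unit, or None if no unique
    winner emerges within `max_iter` steps (e.g. an exact tie).
    """
    a = np.asarray(inputs, dtype=float).copy()
    if precue is not None:
        a += cue_strength * np.asarray(precue, dtype=float)
    n = a.size
    if eps is None:
        eps = 1.0 / (2 * n)  # MAXNET needs 0 < eps < 1/n to converge
    for _ in range(max_iter):
        # Every unit is inhibited by the summed activity of all the others.
        a = np.maximum(0.0, a - eps * (a.sum() - a))
        if np.count_nonzero(a) <= 1:
            break
    winners = np.flatnonzero(a)
    return int(winners[0]) if winners.size == 1 else None


acts = [0.5, 0.9, 0.7, 0.3]
print(winner_take_all(acts))                       # 1: strongest input wins
print(winner_take_all(acts, precue=[0, 0, 1, 0]))  # 2: the precue tips the balance
```

Because this update preserves the ordering of activations, the unit with the largest biased input wins whenever it is unique; a precue changes the outcome only when its bias is large enough to overturn that ordering.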

Other representative models include the following:

a) Feature-integration theory [Treisman 1980]

b) SERR [Humphreys 1993]

c) Guided search [Wolfe 1994]

d) VISIT [Ahmad 1992]

e) Dynamic routing circuits [Olshausen 1993]

f) What-and-where filter [Carpenter 1998]

g) Active vision [Aloimonos 1988, Bajcsy 1988, Ballard 1991]



S. Ahmad, VISIT: a neural model of covert visual attention, in Advances in Neural Information Processing Systems, edited by J.E. Moody et al., 1992, 4:420-427, San Mateo, CA: Morgan Kaufmann.

Y. Aloimonos et al., Active vision, Int'l. J. Comp. Vision, 1988, 1(4):333-356.

R. Bajcsy, Active perception, Proc. IEEE, 1988, 76(8):996-1005.

D. Ballard, Animate vision, Artificial Intelligence, 1991, 48:57-86.

G.A. Carpenter et al., The what-and-where filter: a spatial mapping neural network for object recognition and image understanding, Computer Vision and Image Understanding, 1998, 69(1):1-22.

G.W. Humphreys and H.J. Müller, Search via recursive rejection (SERR): a connectionist model of visual search, Cognitive Psychology, 1993, 25(1):43-110.

S.M. Kosslyn, Image and mind, Harvard Univ. Press, Cambridge, MA, 1980.

W. James, The principles of psychology, New York: Holt, 1890, pp. 403-404.

P.T. Michie et al., An exploration of varieties of visual attention: ERP findings, Cognitive Brain Research, 1999, 7:419-450.

M. Mishkin et al., Object vision and spatial vision: two cortical pathways, Trends in Neurosciences, 1983, 6:414-417.

U. Neisser, Cognitive psychology, New York: Appleton-Century-Crofts, 1967.

B.A. Olshausen et al., A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information, J. of Neuroscience, 1993, 13(11):4700-4719.

R.J. Perry and J.R. Hodges, Attention and executive deficits in Alzheimer's disease: a critical review, Brain, 1999, 122:383-404.

R.H. Phaf et al., SLAM: a connectionist model for attention in visual selection, Cognitive Psychology, 1990, 22(3):273-341.

M.I. Posner, Orienting of attention, Quart. J. Exper. Psych., 1980, 32:3-25.

M.I. Posner and S.E. Petersen, The attention system of the human brain (review), Annu. Rev. Neurosci., 1990, 13:25-42.

M.I. Posner, Visual attention, in The Neuropsychology of High-level Vision: Collected Tutorial Essays, edited by M.J. Farah and G. Ratcliff, 1994, pp. 217-239.

E.O. Postma et al., SCAN: a scalable model of attentional selection, Neural Networks, 1997, 10(6):993-1015.

W. Richards and L. Kaufman, Centre-of-gravity tendencies for fixations and flow patterns, Perception and Psychophysics, 1969, 5(2):81-84.

A. Treisman and G. Gelade, A feature integration theory of attention, Cognitive Psychology, 1980, 12:97-136.

A. Treisman and R. Paterson, Emergent features, attention and object perception, J. Exp. Psychol.: Human Perception and Performance, 1984, 10:12-31.

J.K. Tsotsos et al., Modeling visual attention via selective tuning, Artificial Intelligence, 1995, 78:507-545.

J.M. Wolfe, Guided search 2.0: a revised model of visual search, Psychonomic Bulletin and Review, 1994, 1(2):202-238.

A.L. Yarbus, Eye movements and vision, Plenum Press, New York, 1967.