Intuitively, object recognition is the isolation and identification of structure from the midst of other detail in an image of a scene. It is also the assignment of a symbol to a group of features with the implication that those features could only belong to an object designated by that symbol. Hence, when we say we perceive (recognize) "John", we assert that there is a person named "John", who accounts for all the perceived features, and that this person is at the specified location in the given scene.
When described like this, object recognition seems little different from a general concept-matching paradigm. So, what distinguishes it as a vision problem? The answer lies in the types of data involved, how they are acquired, the viewer-to-object geometry, the image projection relationship, and the representations of the structures to be recognized. This research addresses several aspects of how to perceive structure [166] visually:
Visual recognition involves reasoning processes that transform between internal representations of the scene, linking the lower levels of image description to the higher levels of object description. The transformations reflect both the relationships between the representations and the constraints on the process. The most important constraints are those based on the physical properties of the visual domain and the consequent relationships between data elements.
Vision does have aspects in common with other cognitive processes, notably model invocation and generalization. Invocation selects candidate models to explain sets of data, a task that, in function, is no different from selecting "apple" as a unifying concept behind the phrase "devilish fruit". Invocation makes the inductive leap from data to explanation, but only in a suggestive sense, by computing from associations among symbolic descriptions. Generalization also plays a large role in recognition, as it does in other processes, because one needs to extract the key features, and gloss over the irrelevant ones, to categorize a situation or object.
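The idea of invocation as a suggestive computation over associations can be illustrated with a small sketch. This is not the invocation process developed in this research; it is a toy example under stated assumptions: the association table, its weights, the threshold, and the function name `invoke` are all hypothetical, chosen only to mirror the "devilish fruit" example.

```python
from collections import defaultdict

# Hypothetical association weights between symbolic descriptors and models.
# The values are illustrative assumptions, not measured data.
ASSOCIATIONS = {
    "fruit": {"apple": 0.5, "pear": 0.5, "banana": 0.5},
    "devilish": {"apple": 0.4, "pitchfork": 0.6},
    "yellow": {"banana": 0.6, "pear": 0.2},
}


def invoke(descriptors, associations=ASSOCIATIONS, threshold=0.6):
    """Return (model, score) pairs whose accumulated evidence meets the threshold.

    Each descriptor adds its association weight to every model it suggests.
    The result is a ranked list of candidates, not a confirmed identification.
    """
    scores = defaultdict(float)
    for descriptor in descriptors:
        for model, weight in associations.get(descriptor, {}).items():
            scores[model] += weight
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(model, score) for model, score in ranked if score >= threshold]


# "apple" ranks first because both descriptors support it.
candidates = invoke(["devilish", "fruit"])
```

Note that "pitchfork" also passes the threshold here. That is in keeping with the suggestive character of invocation: it proposes explanations, and some later process must confirm or reject them.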
The first half of this chapter considers the problem of recognition in general, and the second half discusses previous approaches to recognition.