Next: Surface Data as Input Up: Object Recognition from Surface Previous: Some Previous Object Recognition

Recognition Approaches

This section summarizes the four traditional approaches to object recognition, roughly in order of discriminating power. Most recognition systems use several techniques.

Property Based Identification

When enough model properties are satisfied by the data, recognition occurs. The properties may be scene location, orientation, size, color, shape or others. The goal is unique discrimination in the model base, so judicious choice of properties is necessary. Duda and Hart's [55] analysis of an office scene typified this. Other examples include Brice and Fennema [41], who classified regions by their boundary shape and defined object identity by a group of regions with the correct shapes, and Adler [3], who ranked matches by success at finding components meeting structural relationships and summary properties. Property comparison is simple and efficient, but is not generally powerful enough for a large model base or subtle discriminations. The problem is always - "which properties?".

Property matchers often use pattern recognition based discrimination methods, implemented as sequential property comparison, decision trees or distance based classifiers. These are straightforward, but do not easily allow complicated recognition criteria (e.g. geometric or relational) without prior calculation of all potential properties, and treat objects at a single level of representation.

Given that properties are often continuous valued, and subject to error, a distance metric is often used to evaluate the match (e.g. [161]). If the properties fit a statistical distribution, then a probabilistic classifier can be used (e.g. [56]).

Grammatical and Graph Matching

Here, recognition is achieved when a particular grouping of data is structurally identical to a similar model pattern. This usually expresses relationships between image features, such as edges or regions, but may also refer to relationships in three dimensional data. This method requires evaluation of the match between individual features.

For objects with primitive distinguishable features that have fixed relationships (geometric or topological), two general methods have been developed. The first is the syntactic method (e.g. [115,46]). Valid relationships are embodied in grammar rules and recognition is done by parsing the data symbols according to these rules. Rosenfeld [140] presented a typical example of this matching method by using web grammars for analyzing two dimensional patterns. The main applications of grammatical techniques have been in fingerprint [117], circuit diagram, chromosome and texture analysis. A variation on this method uses rules to recognize specific features (e.g. vegetation in an aerial image [119] or urban building scenes [123]).

The second general matching technique is graph matching, where the goal is to find a pairing between subsets of the data and model graphs. Two key techniques are subgraph isomorphism and maximal clique finding in association graphs [18]. Barrow and Popplestone [16] used a subgraph matching between their data and model graphs. Ambler et al. [8] recognized by using a maximal clique method in an association graph between data and models. Combinatorial explosion can be controlled by using a hierarchy of structures [17] and Turner [161] exploited this method procedurally in his hierarchical synthesis matcher.

One advantage of graph matching is that it is well understood and produces understandable results through symbol matching, formal definition and computationally analyzable machinery. Unfortunately, graph methods tend to be NP-complete and are not practical unless the graph size is small. Matching would be more efficient if geometric predictions were used, allowing direct comparison instead of the complete matching that general graph matching algorithms require. Another disadvantage is that three dimensional scenes have changing viewpoints and occlusion, which distorts and fragments object descriptions (unless multiple graphs are used for alternative viewpoints).

Heuristic match criteria are still needed for comparing nodes and arcs, and for ranking subgraph matches. Barrow and Popplestone [16] used a heuristic weighting to evaluate the satisfaction of a subgraph match, including a factor that favored larger matches. Ambler et al. [8] used similarity of properties and relations between regions in a two dimensional parts scene. Nevatia and Binford [121] evaluated matches based on components found and parameter comparisons for the primitives.

Geometric Matching

Here, the geometric relationships in the model, initial object location knowledge and image feature geometry combine to allow direct matching. Roberts [139], Freuder [75], Marr [112] and others argued that partial matching of image data to object models could be used to constrain where other features were and how to classify them. Locating this data then further constrained the object's geometric location as well as increasingly confirmed its identity.

Adler [3] used a top-down control regime to predict image location in two dimensional scenes, and explained data lost because of occlusion. Freuder [75] described a two dimensional recognition program that used active reasoning to recognize a hammer in image region data. The program used image models to obtain suggestions about what features to look for next and advice on where to find the features.

Matching may be almost totally a matter of satisfying geometric criteria. The advantage of geometric matching is that the matching criteria are usually clear and geometric models allow directed comparisons. One might require that the accumulated error between predicted and observed features be below a threshold. For example, Faugeras and Hebert [63] used the model-to-data surface pairings that passed a geometric consistency measure and had minimum transformation estimation error.

Roberts [139] estimated the transformation from selected model points to image points. Transformation errors exceeding a threshold implied a bad match. Ikeuchi [96] recognized and oriented objects by computationally rotating extended gaussian images until good correspondences were achieved. Hogg [91] improved positional estimates using search over a bounded parameter space. The estimates were used to predict the position of edge points, and the number of verified edge points was used evaluate the estimates. Ballard and Tanaka [15] used a connectionist method for deducing a polyhedral object's reference frame given network linkages specified by geometric constraints. This follows Ballard's work [12] on extracting component parameters from intrinsic images by using Hough transform techniques.

Correct geometry is a strong constraint on an object's identity. Its limitations include the need for pairing the appropriate structures, control of combinatorial matching and integration with other matchables, such as structural properties and relationships.

Constraint Satisfaction

Implicit in the above methods are certain constraints that the data must satisfy. These constraints may apply to individual features, or to groups of features, or to relationships between features. Some researchers have tried to generalize the constraints by making them explicit. Matching algorithms can use direct search, graph matching (where the constraints specify the node and arc match criteria) or relaxation. Relaxation algorithms can apply to discrete symbol labelings [162], probabilistic labelings ([172,141,22,62]) or a combination of the two [19]. Barrow and Tenenbaum [19] used adjacency and homogeneity constraints to deduce identity in office scenes using height, intensity and orientation data. Hinton [86] formulated the substructure identity problem as a relaxation problem, with the goal of maximizing credibility subject to model constraints.

Nevatia and Binford [121] matched models using connectivity relationships between generalized cylinder primitives in the model and data to constrain correspondences. In ACRONYM [42], an object was identified by maximal refinement in a specialization lattice consistent with both the model constraints and image data. The refinement was by constraint satisfaction, where the constraints mainly applied to feature sizes and relationships. Constraint satisfaction was checked by inequality testing.

The constraint satisfaction approach encompasses the other methods described above. The ability of a constraint to be general is a real advantage, particularly when representing ranges of numerical values. Its weakness is it requires the choice of constraints that efficiently and adequately discriminate without rejection of minor undistinguished variants.

Most systems use a combination of the methods to recognize objects in more sophisticated scenes. For example, ACRONYM's [42] matching algorithm looked for subgraph isomorphisms between the picture graph, representing located image features, and the prediction graph, which was a precompilation of the object models. This graph tried to represent the likely sizes and intercomponent relationships between primitives, as seen in typical views of the object. A node or arc match required not only local, but also global satisfaction of constraints such as requiring all features to have the same reference frame.

The goal of most algorithms (but not ACRONYM's) was to use local constraints to produce global consistency. The difficulty with these pure methods is that they excessively simplify and ignore most of the global structural relationships between nameable object features.

Matching algorithms generally follow a combinatorial tree-search approach, where each new level in the tree pairs a new data feature to a model feature. Full tree expansion is usually prevented by using constraints to remove bad expansions (e.g. those coming from predictions of positions). For example, Faugeras and Hebert [63] and Grimson and Lozano-Perez [78] successfully used local relative position constraints to prune the search tree dramatically.

An improvement on the above basic methods is hierarchical recognition, in which objects are structured into levels of representation, and recognition matches components at the same level. Turner [161], Ambler et al. [8] and Fisher [66] used a bottom-up "hierarchical synthesis" process [145] and Adler [3] used top-down model directed analysis.

Final Comments

This completes a quick review of the philosophical and historical background to the research, and we now proceed to look at the research itself.

Next: Surface Data as Input Up: Object Recognition from Surface Previous: Some Previous Object Recognition

Bob Fisher 2004-02-26