Thoughts on Hypothesis Construction

The hypothesis construction process described below attempts to find evidence for all model features. This is somewhat controversial, so it is worth briefly discussing the motivation for this decision.

If we are working in a restricted domain (such as an industrial assembly line), the number and types of objects in the scene are usually limited. Here, many details would be object-specific, and a goal-directed argument suggests that only the key differentiating features need be found. When the domain is sufficiently restricted, specific features will be unique signifiers. However, this would not be an appropriate strategy for a general vision system because, without additional descriptions or non-visual knowledge of the restricted domain, it would not ordinarily be possible to reach the stage where only a few identities were under consideration.

Moreover, identifying individuals or subclasses requires finer details (e.g. distinguishing between two people, or even between two "identical" twins). Many individual objects differ only slightly or share identical features. Consider how often one recognizes a facial feature or a smile of a friend in the face of a complete stranger: though the stranger is made unique by the configuration of his or her features, some details are held in common with the friend. If recognition were predicated on only a few features, which may sometimes be sufficient for unique identification in a limited domain, then we would continually misrecognize objects. While only a few features may be necessary for model invocation, many others are necessary for confirmation.

These problems suggest that the hypothesis construction process should try to find direct image evidence for all model features.

On the other hand, partial evidence is often sufficient. We usually have no trouble identifying a friend even after a mustache has been shaved off, and often do not even notice that there has been a change, let alone what the change is. Similarly, we can often recognize a friend having seen only a portion of the face.

Moreover, finding evidence for all features is usually impossible: changes in resolution may leave some features too large or too small to detect directly, and occlusion will hide others.

Yet we tend to expect recognition to be perfect, so, on idealistic grounds, a general vision system should acquire as much information as possible. This is also consistent with the usual role of a general vision system: a largely autonomous, data-driven analysis process that provides environmental descriptions to a higher-level action module, which may then instigate additional goal-directed visual analysis.

In summary, our desire for full model instantiation derives from:

- the fine detail needed to discriminate between similar objects and individuals,
- the idealistic goal that a general vision system should acquire as much information as possible, and
- the largely autonomous, data-driven role expected of such a system.

Why Use Surfaces as Evidence

What is desired is image evidence that supports the existence of each model feature. In edge-based recognition systems, an image edge was the key evidence for a model feature, because surface orientation discontinuity boundaries were observed as edges. This was even more important in polyhedral domains (without reflectance boundaries), where extremal boundaries were also orientation discontinuity boundaries. Unfortunately, more naturally shaped and colored objects introduced a host of problems: there were fewer traditional orientation edges, extremal boundaries no longer corresponded to orientation discontinuities, and reflectance and illumination variations created new edges. Together, these made the search for simple, directly corresponding edge evidence much more difficult.

Two of the advantages of using surfaces given in Chapter 3 are worth repeating here:

With surface representations, it is again possible to find image evidence that directly matches model features. Assuming that there is a consistent segmentation regime for both the surface image and the model SURFACEs, the model feature instantiation problem reduces to finding which model SURFACE corresponds to each data surface.
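As a rough illustration of this correspondence problem, the following Python sketch enumerates plausible data-to-model pairings by comparing viewpoint-insensitive surface properties. The names and the particular properties used (curvature class and estimated area) are hypothetical simplifications, not the data structures of the actual system.

    from dataclasses import dataclass

    @dataclass
    class Surface:
        name: str
        curvature_class: str   # e.g. "planar", "cylindrical", "ellipsoidal"
        area: float            # estimated absolute surface area

    def compatible(data, model, area_tol=0.3):
        # A data surface can instantiate a model SURFACE only if both have
        # the same curvature characterization and roughly similar area.
        if data.curvature_class != model.curvature_class:
            return False
        return abs(data.area - model.area) <= area_tol * model.area

    def candidate_pairings(data_surfaces, model_surfaces):
        # Enumerate plausible (data, model) pairs; later stages prune
        # these using adjacency and geometric consistency.
        return [(d, m) for d in data_surfaces for m in model_surfaces
                if compatible(d, m)]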

Finding all model features first requires understanding how three-dimensional objects appear in images, so that image evidence can be located for oriented model instances. Here the recognition process must understand, or at least be able to predict, how a surface patch's appearance varies with changes in the surface's position relative to the viewer. The segmentation process attempts to produce surface patches with a uniform curvature characterization, so it is easy to approximate the visible shape to first order, given the model patch and its relative position. Also, given recent advances in computer graphics, it is possible to deduce the visibility status of most surface patches.
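To make the appearance-prediction step concrete, here is a minimal sketch assuming orthographic projection and a locally planar patch; the function name and interface are illustrative only. A patch facing away from the viewer is self-occluded, and a front-facing patch has its image area reduced by the foreshortening factor cos(theta).

    import numpy as np

    def predicted_patch_appearance(normal, patch_area, view_dir):
        # normal   : unit outward normal of the model patch (viewer frame)
        # view_dir : unit vector from the patch toward the viewer
        cos_theta = float(np.dot(normal, view_dir))
        if cos_theta <= 0.0:               # back-facing: self-occluded
            return False, 0.0
        return True, patch_area * cos_theta

    # Example: a patch tilted 60 degrees from the line of sight keeps
    # half of its frontal area in the image.
    n = np.array([0.0, np.sin(np.pi / 3), np.cos(np.pi / 3)])
    print(predicted_patch_appearance(n, 10.0, np.array([0.0, 0.0, 1.0])))

A full system would, as noted above, also have to account for perspective, surface curvature and occlusion by other objects.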

Another result of using the surface segmentation is a discrete symbolic partitioning of the complete object surface. This simplifies the surface matching computation tremendously. An infinitesimal element of a surface could have many possible identities, which shows up in practice as the need to incrementally rotate and shift surfaces when matching (e.g. [96,135]). A segmented surface immediately simplifies the matching by providing a higher-level structure for comparison. Topology further decreases the amount of matching, as adjacent model SURFACEs must pair with adjacent data surfaces, reducing the problem to subgraph isomorphism. If the invocation process gives strong suggestions about the identities of the various surfaces, then combinatorial matching is almost completely unnecessary.
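The adjacency constraint can be expressed as a small backtracking search, sketched below in Python. The graphs are plain adjacency sets and the candidate lists stand in for the suggestions of the invocation process; all names are illustrative rather than taken from the system.

    def match(data_adj, model_adj, candidates, assignment=None):
        # data_adj / model_adj : dict mapping each surface to the set of
        #                        surfaces adjacent to it
        # candidates           : dict mapping each data surface to its
        #                        plausible model SURFACEs (from invocation)
        # Yields complete data->model assignments consistent with adjacency.
        if assignment is None:
            assignment = {}
        if len(assignment) == len(data_adj):
            yield dict(assignment)
            return
        d = next(n for n in data_adj if n not in assignment)
        for m in candidates[d]:
            if m in assignment.values():
                continue
            # Adjacent data surfaces must map to adjacent model SURFACEs.
            if all(assignment[n] in model_adj[m]
                   for n in data_adj[d] if n in assignment):
                assignment[d] = m
                yield from match(data_adj, model_adj, candidates, assignment)
                del assignment[d]

When invocation supplies a single strong candidate per surface, the loop over candidates collapses and the combinatorics vanish, as observed above.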

