
The IMAGINE II System

Experience with the IMAGINE I program has led to a redesign embodied in the IMAGINE II system. Although the re-implementation is not complete, the design of the system and its key representations and processes are summarized here. Figure 11.1 shows a block diagram of the main modules of the IMAGINE II system. The system is intended to interpret data deriving from scenes containing self-obscured and externally obscured complex, non-polyhedral man-made objects, possibly with degrees of freedom (e.g. robot joints).

As in IMAGINE I, data comes from a segmented $2\frac{1}{2}{\rm D}$ sketch, except that curve and volumetric scene features may be part of the input, too. The data may be fragmented or incomplete. The input data structure is a REV (Region, Edge, Vertex) graph. The system output is, as before, a list of object hypotheses with position and parameter estimates and a set of image evidence and justifications supporting the object hypothesis.

**Figure 11.1:** *IMAGINE II Modules and Data Structures*

Building the VSCP Structure

The first new representation is the VSCP structure (Volume, Surface, Curve, Point), which is constructed from the REV by knowledge-based structure completion processes. The goal of this process is to group curve and surface features from the REV to overcome fragmentation and occlusion effects and to remove non-structural artifacts (e.g. reflectance edges). The original raw data might be interrogated to help verify deductions.

An example of an occlusion rule is:

If two valid TEE junctions lying on the boundary of the same surface can be extended (using the local boundary shape) until they intersect, and the curve extensions lie behind closer surfaces, then hypothesize that the original shape of the partially obscured surface is that of the extended surface.

An example of a fragmentation rule is:

If two surface patches are "adjacent", have similar shape, depth and orientation, and there are no intervening space curves (e.g. from patch edges or closer surfaces), then merge the two patches.

Here "adjacent" is a heuristic concept because the surface characterization is assumed to be neither complete nor dense (i.e. there may be missing surfaces and there might be substantial gaps between nearby patches).
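The fragmentation rule can be sketched as a simple predicate. This is only an illustration under stated assumptions: the `Patch` record, the use of image-plane distance between centroids as the "adjacent" heuristic, the single curvature value standing in for "shape", and all thresholds are hypothetical and are not taken from IMAGINE II.

```python
import math
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical summary of a surface patch (not the IMAGINE II structure)."""
    centroid: tuple   # (x, y, z), with z taken as depth
    normal: tuple     # unit surface normal
    curvature: float  # one shape descriptor standing in for "shape"


def should_merge(p, q, curve_between, max_gap=2.0, max_depth_diff=0.5,
                 max_angle=math.radians(10.0), max_curv_diff=0.05):
    """Return True if patches p and q pass every test of the merge rule.

    curve_between: True if a space curve is known to lie between the patches.
    All thresholds are assumed values chosen for illustration.
    """
    if curve_between:
        return False
    # "Adjacent" is approximated by a small gap in the image plane.
    image_gap = math.dist(p.centroid[:2], q.centroid[:2])
    depth_diff = abs(p.centroid[2] - q.centroid[2])
    # Orientation similarity: angle between the two unit normals.
    dot = sum(a * b for a, b in zip(p.normal, q.normal))
    angle = math.acos(max(-1.0, min(1.0, dot)))
    curv_diff = abs(p.curvature - q.curvature)
    return (image_gap <= max_gap and depth_diff <= max_depth_diff
            and angle <= max_angle and curv_diff <= max_curv_diff)
```

Because the surface characterization may have gaps, the adjacency threshold would in practice need to be tuned to the expected patch spacing.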

Building the Contexts Structure

Invocation and matching still occur in data contexts, but contexts now exist for curves and volumes as well as surfaces and surface clusters. Contexts improve matching efficiency by grouping related data and thereby isolating irrelevant data, and they create a structure that can accumulate plausibility for model invocation.

The context structures are hierarchical in that contexts can be grouped to form larger contexts. Contexts are designed to support recognition of curves, surfaces, volumes and larger groupings of features. For example, the information contained in a surface context might link to both curve fragments and surface patches, because either might help define a complete surface.

Examples of context-forming rules are:

If a set of adjacent surface patches is completely isolated by depth discontinuity boundaries and there are no such boundaries internal to the group, then these surfaces form a context for recognizing an ASSEMBLY.
If a set of space curves roughly surrounds a region of two-dimensional image space and the curves are not radically different in depth, then hypothesize that a surface context lies within the curves.
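The first rule amounts to finding groups of patches that are linked only by boundaries that are not depth discontinuities. The following is a minimal sketch of that idea, assuming a hypothetical input of patch names and labelled boundaries; the function name and data layout are illustrative and not part of IMAGINE II.

```python
from collections import defaultdict


def assembly_contexts(patches, boundaries):
    """Group patches into candidate ASSEMBLY contexts.

    patches: iterable of patch identifiers.
    boundaries: iterable of (patch_a, patch_b, is_depth_discontinuity).
    Returns a list of sets, in order of first appearance in `patches`.
    """
    boundaries = list(boundaries)
    neighbours = defaultdict(set)
    for a, b, is_discontinuity in boundaries:
        if not is_discontinuity:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, contexts = set(), []
    for start in patches:
        if start in seen:
            continue
        # Collect everything reachable without crossing a depth discontinuity.
        group, stack = set(), [start]
        while stack:
            patch = stack.pop()
            if patch in group:
                continue
            group.add(patch)
            stack.extend(neighbours[patch] - group)
        seen |= group
        # Reject groups that still contain an internal depth discontinuity.
        internal = any(d and a in group and b in group for a, b, d in boundaries)
        if not internal:
            contexts.append(group)
    return contexts
```

By construction, any boundary that leaves a group is a depth discontinuity, so each returned group is isolated in the sense of the rule.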

Structure Description

Model invocation and hypothesis construction require property estimates for image features, and because we are using $2\frac{1}{2}{\rm D}$ sketch data, three-dimensional properties can be directly measured. These properties are similar to those used for IMAGINE I (Chapter 6), and include:

curve fragment properties: length, curvature, ...
surface fragment properties: area, curvature, elongation, ...
curve fragment pairs: relative orientation, relative size, ...
surface fragment pairs: relative orientation, relative size, ...
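Two of the pairwise properties listed above can be computed as follows. This is a sketch under assumptions: relative orientation is taken to be the angle between surface normals and relative size the ratio of the smaller area to the larger, which may differ from the definitions used in the system.

```python
import math


def relative_orientation(n1, n2):
    """Angle in radians between two surface normals (any non-zero length)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.hypot(*n1) * math.hypot(*n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def relative_size(area1, area2):
    """Ratio of the smaller area to the larger, a value in (0, 1]."""
    return min(area1, area2) / max(area1, area2)
```

Both measures are symmetric in their arguments, so the order in which a pair of fragments is presented does not matter.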

Model Invocation

Model invocation is nearly the same as in IMAGINE I (Chapter 8). A network implements the computation in a manner suitable for parallel evaluation. Nodes represent the pairing between individual model and data features, and are connected to other nodes according to the type of relation. Relations include: structural (e.g. "subcomponent of"), generic (e.g. "visual specialization of"), class (e.g. "non-visual specialization of"), inhibiting and general association. Direct evidence comes from a measure of the fit between data and model properties.
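To make the network idea concrete, here is a toy sketch in which every node holds a plausibility that combines its direct evidence with weighted contributions from related nodes. The update rule, the weights, the value range of [-1, 1] and the node names are all assumptions made for illustration; they are not the evaluation functions of IMAGINE I or II.

```python
def evaluate_network(evidence, relations, iterations=10):
    """Synchronously update plausibilities over a relation network.

    evidence: {node: direct evidence in [-1, 1]}, one node per
              (model feature, data feature) pairing.
    relations: list of (source, target, weight); a negative weight
               stands for an inhibiting relation.
    """
    plausibility = dict(evidence)
    for _ in range(iterations):
        updated = {}
        # Each node depends only on the previous values, so all nodes
        # could be evaluated in parallel within one iteration.
        for node, direct in evidence.items():
            support = sum(w * plausibility[src]
                          for src, dst, w in relations if dst == node)
            updated[node] = max(-1.0, min(1.0, direct + support))
        plausibility = updated
    return plausibility
```

A structural relation would appear here as a positive weight from a subcomponent pairing to the pairing for the larger model, and competing interpretations of the same datum as negative weights between their pairings.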

Object Models

The SMS models are used, as described in Chapter 7. They are primarily structural with model primitives designed to match with either curve, surface or volumetric data as alternatives. The models are hierarchical, building larger models from previously defined substructures. All model dimensions and reference frame transformations may involve variables and expressions, and algebraic constraints can bound the range of the variables.
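The combination of hierarchy, variables and algebraic constraints can be illustrated with a small data structure. This is not SMS syntax: the `Model` class, the placement dictionary and the `elbow` variable are hypothetical names chosen for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical hierarchical model with constrained variables."""
    name: str
    # Each entry is (sub-model, placement); a placement value may be a
    # number or the name of a variable.
    subcomponents: list = field(default_factory=list)
    # Each constraint is a predicate over a dict of variable bindings.
    constraints: list = field(default_factory=list)

    def satisfied(self, bindings):
        """Check this model's constraints and those of all substructures."""
        own = all(check(bindings) for check in self.constraints)
        return own and all(sub.satisfied(bindings)
                           for sub, _ in self.subcomponents)


upper_arm = Model("upper_arm")
lower_arm = Model("lower_arm")
arm = Model(
    "arm",
    subcomponents=[
        (upper_arm, {"rotation_about_joint_axis": 0.0}),
        (lower_arm, {"rotation_about_joint_axis": "elbow"}),
    ],
    # Assumed joint limit: the elbow angle lies between 0 and 90 degrees.
    constraints=[lambda b: 0.0 <= b["elbow"] <= math.pi / 2],
)
```

A candidate value for the joint angle estimated during matching could then be tested with `arm.satisfied({"elbow": angle})`.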

The models have viewpoint dependent feature groups, which record the fundamentally distinct viewpoints of the object. They also identify (1) model features visible from the viewpoint and (2) new viewpoint dependent features (such as occlusion relationships, TEE junctions or extremal boundaries).

Hypothesis Construction

Initial selection of the model may come bottom-up from invocation or top-down as part of another hypothesis being constructed. Hypothesis construction then attempts to find evidence for all model features.

Feature visibility information comes from a viewpoint dependent feature group, which is selected according to the estimated orientation of the object.
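One plausible way to perform this selection is to store a representative viewing direction with each feature group and choose the group best aligned with the estimated direction. The representation and the nearest-direction criterion are assumptions for this sketch, not the documented mechanism.

```python
def select_feature_group(groups, view_dir):
    """Pick the viewpoint dependent feature group closest to view_dir.

    groups: {group name: (representative unit view direction,
                          list of visible feature names)}
    view_dir: estimated unit viewing direction in model coordinates.
    Returns (group name, visible feature names).
    """
    def alignment(item):
        direction, _ = item[1]
        # A larger dot product means a smaller angle between directions.
        return sum(a * b for a, b in zip(direction, view_dir))

    name, (_, visible) = max(groups.items(), key=alignment)
    return name, visible
```

For example, with a "top" group at direction (0, 0, 1) and a "side" group at (1, 0, 0), an estimated direction near the x axis would select the "side" group.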

Construction is largely hierarchical, grouping recognized subcomponents to form larger hypotheses. The most primitive features are designed to be recognized using either curve, surface or volumetric data, depending on what is available. At all stages, geometric consistency is required, which also results in more precise position estimates and estimates for embedded variables (such as a variable rotation angle about an axis).

Construction is a heuristic process whereby various approaches are tried to find evidence for a feature. For example, some heuristics for surface finding are:

Use an image patch if it has the predicted position, orientation, shape and size.
Use a smaller image patch if it has the predicted position, orientation and shape and no patch of the correct size is found (i.e. accept fragmented patches).
Do not look for the surface if it is small and far away.

Application of the heuristics is controlled through routines that know what approaches are available for finding features (and when to try them); this control has something of an "expert system" character.
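The three surface-finding heuristics could be ordered by a controlling routine roughly as follows. The dictionary fields, the thresholds and the `matches` callback (which is assumed to test position, orientation and shape) are hypothetical and serve only to show the control flow.

```python
def find_surface_evidence(predicted, patches, matches,
                          size_tol=0.2, min_area=1.0, max_range=50.0):
    """Try the surface-finding heuristics in order.

    predicted: dict with "area" and "range" for the model surface.
    patches: list of dicts, each with an "area" entry.
    matches: callable(predicted, patch) -> bool, testing position,
             orientation and shape.
    Returns (outcome, patch) with outcome in
    {"skip", "full", "fragment", "none"}.
    """
    # Do not look for surfaces that are small and far away.
    if predicted["area"] < min_area and predicted["range"] > max_range:
        return "skip", None
    candidates = [p for p in patches if matches(predicted, p)]
    # Prefer a patch that also has the predicted size.
    for patch in candidates:
        if abs(patch["area"] - predicted["area"]) <= size_tol * predicted["area"]:
            return "full", patch
    # Otherwise accept the largest smaller patch as a fragment.
    fragments = [p for p in candidates if p["area"] < predicted["area"]]
    if fragments:
        return "fragment", max(fragments, key=lambda p: p["area"])
    return "none", None
```

Returning the outcome label alongside the patch lets later stages distinguish complete evidence from fragmentary evidence.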

Geometric Reasoning

The network-based geometric reasoning was described here. The geometric relationships between model features, model and data pairings and a priori scene knowledge are represented algebraically and are implemented as networks expressing the computational relationships between the variables.

Analysis of the types of geometric relationships occurring in scene analysis showed that most relationships could be expressed using only a small set of standard relationships (e.g. "a model point is paired with a data point"). The standard relationships are then used to create standard network modules, which are allocated and connected as model matching produces new model-to-data pairings.

Agenda Management

To facilitate experimentation with different control regimes, the hypothesis construction processes are activated from a priority-ordered agenda. The processes take inputs from and return results to a global blackboard. An agenda item embodies a request for applying a specified hypothesis construction process on a given datum or hypothesis. The activated process may then enter other requests into the agenda. We use the agenda to implement a mixed control regime involving both top-down and bottom-up hypothesis construction.
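A priority-ordered agenda with a shared blackboard can be sketched with a heap. The class, the process names and the priorities below are illustrative assumptions; the source does not specify how the agenda is implemented.

```python
import heapq
import itertools


class Agenda:
    """Priority-ordered queue of (process, target) requests."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: earlier requests first

    def request(self, priority, process, target):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), process, target))

    def run(self, blackboard):
        while self._heap:
            _, _, process, target = heapq.heappop(self._heap)
            # A process reads and writes the blackboard and may add requests.
            process(target, blackboard, self)


def invoke_model(datum, blackboard, agenda):
    """Bottom-up step: record an invocation, then request top-down search."""
    blackboard.setdefault("log", []).append(("invoke", datum))
    agenda.request(5, find_missing_feature, datum + ":hypothesis")


def find_missing_feature(hypothesis, blackboard, agenda):
    """Top-down step: record a search for evidence for a hypothesis."""
    blackboard.setdefault("log", []).append(("find", hypothesis))


agenda = Agenda()
blackboard = {}
agenda.request(1, invoke_model, "cluster_1")
agenda.request(3, invoke_model, "cluster_2")
agenda.run(blackboard)
```

Changing the priorities assigned in `request` is enough to shift the balance between bottom-up and top-down processing, which is the kind of experimentation the agenda is meant to support.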

Hypothesis Verification

Because data can be fragmented or erroneous, object hypotheses may be incomplete. Further, spurious hypotheses may be created from coincidental alignments between scene features. Hypothesis construction and geometric reasoning eliminate some spurious hypotheses, but other instances of global inconsistency may remain.

This module considers two problems: (1) global consistency of evidence (e.g. connectedness and proper depth ordering of all components) and (2) heuristic criteria for when to accept incomplete models.
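The two consistency checks named in (1) and a simple acceptance test for (2) might look like the following. The graph representation, the depth values and the `min_fraction` threshold are assumptions for illustration; the source does not state the actual criteria.

```python
def is_connected(components, adjacency):
    """True if all components are linked through a symmetric adjacency map."""
    components = set(components)
    if not components:
        return True
    start = next(iter(components))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency.get(current, ()):
            if neighbour in components and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == components


def depth_order_consistent(predicted_in_front, observed_depth):
    """Check each predicted (near, far) pair against observed depths."""
    return all(observed_depth[near] < observed_depth[far]
               for near, far in predicted_in_front)


def accept_incomplete(found, expected, min_fraction=0.6):
    """Accept a hypothesis if enough expected features have evidence.

    min_fraction is an assumed threshold, not a value from the source.
    """
    expected = set(expected)
    return len(expected & set(found)) >= min_fraction * len(expected)
```

A hypothesis would be kept only if all three checks pass, so a coincidental alignment that places components in the wrong depth order is rejected even when most features were found.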


Bob Fisher 2004-02-26