Interpretation
Tree Search

Tree searching is a standard topic in Computer Science; examples of algorithms include depth, breadth, best-first and search. In the context of computer vision, we can define a search tree which has as its nodes pair of matching primitives in the model and scene data, and as its arcs a representation of consistency of interpretation between nodes. By tracing back from the leaves of the tree we can determine a consistent interpretation.

First, we can define a basic interpretation tree approach (Grimson, 1990). With reference to Figure 2,below, the problem is to match the line primitives in the model, {1, 2, 3} to those in the scene, {a, b, c}. Select a scene feature at random, feature a, say. Choose a model feature at random. The choice (a, 1) represents a node in the tree. However, we could equally choose (a, 2) or (a, 3) as initial nodes. Thus there are three nodes at the first level of the tree.

Now expand each of these nodes. For example, if we choose to expand (a, 1) then the three children would be defined as (b, 1), (b, 2) and (b, 3). If we expand (a, 2) then the children are the same. Hence, for a completely unconstrained tree search matching a model of n primitives to a scene having n primitives there will n nodes at the first level, at the second level and so on until there are nodes at the last level.

Figure 2:A simple matching problem

In general, we shall deal with constrained tree search. For example, is a scene labelling of
{(a, 3), (b, 3), (c,3)} sensible ? Well it suggests that we can detect in the scene the hypoteneuses of three separate triangle, and that the other sides are occluded or otherwise undetected. Suppose we know a-priori that there is only one triangle in the scene ? Then, at the second level of the search tree we can only expand (a, 1) with (b, 2) and (b, 3); this a uniqueness constraint by analogy with the stereo matching problem. Hence for each of n nodes at the first level, there are n-1 children, then n-2 children and so on.

To reduce the combinatorics of the search still further, we should add additional constraints, similar to the room labelling problem we illustrated in Figure 1: Labelling the regions of a room consistently. As we are confining our intention to rigid geometric objects, then our constraints should be based on geometry. Unary constraints apply to single pairings between model and scene features. For example we could introduce a constraint which says that lines can only be matched if they have the same length. Binary or pairwise constraints are based on pairs if features. For example, we could introduce a constraint based on the angle between pairs of lines. If (a, 1) is an existing pairing, and (b, 2) is a proposed additional node, then this node is only consistent with (a, 1) if the angle between a and b is the same as that between 1 and 2. If not, then the node cannot be added. As we expand the tree, then additional nodes may be added either on the basis of the parent node ( which makes more likely but does not ensure global consistency) or on the basis of pairwise comparisons with all ancestors. Alternatively, you might apply a n-ary check of global consistency of transformation between each scene and model feature; i.e. do they all have a common rotation and translation matrix ? It is necessary to balance the complexity of the tree expansion against the complexity of the checks on consistency.

Finally, we have not determined the order of expansion of the nodes, nor the termination condition. For example, we could apply a depth-first search based on the order of scene and model features. Expand a then b paired with 1 then 2 and so on. Alternatively we could apply breadth first search. If we apply a best-first approach we need some measure of the quality of match at each node in the tree. Intuitively, this might be based on the similarity of primitives and their n-ary relations; for example, at a given depth what is the residual error in the best fitting transformation between scene and model ? How similar are the several line length correspondences? Depth is to some extent a measure of ``best'', since it means that more feature correspondences have been found.

Best-first interpretation tree search, expanding each node as broadly as possible and enforcing consistency

```

Create search tree consisting of start node, S

Initialise Open List = {S}

REPEAT
REPEAT
Select  best node, Nj from Open List
Select next scene feature, si, for consideration
Select next model feature, mi, for consideration
IF (si, miconsistent then
Add node Nk = (si, mi) to search tree, expanding Nj
Add node Nk = (si, mi) to Open List
UNTIL (no consistent nodes can be added to Nj)
Remove Nj) from Open List
UNTIL (Termination Condition)  or (OPEN = {})
Output best solution by tracing back from best current node to root of tree
```

In an ideal world, termination would occur when all scene features (or all model features) have been matched. However, this is not possible. All model features may not be found in the scene because some are occluded or not detected. All scene features may not be found in the model because some are extra false features (e.g. shadows), or there may be more than one object present. Hence termination is almost always based on a heuristic; at worst it may be considered notas a solution, but a hypothesis, which can subsequently be verified by more detailed comparison between the model and the scene. In our simple examples, note that matching of primitives implies also a definition of pose, i.e. the rotation and translation required to align the model with the scene. Hence verification may involve back-projection of the model on to the scene and some kind of additional searching or processing of the image to verify the existence of hitherto absent features.

Figure 4, below, (Reference no 3, Wallace, 1988) shows an example of a search tree based on a unary length constraint and a pairwise geometric constraint based on angle and position of intercept. The model ( the simplest of 12 possibilities ) is a printer bracket (figure 3, below) lying in a stable pose, the scene is based on Hough transformation of an intensity image. The search tree is based on expansion of the ``best'' node, but a fixed breadth of up to 5 additional nodes is allowed. The search tree shown is for the correct solution. No other model achieves a significant depth of expansion.

 Figure 3: A picture of an Anadex printer bracket Figure 4: The search tree for the Anadex printer bracket

Comments to: Sarah Price at ICBL.