

Introduction

The problem of object recognition is often formulated as that of matching configurations of image features to configurations of model features. Such configurations are often represented as vertex-labeled graphs, whose nodes represent image features (or their abstractions) and whose edges represent relations (or constraints) between the features. For scale-space structures, represented as directed graphs, edges can encode both parent/child relations and sibling relations. To match two graph representations (hierarchical or otherwise) means to establish correspondences between their nodes. To evaluate the quality of a match, an overall distance measure is defined whose value depends on both node and edge similarity.

Previous work on graph matching has typically focused on the problem of finding a one-to-one correspondence between the vertices of two graphs. However, the assumption of one-to-one correspondence is a very restrictive one, for it assumes that the primitive features (nodes) in the two graphs agree in their level of abstraction. Unfortunately, a variety of conditions may lead to graphs that represent visually similar image feature configurations yet do not contain a single one-to-one node correspondence.

The limitations of the one-to-one assumption are illustrated in Figure 1, in which an object is decomposed into a set of ridges and blobs extracted at appropriate scales [19]. The ridges and blobs map to nodes in a directed graph, with parent/child edges directed from coarser-scale nodes to overlapping finer-scale nodes, and sibling edges between nodes that share a parent. Although the two images clearly contain the same object, the decompositions are not identical. Specifically, the ends of the fingers in the right hand have been over-segmented with respect to the left hand. It is quite common that, due to noise or segmentation errors, a single feature (node) in one graph corresponds to a collection of broken features (nodes) in another graph. Or, due to scale differences, a single, coarse-grained feature in one graph may correspond to a collection of fine-grained features in another graph. Hence, we seek not a one-to-one correspondence between image features (nodes), but rather a many-to-many correspondence.
\begin{figure*}
\centerline{
\begin{tabular}{cc}
\epsfysize=2.25in \epsfbox{five.eps} & \epsfysize=2.25in \epsfbox{five1.eps}
\end{tabular}}
\caption{The Need for Many-to-Many Matching. The two images contain similar objects, but the extracted features are not in one-to-one correspondence: the ends of the fingers in the right hand have been over-segmented with respect to the left hand.}
\end{figure*}
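To make the graph construction above concrete, the following Python sketch builds the two edge types from a set of circular feature regions: parent/child edges from coarser-scale features to the finer-scale features they overlap, and sibling edges between features that share a parent. This is an illustration only; the Feature fields, the overlap test, and the function names are assumptions for the sketch, not the feature representation of [19] or of our system.

\begin{verbatim}
# Minimal sketch (assumptions: circular feature regions, simple overlap
# test) of the directed scale-space graph described in the text.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Feature:
    """A blob or ridge abstracted as a circular image region."""
    name: str
    x: float          # region centre (pixels)
    y: float
    radius: float     # spatial extent
    scale: float      # scale at which the feature was detected

def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two circular regions intersect."""
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 < (a.radius + b.radius) ** 2

def build_scale_space_graph(features):
    """Return (parent/child, sibling) edge lists for a feature set."""
    # Parent/child edges run from coarser-scale features to the
    # overlapping finer-scale features they contain.
    parent_child = [(p.name, c.name)
                    for p in features for c in features
                    if p.scale > c.scale and overlaps(p, c)]
    # Sibling edges connect features that share a parent.
    children = {}
    for p, c in parent_child:
        children.setdefault(p, set()).add(c)
    sibling = [pair
               for kids in children.values()
               for pair in combinations(sorted(kids), 2)]
    return parent_child, sorted(set(sibling))

# Toy example: a coarse "hand" blob with two finer fingertip blobs.
hand = Feature("hand", 0, 0, 10, scale=8)
tip1 = Feature("tip1", -3, 8, 2, scale=2)
tip2 = Feature("tip2", 3, 8, 2, scale=2)
print(build_scale_space_graph([hand, tip1, tip2]))
\end{verbatim}

In this toy example the coarse blob becomes the parent of both fingertip blobs, which in turn become siblings; over-segmentation of a fingertip would simply add more children under the same parent, which is precisely the situation that defeats one-to-one matching.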
In [10,7], we presented a framework for many-to-many matching of undirected and directed graphs, respectively, in which features and their relations were represented using edge-weighted graphs. The method began by transforming each graph into a metric tree. Next, using the graph embedding technique of Matousek [13], the tree was embedded into a normed vector space. This two-step transformation allowed us to reduce the problem of many-to-many graph matching to the much simpler problem of matching weighted distributions of points in a normed vector space. To compute the distance between two weighted distributions, we used a distribution-based similarity measure known as the Earth Mover's Distance under transformation.

This procedure suffered from a significant limitation: each graph was embedded into a vector space of arbitrary dimension, and before the embeddings could be matched, a dimensionality reduction step was required that was both costly and prone to error. Specifically, we used an inefficient Principal Components Analysis (PCA)-based method to project the two distributions into the same normed space. In this paper, we present an entirely different embedding method based on a spherical coding algorithm. This efficient (linear-time) method embeds metric trees into vector spaces of prescribed dimensionality, eliminating the need for a dimensionality reduction step. We demonstrate the framework on the problem of multi-scale shape matching, in which an image is decomposed into a set of blobs and ridges with automatic scale selection.
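To give a sense of the distribution-matching step, the sketch below computes the plain Earth Mover's Distance between two weighted point distributions by solving the underlying transportation problem with SciPy's linear-programming routine. The function name emd and the toy signatures are assumptions made for illustration; the transformation-optimising variant used in [10,7] additionally searches over an aligning transformation and is not shown here.

\begin{verbatim}
# Minimal EMD sketch (assumed helper names) between two weighted point
# distributions, posed as a transportation linear program.
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(points_p, weights_p, points_q, weights_q):
    """Earth Mover's Distance between weighted point sets in R^d."""
    d = cdist(points_p, points_q)            # ground distances, shape (n, m)
    n, m = d.shape
    total_flow = min(weights_p.sum(), weights_q.sum())

    c = d.ravel()                             # flow variables f_ij, row-major

    # Row sums bounded by supplies, column sums by demands.
    A_ub = np.zeros((n + m, n * m))
    for i in range(n):
        A_ub[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_ub[n + j, j::m] = 1.0
    b_ub = np.concatenate([weights_p, weights_q])

    # Total flow must equal the smaller total weight.
    A_eq = np.ones((1, n * m))
    b_eq = [total_flow]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun / total_flow               # normalise by the mass moved

# Toy example: one coarse feature matched against two finer ones.
P = np.array([[0.0, 0.0]]);              wP = np.array([1.0])
Q = np.array([[0.0, 1.0], [0.0, -1.0]]); wQ = np.array([0.5, 0.5])
print(emd(P, wP, Q, wQ))                 # -> 1.0
\end{verbatim}

Because the optimal flow may split the mass of a single point over several points in the other distribution, the measure naturally supports the many-to-many correspondences motivated above.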
