Next: References

Contour matching: problem and related work

Yoram Gdalyahu and Daphna Weinshall
Institute of Computer Science, The Hebrew University
91904 Jerusalem, Israel
email: {yoram,daphna}@cs.huji.ac.il

Contour matching is an important problem in computer vision with a variety of applications, including model based recognition, depth from stereo and tracking. In these applications the two matched curves are usually very similar. For example, a typical application of curve matching to model based recognition would be to decide whether a model curve and an image curve are the same, up to some scaling or 2D rigid transformation and some permitted level of noise. Contour matching is also needed for applications like image database categorization, where images contain dominant curves or silhouettes. For such applications it is crucial to achieve a reliable similarity estimation even between images that are only weakly related; this requires flexible matching, which can support graded similarity estimation.

In the following literature review we first make the distinction between the majority of the boundary matching methods which focus on the curve itself, and the dual approach which is based on the curve's medial axis. In the latter approach, a medial axis together with singularities labeling form a shock graph representation, and matching shock graphs is an isomorphism problem. The methods for solving it includes semi-definite programming [36], replicator dynamics [30], graduated assignment [34] and syntactic graph matching [38]. In some of these cases the matching is only structural, while in others two levels of matching (structural and metrical) are supported. The methods based on shock graphs succeed to define a graded similarity measure, and may be combined with suitable database indexing [35].

Most of the work on contour matching involves direct curve matching, where we further distinguish between dense matching and feature matching. Dense matching is usually formulated as a parameterization problem, with some cost function to be minimized. The cost might be defined as the ``elastic energy'' needed to transform one curve to the other [5,8,10], but other alternatives exist [2,16,12,28]. The main drawbacks of these methods are their high computational complexity (which is reduced significantly if only key points are matched), and the fact that they are usually not invariant under both 2D rotation and scaling. In addition, the computation of elastic energy (which is defined in terms of curvature) is scale dependent, and requires accurate evaluation of second order derivatives.

Feature matching methods may be divided into three groups: proximity matching, spread primitive matching, and syntactic matching. The idea behind proximity matching methods is to search for the best matching while permitting the rotation, translation and scaling (to be called alignment transformation) of each curve, such that the distances between matched key points are minimized [4,20,21,42]. Consequently these methods are rather slow; moreover, if scaling is permitted an erroneous shrinking of one feature set may result, followed by the matching of the entire set with a small number of features from the other set. One may avoid these problems by excluding many-to-one matches and by using the points order, but then the method becomes syntactic (see below). Moreover, proximity matching is not adequate for weakly similar curves. As an alternative to the alignment transformation, features may be mapped to an intrinsic invariant coordinate frame [26,32,33]; the drawback of this approach is that it is global, as the entire curve is needed to correctly compute the mapping.

Features can be used to divide the curves into shape elements, or primitives. If a single curve can be decomposed into shape primitives, the matching algorithm should be constrained to preserve their order. But in the absence of any ordering information (like in stereo matching of many small fragments of curves), the matching algorithm may be called ``spread primitive matching''. In this category we find algorithms that seek isomorphism between attributed relational graphs [6,24,9], and algorithms that look for the largest set of mutually compatible matches. Here, compatibility means an agreement on the induced coordinate transformation, and a few techniques exist to find the largest set of mutually compatible matches (e.g., clustering in Hough space [37], geometrical hashing [23], and clique finding in an association graph [7,11,19,22]). Note that at the application level, finding isomorphism between attributed relational graphs is the same problem as finding isomorphism between shock graphs (discussed above), although in the last case an additional constraint may apply [30].

For the purpose of matching complex 1D curves, it is advantageous to use the natural order of primitives. This results in a great simplification, and there is no need to solve the difficult graph isomorphism problem. Moreover, the relations encoded by the attributed relational graphs need to be invariant with respect to 2D image transformations, and as a result they are usually non local.

A syntactical representation of a curve is an ordered list of shape elements, having attributes like length, orientation, bending angle etc. Hence, many syntactical matching methods are inspired by efficient and well known string comparison algorithms, which use edit operations (substitution, deletion and insertion) to transform one string to the other [43,27,18]. The vision problem is different from the string matching problem in two major aspects, however: first, in vision invariance to certain geometrical transformations is desired; second, a resolution degradation (or smoothing) may create a completely different list of elements in the syntactical representation.

There are no pure syntactic algorithms available which satisfactorily solve both of these problems. If invariant attributes are used, the first problem is immediately addressed, but then the resolution problem either remains unsolved [1,17,25] or may be addressed by constructing for each curve a cascade of representations at different scales [3,29,41]. Moreover, invariant attributes are either non-local (e.g., length that is measured in units of the total curve length), or they are non-interruptible. Using variant attributes is less efficient, but provides the possibility to define a merge operator which can handle noise [31,39,40], and might be useful (if correctly defined) in handling resolution change. However, the methods using variant attributes could not ensure rotation and scale invariance.

In [13,14,15] a local syntactic matching method is discussed, which can cope with both occlusion and irrelevant changes due to image transformation, while using variant attributes. These attributes support a simple smoothing mechanism, hence the algorithm handle true scale (resolution) changes. The method is efficient and fast, using a variant of the edit matching procedure combined with heuristic search. As in [1], this method combines syntactic matching with a proximity measure; feature correspondence is first established using syntactic matching, and then curve dissimilarity is evaluated according to the residual distances between matched points. The algorithm is designed to give a graded similarity value, where low values reflect similarity between weakly similar curves and higher values indicate strong similarity; thus it is most suitable for applications such as image database categorization and indexing.

Next: References

Bob Fisher
Sat Sep 4 21:47:57 BST 1999