Next: Metric Embedding of Graphs Up: Many-to-Many Feature Matching Using Previous: Related Work


Notation and Definitions

Before describing our many-to-many matching framework, some definitions are in order. To begin, a graph $G$ is a pair $({\cal A},
E)$, where ${\cal A}$ is a finite set of vertices and $E$ is a set of connections (edges) between the vertices. An edge $e=(u,v)$ consists of two vertices such that $u,v \in {\cal A}$. A graph $G=({\cal A}, E)$ is edge-weighted, if each edge $e \in E$ has a weight, ${\cal W}(e) \in
{\mathbb{R}}$. Let $G=({\cal A}, E)$ denote an edge-weighted graph with real edge weights ${\cal W}(e)$, $e \in E$. We will say that ${\cal D}$ is a metric for $G$ if, for any three vertices $u,v,w\in {\cal A}$, ${\cal D}(u,v)={\cal D}(v,u) \ge 0$, with ${\cal D}(u,v) = 0$ if and only if $u=v$, and ${\cal D}(u,v)\le {\cal D}(u,w)+{\cal D}(w,v)$. One way of defining metric distances on a weighted graph is to use the shortest-path metric $\delta(.,.)$ on the graph or its subgraphs, i.e., ${\cal D}(u,v)=\delta(u,v)$, the shortest path distance between $u$ and $v$ for all $u,v \in {\cal A}$. We will say that the edge weighted tree ${\mathfrak{T}}={\mathfrak{T}}_G({\cal A}',E')$ is a tree metric for $G$, with respect to distance function ${\cal D}$, if for any pair of vertices $u,v$ in $G$, the length of the unique path between them in ${\mathfrak{T}}$ is equal to ${\cal D}(u,v)$. An ultra-metric is a special type of tree metric defined on rooted trees, where the distance to the root is the same for all leaves in the tree, an approximation that introduces small distortion. A metric ${\cal D}$ is an ultra-metric if, for all points $x,y,z$, we have ${\cal D}[x,y]\le \max\{{\cal D}[x,z],{\cal D}[y,z]\}.$ Unfortunately, an ultra-metric does not satisfy all the properties of a tree metric distance. To create a general tree metric from an ultra-metric, we need to satisfy the 4-point condition (see [4]): ${\cal D}[x,y]+{\cal D}[z,w] \le \max
\{{\cal D}[x,z]+{\cal D}[y,w] ,{\cal D}[x,w]+{\cal D}[y,z]\}$, for all $x,y,z,w$. A metric that satisfies the 4-point condition is called an additive metric. A metric embedding is a mapping $f:{\cal A}\rightarrow {\cal B}$, where ${\cal A}$ is a set of points in the original metric space, with distance function ${\cal D}(.,.)$, ${\cal B}$ is a set of points in the (host) $d$-dimensional normed space $\vert\vert.\vert\vert _k$, and for any pair $p,q\in {\cal A}$ we have


for a certain parameter $c$, known as the distortion. Intuitively, such an embedding will enable us to reduce problems defined over difficult metric spaces, $({\cal A},{\cal D})$, to problems over easier normed spaces, $({\cal B},\vert\vert.\vert\vert _k)$. As can be observed from Equation 1, the distortion parameter $c$ is a critical characteristic of embedding $f$, i.e., the closer $c$ is to $1$, the better the target set ${\cal B}$ mimics the original set ${\cal A}$. To capture the topological structure of a tree, we use the concept of caterpillar decomposition and caterpillar dimension. We illustrate the caterpillar decomposition of a rooted tree with no edge weights in Figure 2. The three darkened paths from the root represent three edge-disjoint paths, called level 1 paths. If we remove these three level 1 paths from the tree, we are left with the 7 dashed, edge-disjoint paths. These are the level 2 paths, and if removing them had left additional connected components, the process would be repeated until all the edges in the tree had been removed. The union of the paths is called the caterpillar decomposition, denoted by ${\mathfrak{P}}$, and the number of levels in ${\mathfrak{P}}$ is called the caterpillar dimension, denoted by $m$.

Figure 2: Path partition of an example tree. The three level 1 paths are bold, while the seven level 2 paths are dashed (there are no other levels in this particular case - see text).

The caterpillar decomposition ${\mathfrak{P}}$ can be constructed using a modified depth-first search in linear time. Given a caterpillar decomposition ${\mathfrak{P}}$ of ${\mathfrak{T}}$, we will use $L$ to denote the number of leaves of ${\mathfrak{T}}$, and let $P(v)$ represent the unique path between the root and a vertex $v\in {\cal A}$. The first segment of $P(v)$ of weight $l_1$ follows some path $P^1$ of level 1 in ${\mathfrak{P}}$, the second segment of weight $l_2$ follows a path $P^2$ of level 2, and the last segment of weight $l_\alpha $ follows a path $P^\alpha $ of level $\alpha \le m$. The sequences $\left< P^1,\ldots,P^\alpha \right>$ and $\left<l_1,\ldots,l_\alpha \right>$ will be referred to as the decomposition sequence and the weight sequence of $P(v)$, respectively. Finally, we introduce the notion of spherical codes in our embedding procedure. A spherical code is the distribution of a finite set of $n$ points on the surface of a unit sphere such that the minimum distance between any pair of points is maximized [6]. Equivalently, one can try to minimize the radius $r$ of a $d$-dimensional sphere such that $n$ points can be placed on the surface, where any two of the points are at angular distance 2 from each other. Recall that the angular distance between two points is the acute angle subtended by them at the origin.

Next: Metric Embedding of Graphs Up: Many-to-Many Feature Matching Using Previous: Related Work