Next: Some existing matching algorithms
Up: Computer Vision IT412
Previous: Problem definition
Approaches to the correspondence problem can be broadly
classified into two categories: intensity-based matching
and feature-based matching techniques.
In the first category, the matching process is applied directly
to the intensity profiles of the two images, while in the
second, features are first extracted from the images and
the matching process is applied to the features.
As shown in the previous section, the epipolar lines
coincide with the horizontal scanlines if the cameras
are parallel; the corresponding points in both images
must therefore lie on the same horizontal scanline.
Such stereo configurations reduce the search for
correspondences from two dimensions (the entire image)
to one dimension. In fact, a close look at the intensity
profiles of corresponding rows of the image pair
reveals that the two intensity profiles differ only
by a horizontal shift and a local foreshortening.
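This reduction to a one-dimensional search can be sketched directly: slide one scanline over the other and keep the horizontal shift that minimizes the sum of squared differences. The intensity values below are made up for illustration.

```python
# A minimal sketch of 1-D scanline matching under a parallel camera
# set-up: for each candidate shift d, compare left pixel j with right
# pixel j - d over the overlapping columns and keep the shift with the
# smallest sum of squared differences (SSD).

def best_shift(left_row, right_row, max_shift):
    """Return the horizontal shift in [0, max_shift] minimizing SSD."""
    best_ssd, best_d = float("inf"), 0
    for d in range(max_shift + 1):
        ssd = sum((left_row[j] - right_row[j - d]) ** 2
                  for j in range(d, len(left_row)))
        if ssd < best_ssd:
            best_ssd, best_d = ssd, d
    return best_d

# toy intensity profile; the left row is the right row shifted by 3
right = [10, 10, 12, 40, 42, 15, 10, 10, 10, 10]
left = [10, 10, 10] + right[:-3]
# best_shift(left, right, 5) -> 3
```

In a real system this search is repeated for each pixel (or window) along the scanline, producing one disparity value per position.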
Fig. 5(a) and (b) depict the
images taken with a camera that undergoes a displacement
in the horizontal direction; the image pair therefore corresponds
to a parallel camera set-up.
Two black lines are marked at rows 80 and 230 in both images.
Fig. 5(c) and (d) show the intensity profiles of row 80,
and (e) and (f) those of row 230, of the two images.
Figure 5:
A comparison of the intensity profiles of two
images from a parallel stereo configuration.
The image pair shown above is retrieved via ftp from
the Calibrated Imaging Laboratory of CMU.
The similarity between the one-dimensional intensity profiles
of the two images suggests that an optimization process would be
suitable. Indeed, Barnard in 1987 attempted matching
the parallel stereo images using simulated annealing.
He defined an energy function E(i,j) as:

  E(i,j) = |I_L(i,j) - I_R(i, j + D(i,j))| + λ|∇D(i,j)|

where I_L(i,j) denotes
the intensity value of the left image
at the i-th row and j-th column, I_R(i,k)
denotes the intensity value
of the right image at
the same row but at the k-th column (here k = j + D(i,j)),
and D(i,j) is the disparity value (or horizontal shift
in this case) at the ij-position of the left image.
The above is clearly a constrained optimization problem in
which the only constraint imposed is a minimal change of
the disparity values. This constraint
is commonly known as the continuity constraint
(see Section ).
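The energy above, combining an intensity-difference data term with a disparity-smoothness (continuity) term, can be sketched as follows. The exact weighting λ and the toy data are illustrative assumptions, not the original formulation.

```python
# A hedged sketch of an intensity-based matching energy: the data term
# penalizes intensity differences between matched pixels, and the
# smoothness term penalizes abrupt changes of disparity between
# neighbouring pixels (the continuity constraint).

def energy(left, right, disparity, lam=1.0):
    """Total energy of a candidate disparity map.
    left, right: 2-D lists of intensities; disparity: 2-D list of ints."""
    rows, cols = len(left), len(left[0])
    e = 0.0
    for i in range(rows):
        for j in range(cols):
            k = j + disparity[i][j]        # matched column in the right image
            if 0 <= k < cols:
                e += abs(left[i][j] - right[i][k])            # data term
            if j + 1 < cols:               # horizontal disparity smoothness
                e += lam * abs(disparity[i][j + 1] - disparity[i][j])
            if i + 1 < rows:               # vertical disparity smoothness
                e += lam * abs(disparity[i + 1][j] - disparity[i][j])
    return e
```

An optimizer such as simulated annealing would perturb the disparity map and accept changes that (usually) lower this total energy.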
Robe (1992) later incorporated a multiresolution scheme
(see Section ) together with a smoothness constraint similar
to that of Barnard (1987) into the constrained optimization
process. In addition to the horizontal shift of corresponding
pixels, they also allowed the corresponding pixels to undergo a vertical
shift (i.e. disparity in the vertical direction), so their
matching method is not restricted to parallel stereo images.
The energy function to be minimized is, as expected, more complicated
than the one given above.
The advantage of this intensity profile matching is that
a dense disparity map, and consequently a dense depth (or range) map,
is output. Unfortunately, as with all constrained optimization problems,
whether the system converges to the global minimum is
still an open problem, although, as reported by Robe (1992),
the multiresolution scheme, to a certain extent, helped
speed up convergence and avoid local minima.
An alternative approach in intensity-based stereo matching,
commonly known as the window-based method, is to
match only those regions in the images that are ``interesting'',
for instance, regions that contain high variation of intensity
values in the horizontal, vertical, and diagonal directions.
Moravec's simple interest operator (1979)
detects such regions (which correspond to regions containing grey-level corners)
in the image pair, and
it has been widely used in many stereo matching systems
(e.g. the SRI STEREOSYS system (Hannah, 1985)).
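The idea behind Moravec's interest measure can be sketched as follows; this is a simplified illustration, not the exact 1979 operator.

```python
# A simplified sketch of Moravec's interest measure: at each pixel,
# take the minimum over four shift directions of the sum of squared
# intensity differences within a small window. A large minimum means
# the intensity varies in every direction, i.e. a grey-level corner;
# a flat region (or an edge, along its own direction) scores low.

def moravec(img, i, j, w=1):
    """Interest value at pixel (i, j) using a (2w+1)x(2w+1) window."""
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horiz, vert, two diagonals
    values = []
    for di, dj in shifts:
        ssd = 0
        for a in range(-w, w + 1):
            for b in range(-w, w + 1):
                ssd += (img[i + a][j + b] - img[i + a + di][j + b + dj]) ** 2
        values.append(ssd)
    return min(values)
```

A pixel is kept as ``interesting'' if its value is a local maximum of this measure above some threshold.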
After the interesting regions are detected, a simple
correlation scheme is applied in the matching process;
a match is assigned to
regions that are highly correlated in the two images.
The problem associated with this window-based approach
is that the size of the correlation windows must be
carefully chosen. If the correlation windows are too small,
the intensity variation in the windows will not be
distinctive enough, and many false matches may result.
If they are too large, resolution is lost, since neighbouring
image regions with different disparities will be combined
in the measurement. Worse, the two windows may not
correlate unless the disparity within the windows is
constant, which suggests that the multiresolution scheme
is again appropriate.
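The correlation step itself can be sketched as follows; the window size, search range, and intensity values are illustrative.

```python
# A sketch of window-based matching under the parallel-camera
# assumption: for a window centred at column j of a left scanline,
# search along the same scanline of the right image for the centre
# position with the highest normalized cross-correlation (NCC).

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def match_window(left_row, right_row, j, w):
    """Best-correlated centre column in right_row for the window at j."""
    win = left_row[j - w: j + w + 1]
    best_k, best_c = j, -2.0
    for k in range(w, len(right_row) - w):
        c = ncc(win, right_row[k - w: k + w + 1])
        if c > best_c:
            best_k, best_c = k, c
    return best_k
```

The disparity at column j is then j minus the returned column. NCC is preferred over raw SSD here because it is invariant to local gain and offset changes in intensity.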
Unfortunately, the most serious shortcoming of the window-based approach
-- its sensitivity to the differences in foreshortening --
may sometimes render the approach useless.
Figure 6:
Foreshortening due to the change of viewing position and
direction.
Fig. 6 shows a segment MN in the scene projecting onto the
two retinal planes as two segments of different lengths.
Because of the large difference between the orientations
of the retinal planes and the scene plane, the segment in
one image is much longer than the segment in the other.
The correlation windows in the two images would only give
the best correlation measure if the window used in each image
had the size of the projected segment in that image.
Such variation of window size to compensate for foreshortening
is not possible without knowledge of the scene.
This appears to pose a chicken-and-egg problem, since
the whole point is to recover shape using correlation matching.
In the feature-based approach, the image pair is first
preprocessed by an operator so as to extract
features that are stable under change
of viewpoint; the matching process is then applied
to the attributes associated with the detected features.
The obvious question here is what type of features
one should use.
Edge elements, corners, line segments, and curve segments
are features that are robust against changes of perspective,
and they have been widely used in many stereo vision systems.
Edge elements and corners
are easy to detect, but may suffer from occlusion;
line and curve segments require extra computation
time, but are more robust against occlusion (they are
longer and so are less likely to be completely occluded). Higher-level
image features such as circles, ellipses, and polygonal regions
have also been used as features for stereo matching;
these features are, however, restricted to images of
indoor scenes.
Most feature-based stereo matching systems are
not restricted to using only a specific type of feature;
instead, a collection of feature types is incorporated.
For instance, the system proposed by Weng in 1988
combines intensity, edges, and corners to form multiple
attributes for matching; Lim and Binford (1987),
on the other hand, used a hierarchy of features varying from
edges and curves to surfaces and bodies (2-D regions)
for high-level attribute matching.
- edge elements:
There exist many edge operators for finding
edge elements in an image,
for example, the Laplacian of Gaussian operator followed by a detection
of zero-crossings, or the Canny edge detector (1986).
The attributes of edge elements used for matching
can be: coordinates (location in the image),
local orientations (or directions), and the local intensity profile
on either side of the edge elements
(e.g. from dark to light, or from light to dark).
- corners:
The earliest corner detector is probably
Beaudet's (1978)
rotationally invariant operator called DET;
corner detectors reported in the 80s include
Dreschler and Nagel (1982),
Kitchen and Rosenfeld (1982),
Zuniga and Haralick (1983), etc.
The Harris corner detector (1988)
is one of the popular corner detectors
widely used today at Oxford and INRIA.
Attributes of corners that can be used for matching:
coordinates of corners, and the
type of junction that a corner corresponds to
(e.g. Y-junction, L-junction, A-junction, etc.).
- line segments:
To extract line segments from an image, an
edge operator must first be applied.
Line segments are then formed by a linking
and merging operation on the detected edge elements
based on some criteria such as distance, similarity, and
collinearity measures.
Line segment extraction algorithms that have been
reported include:
Nevatia and Babu (1980),
Fischler and Bolles (1983),
Weiss and Boldt (1986).
Attributes of line segments that can be used for matching:
coordinates of end-points and mid-points, and orientation
of line segments.
It should be noted that, due to image noise,
the end-points (and thus the mid-points also)
of line segments are normally not reliably
detected; stereo matching processes that rely
on the coordinates of these points therefore do not
produce good reconstructions of 3-D coordinates.
In fact, for a pair of matching line segments,
any point on the first line segment could correspond
to any point on the second line segment,
and this ambiguity can only be resolved if
the end-points of the two line segments are
known exactly.
Using line segments as features for matching
also has the following drawbacks:
- only those line segments that have a
significant difference in direction
from the epipolar lines can be matched.
For line segments whose directions
are close to those of the epipolar lines,
any translation of the line segments
along their own directions
does not cause a significant change to the
epipolar geometry, yet the
subsequent reconstruction process
would be greatly affected.
- for a given line segment in one image,
its corresponding line segment is
likely to be broken into two or
more shorter line segments, thus
the uniqueness constraint
(see Section )
may not be applicable.
- the reconstruction of 3-D lines
from corresponding 2-D line segments
is not well-defined if the end-points
of the 2-D line segments are not reliable.
This problem can be partially overcome by
incorporating an uncertainty measure
to the 2-D line segments' end-points
as suggested by Zhang (1995).
- curve segments:
The matching of curve segments has not been widely
attempted; the reason is probably
the ambiguity involved -- every point on a curve
is likely to be matchable with every other point
on another curve.
Deriche and Faugeras' work (1990)
is one of the very few that have been reported.
They proposed to match the turning points of curves.
- circles, ellipses:
These features are present mainly in indoor scenes
and are applicable to the detection of defects on industrial parts.
Attributes that can be used for matching: areas in pixel units,
and coordinates of the centres of the geometric figures.
- regions:
Regions can be either defined as blobs (e.g. detected
by a region growing algorithm) in the
image or defined as polygonal regions bounded by
line segments.
Regions in the form of blobs have irregular boundaries
and may not match perfectly with regions from another image.
For polygonal regions, attributes that can be used for
matching include: areas of regions, bounding line segments
of regions, locations of regions' centroids.
Polygonal regions are very high-level features and could be
costly to extract.
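As an illustration of corner-based features, the Harris response mentioned above can be sketched as follows; the toy image and the constant k = 0.04 are illustrative choices.

```python
# A compact sketch of the Harris corner response: accumulate products
# of image gradients over a window into the structure matrix
# [sxx sxy; sxy syy], then combine its determinant and trace into
# R = det - k * trace^2. R is large and positive at corners, negative
# along edges, and near zero in flat regions.

def harris_response(img, i, j, w=1, k=0.04):
    sxx = sxy = syy = 0.0
    for a in range(-w, w + 1):
        for b in range(-w, w + 1):
            # central-difference gradients at (i+a, j+b)
            ix = (img[i + a][j + b + 1] - img[i + a][j + b - 1]) / 2.0
            iy = (img[i + a + 1][j + b] - img[i + a - 1][j + b]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace
```

Corners are taken as local maxima of R above a threshold; their coordinates then serve as the matching attributes listed above.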
The stereo matching process is a very difficult search procedure.
In order to minimize false matches, some matching constraints
must be imposed. Below is a list of the commonly used constraints.
- Similarity (or compatibility; Grimson, 1981).
For the intensity-based approach, the matching pixels
must have similar intensity values (i.e. differ by less than
a specified threshold) or the matching windows must be
highly correlated.
For the feature-based approach, the matching features
must have similar attribute values.
- Uniqueness (Marr and Poggio, 1979).
Almost always, a given pixel or feature from one
image can match no more than one pixel or feature
from the other image.
As noted earlier, the uniqueness constraint may not
be applicable to the line segment-based approach.
This constraint can also fail if transparent objects
are present in the scene. Furthermore, given a
pixel or feature m in one image, its ``corresponding''
pixel or feature may be occluded in the other image.
In this case, no match should be assigned to m.
- Continuity (Marr and Poggio, 1979).
The cohesiveness of matter suggests that
the disparity of the matches should vary smoothly almost
everywhere over the image.
This constraint fails at discontinuities of depth,
for depth discontinuities cause an abrupt change in
disparity.
- Ordering (Baker and Binford, 1981).
If m corresponds to m' and n corresponds to n', and
if m is to the left of n,
then m' should also be to the left of n',
and vice versa. That is, the ordering of features
is preserved across images.
The ordering constraint fails at regions known
as the forbidden zone (Fig. 7).
Figure 7:
The ordering constraint fails if a given 3-D point
(N here) falls inside the forbidden zone of another 3-D point
(M). In the left image, m is to the right of n, but
in the right image, this ordering is reversed.
- Epipolar.
Given a feature point m in the left image,
the corresponding feature point m' must lie on the
corresponding epipolar line.
This constraint reduces the search space from
two-dimensions to one-dimension. Unlike
all the other constraints, the epipolar constraint
would never fail and could be applied
reliably once the epipolar geometry is known
(this is possible after the fundamental matrix
is estimated; see Appendix B).
By introducing one more camera into the system,
the ambiguity involved in matching can
be further reduced:
given a feature point m in the first image and potential
matches m' in the second image and m'' in the third,
an epipolar line can be constructed in the third image
using m and the epipolar geometry between the first and
third cameras; another epipolar line can also be constructed
using m' and the epipolar geometry between the second and
third cameras. The two epipolar lines must intersect at m''
if the triplet (m, m', m'') is a correct match.
- Relaxation.
A global matching technique for eliminating false matches.
An example (Barnard and Thompson, 1980)
of applying relaxation to stereo matching:
assign to each candidate match a probability value
based on some criteria for the ``goodness of match'';
iteratively update this probability value for
each candidate match; finally, delete those matches
whose probability value is below a certain threshold.
The update process in each iteration is as follows:
increase the probability of a given candidate match by a value
proportional to the number of neighbouring matches whose
disparity values are consistent with its own.
Other criteria may also be used in the update process,
e.g. the geometric support as proposed by
Ayache and Faverjon (1987).
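The relaxation update described above can be sketched as follows; the update rule, gain, and threshold are illustrative simplifications of Barnard and Thompson's scheme.

```python
# An illustrative sketch of relaxation labelling for stereo matching:
# each candidate match carries a probability that is raised in
# proportion to the number of neighbouring candidates with a
# consistent (similar) disparity; after a few iterations, matches
# whose probability stays below a threshold are discarded.

def relax(candidates, neighbours, iters=5, gain=0.3, tol=1, threshold=0.5):
    """candidates: {id: (disparity, probability)};
    neighbours: {id: [ids of neighbouring candidate matches]}.
    Returns the set of surviving match ids."""
    probs = {m: p for m, (d, p) in candidates.items()}
    disp = {m: d for m, (d, p) in candidates.items()}
    for _ in range(iters):
        new = {}
        for m in probs:
            # count neighbours whose disparity agrees within tol
            support = sum(1 for n in neighbours.get(m, [])
                          if abs(disp[n] - disp[m]) <= tol)
            new[m] = min(1.0, probs[m] * (1.0 + gain * support))
        probs = new
    return {m for m, p in probs.items() if p >= threshold}
```

A match supported by its neighbours is quickly driven towards probability 1, while an isolated (likely false) match never gains support and is deleted.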
The coarse-to-fine multiresolution matching scheme
works as follows:
- generate a pair of image pyramids (a hierarchy of images)
from the original image pair so that
only a few, prominent features are
present at the coarse levels.
The original images are at the finest level
of the image pyramids.
- start the matching process at the coarsest level.
- use the matches obtained at the coarser
level to guide the matching process
gradually up to the finest level.
Figure 8:
By applying to an image Laplacian of Gaussian filters
of different σ values, followed by a detection
of zero-crossings, a hierarchy of edge maps from
coarse to fine levels can be obtained.
An image pyramid of edges can be obtained by convolving the images
with Laplacian of Gaussian filters of different widths (σ),
followed by a detection of zero-crossings (Fig. 8).
An image pyramid of grey-level images can be obtained
by applying a smoothing operation followed by sub-sampling
(i.e. reducing the image resolution by a factor of 2).
Such an image pyramid is sometimes referred to as the processing cone
(Fig. 9).
The SRI STEREOSYS system described later uses this
smooth-and-sub-sample scheme.
Figure 9:
Processing cone.
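The smooth-and-sub-sample scheme can be sketched in one dimension; the binomial kernel [1, 2, 1]/4 is an illustrative choice of Gaussian-like smoothing filter, and images apply the same idea along rows and columns.

```python
# A minimal sketch of pyramid construction: repeatedly blur a signal
# with a small binomial (Gaussian-like) filter and keep every second
# sample, halving the resolution at each coarser level.

def smooth(row):
    """Blur with the binomial kernel [1, 2, 1] / 4 (edges replicated)."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]

def pyramid(row, levels):
    """Return [finest, ..., coarsest] versions of the signal."""
    out = [row]
    for _ in range(levels - 1):
        out.append(smooth(out[-1])[::2])  # smooth, then subsample by 2
    return out
```

Matching starts at the shortest (coarsest) level, and the disparities found there seed the search one level finer, as in the coarse-to-fine scheme above.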
Robyn Owens
10/29/1997