next up previous
Next: Some existing matching algorithms Up: Computer Vision IT412 Previous: Problem definition

Subsections

Stereo matching

 Approaches to the correspondence problem can be broadly classified into two categories: the intensity-based matching and the feature-based matching techniques. In the first category, the matching process is applied directly to the intensity profiles of the two images, while in the second, features are first extracted from the images and the matching process is applied to the features.

Intensity-based stereo matching

 As shown in the previous section, the epipolar lines coincide with the horizontal scanlines if the cameras are parallel, the corresponding points in both images must therefore lie on the same horizontal scanline. Such stereo configurations reduce the search for correspondences from two-dimensions (the entire image) to one-dimension. In fact, a close look at the intensity profiles from the corresponding row of the image pair reveals that the two intensity profiles differ only by a horizontal shift and a local foreshortening. Fig. 5(a) and (b) depict the images taken with a camera that undergoes a displacement in the horizontal direction, the image pair therefore corresponds to a parallel camera set up. Two black lines are marked at rows 80 and 230 in both images. Fig. 5(c) and (d), (e) and (f), respectively show the intensity profiles of row 80 and row 230 of the two images.


  
Figure 5: A comparison of the intensity profiles of two images from a parallel stereo configuration. The image pair shown above is retrieved via ftp from the Calibrated Imaging Laboratory of CMU.
\begin{figure}
\begin{center}
\
\begin{minipage}
{5cm}
(a) left image \\ 
\psfig...
 ...igure=inten_prof/plotrig230.ps,width=5cm}
\end{minipage}\end{center}\end{figure}

The similarity between the one-dimensional intensity profiles of the two images suggests an optimization process would be suitable. Indeed, Barn in 1987 attempted matching the parallel stereo images using simulated annealing. He defined an energy function Eij as:

\begin{displaymath}
E_{ij} = \vert I_L\left(i,j\right)-I_R\left(i,j+D(i,j)\right) \vert
 + \lambda \vert\bigtriangledown D(i,j)\vert\end{displaymath}

where $I_L\left(i,j\right)$ denotes the intensity value of the left image at the i-th row and j-th column and $I_R\left(i,k\right)$ denotes the intensity value of the right image at the same row but at the k-th column; D(i,j) is the disparity value (or horizontal shift in this case) at the ij-position of the left image.

The above is clearly a constrained optimization problem in which the only constraint being used is a minimum change of disparity values $D(i,j), \,\forall i, j$. This constraint is commonly known as the continuity constraint (see Section [*]).

Robe later incorporated the use of a multiresolution scheme (see Section [*]) together with a smoothness constraint similar to that of Barn into the constrained optimization process. In addition to the horizontal shift of corresponding pixels, they also allowed the corresponding pixels to undergo vertical shift (i.e. disparity in the vertical direction), so their matching method is not restricted to only parallel stereo images. The energy function to be minimized, as expected, is more complicated than the one given above.

The advantage of this intensity profile matching is that a dense disparity map and, consequently a dense depth (or range) map, is output. Unfortunately, like all constrained optimization problems, whether the system would converge to the global minima is still an open problem, although, as reported by Robe92, the multiresolution scheme, to a certain extent, helped speed up convergence and avoid local minima.

An alternative approach in intensity-based stereo matching, commonly known as the window-based method, is to only match those regions in the images that are ``interesting'', for instance, regions that contain high variation of intensity values in the horizontal, vertical, and diagonal directions. The simple Moravec's interest operator (1979) detects such regions (correspond to regions that have grey-level corners) from the image pair, and it has been widely used in many stereo matching systems (e.g. the SRI STEREOSYS system (Hann, 1985)). After the interesting regions are detected, a simple correlation scheme is applied in the matching process; a match is assigned to regions that are highly correlated in the two images.

The problem associated with this window-based approach is that the size of the correlation windows must be carefully chosen. If the correlation windows are too small, the intensity variation in the windows will not be distinctive enough, and many false matches may result. If they are too large, resolution is lost, since neighbouring image regions with different disparities will be combined in the measurement. Worse, the two windows may not correlate unless the disparity within the windows is constant, which suggests that the multiresolution scheme is again appropriate. Unfortunately, the most serious shortcoming of the window-based approach -- its sensitivity to the differences in foreshortening -- may sometimes render the approach useless. Fig. 6 shows a segment MN in the scene

  
Figure 6: Foreshortening due to the change of viewing position and direction.
\begin{figure}
\begin{center}
\

\psfig {figure=foreshorten.xfigps,height=5cm}
\end{center}\end{figure}

projecting onto $\Pi$ and $\Pi'$ at segments $\overline{mn}$and $\overline{m'n'}$, respectively. Because of the large difference between the orientations of the retinal planes and the scene plane, segment $\overline{mn}$ is much longer than segment $\overline{m'n'}$.The windows in the two images would only give the best correlation measure if the window used in $\Pi$ has the size of $\overline{mn}$and the window used in $\Pi'$ has the size of $\overline{m'n'}$.Such variation of window size to compensate foreshortening is not possible without the knowledge of the scene. This appears to pose a chicken-and-egg problem, since the whole point is to recover shape using correlation matching.

Feature-based stereo matching

 In the feature-based approach, the image pair is first preprocessed by an operator so as to extract the features that are stable under the change of viewpoint, the matching process is then applied to the attributes associated with the detected features. The obvious question here is what type of features that one should use? Edge elements, corners, line segments, and curve segments are features that are robust against the change of perspective, and they have been widely used in many stereo vision work. Edge elements and corners are easy to detect, but may suffer from occlusion; line and curve segments require extra computation time, but are more robust against occlusion (they are longer and so are less likely to be completely occluded). Higher level image features such as circles, ellipses, and polygonal regions have also been used as features for stereo matching, these features are, however, restricted to images of indoor scenes.

Most feature-based stereo matching systems are not restricted to using only a specific type of features, instead, a collection of feature types is incorporated. For instance, the system proposed by Weng in 1988 combines intensity, edges, and corners to form multiple attributes for matching; Lim and Binford (1987), on the other hand, used a hierarchy of features varying from edges, curves, to surfaces and bodies (2-D regions) for high-level attribute matching.

Types of features

 

Matching constraints

 Stereo matching process is a very difficult search procedure. In order to minimum false matches, some matching constraints must be imposed. Below is a list of the commonly used constraints.

Coarse-to-fine multiresolution matching scheme

 The coarse-to-fine multiresolution matching scheme works as follows:


  
Figure 8: By applying to an image Laplacian of Gaussian filters of different $\sigma$ values followed by a detection of zero-crossings, a hierarchy of edge map from coarse to fine levels can be obtained.
\begin{figure}
\begin{center}
\
\begin{minipage}
{5cm}
(a) original image \\ 
\p...
 ...{figure=lapgauss/cameraman6.ps,width=5cm}
\end{minipage}\end{center}\end{figure}

An image pyramid of edges can be obtained by convolving the images with a Laplacian of Gaussian filter of different widths ($\sigma$)followed by a detection of zero-crossings (Fig. 8). An image pyramid of grey level images can be obtained by applying a smoothing operation followed by sub-sampling (i.e. reducing the image resolution by a factor of 2). Such image pyramid is sometimes referred to as the processing cone (Fig. 9). The SRI STEREOSYS system described later used this smooth and sub-sample scheme.


  
Figure 9: Processing cone.
\begin{figure}
\begin{center}
\

\psfig {figure=proc-cone.xfigps,height=7.5cm}
\end{center}\end{figure}


next up previous
Next: Some existing matching algorithms Up: Computer Vision IT412 Previous: Problem definition
Robyn Owens
10/29/1997