Interactive Vision: Medium Level

A stereo vision system

The recovery of depth by stereo matching involves three distinct stages:

(a): Detecting and recording image features in both images
(b): Matching corresponding image features across the two images
(c): Measuring the disparity of each successful match in order to make depth calculations.

The camera geometry can greatly affect the amount of processing required at the feature matching stage. Figure 2 shows a binocular stereo camera set-up in which the two cameras are aligned in parallel, so that they have the same fixation point F at infinity. Both cameras have the same focal length f and lie on the same baseline. This means the two cameras effectively share a single image plane (the illustrated left and right image planes are sub-areas of it).

Any point in the scene projects to two image points on the shared plane, one in each image. The line that links these two points runs parallel to the baseline and is called an epipolar line. With this camera geometry, all matching image points lie on, and only on, the same epipolar line (hence the epipolar constraint). In practice, epipolar lines correspond to raster lines in the image arrays, so for each feature only the corresponding raster line in the other image needs to be searched for a match, which greatly reduces the search time. Parallel camera set-ups also ensure that the relative depths of matched features are inversely proportional to their disparities. The actual depths are then calculated by triangulation, given the distance between the cameras and their characteristics.
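As a rough illustration of the raster-line search and the triangulation step, the following Python sketch matches a small window along a single pair of raster lines and then converts the resulting disparity to depth with Z = f * B / d. The function names, the sum-of-absolute-differences cost, the window size and all numerical values are assumptions made for this example; they are not taken from the system described here.

```python
def match_on_scanline(left_row, right_row, x_left, half_window=2, max_disparity=16):
    """Search one raster line of the right image for the best match to the
    window centred on x_left in the left image.

    Returns the disparity d = x_left - x_right. For parallel cameras a scene
    point appears further to the left in the right image, so d >= 0.
    The cost is the sum of absolute differences (an assumed choice).
    """
    lo = x_left - half_window
    hi = x_left + half_window + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("window falls outside the raster line")
    patch = left_row[lo:hi]

    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        r_lo = lo - d
        if r_lo < 0:          # candidate window would leave the image
            break
        candidate = right_row[r_lo:r_lo + len(patch)]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:  # keep the first, lowest-cost candidate
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_length_px, baseline):
    """Triangulate depth for parallel cameras: Z = f * B / d.

    Depth is returned in the units of the baseline. Zero disparity
    corresponds to a point at infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline / disparity_px


# One raster line from each image; the intensity bump is shifted by 3 pixels.
left_row = [0, 0, 0, 0, 0, 0, 10, 50, 90, 50, 10, 0, 0, 0, 0, 0]
right_row = [0, 0, 0, 10, 50, 90, 50, 10, 0, 0, 0, 0, 0, 0, 0, 0]

d = match_on_scanline(left_row, right_row, x_left=8)
z = depth_from_disparity(d, focal_length_px=700.0, baseline=0.12)
print("disparity (pixels):", d)
print("depth (baseline units):", z)
```

Because the search is confined to one raster line, its cost grows with the disparity range rather than with the image area, which is the saving the epipolar constraint provides.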

Another point that ought to be emphasised is the amount of camera separation, which determines the maximum disparity between the images. A good many systems are tested under psychophysical conditions (for comparison with our own visual processing powers). This means the separation of the cameras is often equivalent to the interocular separation of our eyes, resulting in only minor shifts and very little figural difference between the images. This in turn simplifies the detection and matching stages and produces very promising results. There is a trade-off: a small camera separation makes the correspondence problem easier but impairs the depth resolution and hence the depth accuracy, whereas a large camera separation allows accurate depth reconstruction but makes feature correspondence difficult, since the same features are widely separated in the two images.
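The trade-off can be made concrete with the parallel-camera relation Z = f * B / d. Differentiating with respect to d shows that an error of dd pixels in the measured disparity gives a depth error of roughly Z * Z * dd / (f * B), so a wider baseline B reduces the depth error while increasing the disparity range that has to be searched. The Python sketch below tabulates both quantities; the focal length, the depth, the two baselines (about 65 mm as an approximate human interocular distance, and 0.5 m) and the function names are all assumed values for illustration only.

```python
def disparity_at(depth, focal_length_px, baseline):
    """Disparity in pixels of a point at the given depth: d = f * B / Z."""
    return focal_length_px * baseline / depth


def depth_error(depth, focal_length_px, baseline, disparity_error_px=1.0):
    """First-order depth error caused by a disparity error:
    dZ ~= Z^2 * dd / (f * B)."""
    return depth ** 2 * disparity_error_px / (focal_length_px * baseline)


focal_length_px = 700.0   # assumed focal length in pixels
depth = 2.0               # assumed depth of the scene point in metres

for baseline in (0.065, 0.5):   # assumed baselines in metres
    d = disparity_at(depth, focal_length_px, baseline)
    err = depth_error(depth, focal_length_px, baseline)
    print(f"baseline {baseline:.3f} m: disparity {d:.1f} px, "
          f"depth error per pixel of disparity error {err:.4f} m")
```

The narrow baseline keeps disparities small, so fewer candidates need to be examined on each raster line, but each pixel of disparity error then costs more in depth.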

Edges are detected by an operator such as Canny or Sobel, which gives information about edge position, strength and direction. On receiving edge positions, strengths and orientations from the detection stage, the matcher must attempt to pair up corresponding edges across the left and right images. Matching in random dot stereograms may be harder than in natural images, mainly because all primitives (dots) have the same shape, strength and orientation in both images. In natural images, additional constraints such as edge orientation can be used to eliminate false matches. However, image matchers are generally faced with a larger number of potential matches, which increases the tendency to mis-match and reduces overall performance.
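The sketch below shows, under stated assumptions, how edge position, strength and direction might be obtained with the Sobel operator and how an orientation constraint can prune candidate matches on a raster line. It is written in plain Python for small test images; the function names, the thresholds and the synthetic images are hypothetical, and it is not the matcher described in these pages.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edges(image, threshold):
    """Return (row, col, strength, direction) for every interior pixel whose
    Sobel gradient magnitude is at least `threshold`.
    Direction is the gradient angle in radians."""
    edges = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    v = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            strength = math.hypot(gx, gy)
            if strength >= threshold:
                edges.append((r, c, strength, math.atan2(gy, gx)))
    return edges


def candidate_matches(left_edges, right_edges, max_disparity, max_angle_diff):
    """Pair left and right edges that lie on the same raster line, fall within
    the disparity range, and have similar orientation."""
    pairs = []
    for rl, cl, _, al in left_edges:
        for rr, cr, _, ar in right_edges:
            d = cl - cr
            if rl != rr or d < 0 or d > max_disparity:
                continue
            # Smallest absolute difference between the two angles.
            diff = abs(math.atan2(math.sin(al - ar), math.cos(al - ar)))
            if diff <= max_angle_diff:
                pairs.append(((rl, cl), (rr, cr)))
    return pairs


def step_image(rows, cols, step_col, low, high):
    """Synthetic image: `low` left of step_col, `high` from step_col onwards."""
    return [[high if c >= step_col else low for c in range(cols)]
            for _ in range(rows)]


left = step_image(5, 8, 5, 0, 100)            # dark-to-bright edge
right = step_image(5, 8, 3, 0, 100)           # same edge, shifted 2 pixels
right_reversed = step_image(5, 8, 3, 100, 0)  # bright-to-dark edge

left_edges = sobel_edges(left, threshold=200)
same = candidate_matches(left_edges, sobel_edges(right, 200), 4, 0.3)
opposite = candidate_matches(left_edges, sobel_edges(right_reversed, 200), 4, 0.3)
print(len(left_edges), "left edge pixels")
print(len(same), "candidates with matching orientation")
print(len(opposite), "candidates with opposite contrast")
```

Even this single step edge yields several candidates per raster line, because the Sobel response spans two columns in each image; the orientation test removes only those whose contrast is reversed, so a further stage is still needed to choose among the remaining candidates.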


Comments to: Sarah Price at ICBL.