An overview of shape-from-stereo
Helmut Cantzler

Stereo vision is the ability to recover the 3D structure and distance of a scene from two or more intensity images taken from different viewpoints.

A stereo system must determine which point in one image corresponds to which point in another image (the correspondence problem) [1]. One difficulty is that some parts of the scene are visible in only a subset of the images. A stereo system must therefore also be able to decide which image parts should not be matched. Correspondence algorithms can be classified as correlation-based or feature-based. In correlation-based algorithms, the elements to match are image windows of fixed size, and the similarity criterion is a measure of the correlation between the windows in the two images. These algorithms typically give dense depth measurements. Feature-based methods, on the other hand, match a set of features extracted from the two images. The distance between feature descriptors is computed from the numerical and symbolic properties of the features, and the corresponding elements are given by the most similar feature pair. Feature-based approaches typically give depth only sparsely, at the matched feature points.
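The correlation-based idea can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the text above: the images are rectified, so a match lies on the same row; images are plain lists of rows of intensities; and the sum of squared differences (SSD) stands in for a correlation measure, with a lower value meaning a more similar window. The names `ssd` and `match_pixel` are hypothetical.

```python
def ssd(left, right, row, col_l, col_r, half):
    """Sum of squared differences between two (2*half+1)-wide square windows."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[row + dy][col_l + dx] - right[row + dy][col_r + dx]
            total += diff * diff
    return total


def match_pixel(left, right, row, col, half=1, max_disparity=4):
    """Find the disparity d whose right-image window at (row, col - d)
    is most similar to the left-image window at (row, col).

    Assumes rectified images and that (row, col) is at least `half`
    pixels away from the border of the left image.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        col_r = col - d
        if col_r - half < 0:
            break  # candidate window would leave the right image
        cost = ssd(left, right, row, col, col_r, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Synthetic pair: the left image is the right image shifted by 2 pixels.
right_img = [
    [3, 7, 1, 9, 4, 6, 2, 8],
    [5, 2, 8, 1, 7, 3, 9, 4],
    [9, 4, 6, 2, 8, 5, 1, 7],
]
left_img = [[0, 0] + r[:-2] for r in right_img]

disparity, cost = match_pixel(left_img, right_img, row=1, col=4, max_disparity=3)
print(disparity, cost)
```

Running the search for every pixel yields the dense measurements described above. Because the function also returns the matching cost, a caller could refuse to accept a match whose cost is too high, which is one simple way to handle image parts that should not be matched.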

Once the stereo system has found pairs of corresponding image points, it can reconstruct the scene. Stereo determines the position in space of a pair of corresponding image points by triangulation, which relies on the difference in retinal position between the corresponding points in the two images, known as disparity. This method requires knowledge of the parameters of the stereo system. Other methods can be used if only some or none of the system's parameters are available.

[1] J. Banks, M. Bennamoun, K. Kubik, and P. Corke. An accurate and reliable stereo matching algorithm incorporating the rank constraint. Symposium on Intelligent robotic systems, pages 23-32, 1999.