Introduction

The structure of 3D space is hidden in the small relative displacements (disparities) between the left and right views of a scene. Besides these small displacements of image parts, numerous other image variations occur between the two stereo views, making the reliable computation of disparities a nontrivial problem. Sensor noise and differing transfer functions of the left and right imaging systems introduce stochastic signal variations, whereas the varying view geometry leads to a variety of systematic image variations, including occlusion effects and foreshortening. In addition, since most object surfaces in real-world scenes display specular reflection, the intensities observed by the imaging systems are not directly correlated with the object surfaces, but nearly always have a viewpoint-dependent component that moves independently of the surface in question.

Classical approaches to stereo vision try to counteract this whole set of distorting signal variations with two basic algorithmic paradigms, known as feature-based and area-based approaches [1]. In feature-based stereo algorithms, the intensity data is first converted into a set of features assumed to be a more stable image property than the raw intensities; the matching stage then operates only on these extracted features. In contrast, area-based approaches directly compare intensity values within small image patches of the left and right views and try to maximize the correlation between these patches. To ensure stable performance, area-based stereo algorithms need a suitably chosen correlation measure and a sufficiently large patch size.

Both types of stereo algorithms create computational problems that can be attributed directly to the basic assumptions inherent in these approaches. For example, only a few specific feature classes are generally used in feature-based algorithms.
Therefore most areas of the stereo images end up in the "no feature present" class, which is not considered further in the matching process. This leads to a tremendous data reduction that speeds up processing, but makes it impossible to calculate disparity estimates for most image regions. To obtain dense disparity maps, one is forced to interpolate the missing values. To complicate matters further, every feature detected in the left image can potentially be matched with every feature of the same class in the right image. This is the classical "false matches" problem, which is inherent in all feature-based stereo algorithms. It can only be solved by introducing additional constraints on the final solution of the matching problem. These constraints are usually derived from reasonable assumptions about the physical properties of object surfaces and rule out certain combinations of matches. Classical constraints include the uniqueness of a match, figural continuity, and the preserved ordering of matches along horizontal scanlines [2]. Together with the features extracted from the images, these constraints define a complicated error measure, which can be minimized by direct search techniques or through cooperative processes.

The problems inherent in feature-based stereo algorithms can be reduced simply by increasing the number of feature classes considered in the matching process. In the extreme case, one might use a continuum of feature classes. For example, the locally computed Fourier phase can be used to classify local intensity variations into feature classes indexed by the continuous phase value [3]. Using such a continuum of feature classes, with the feature index derived from some small image area, is very similar to standard area-based stereo algorithms. In these algorithms, not just a single feature value but the full vector of image intensities over a small image patch is used for matching.
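As an illustration of matching on the full intensity vector of a patch, the following minimal Python sketch matches a single rectified scanline by normalized cross-correlation with an exhaustive search over a fixed disparity range. The choice of normalized cross-correlation as the correlation measure, the disparity convention (a point at position x in the left scanline appears at x - d in the right one), and the names `ncc`, `match_scanline`, `half_window` and `max_disp` are assumptions made for this example, not details taken from the text.

```python
import math
import random


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity patches.

    Removing each patch's mean and dividing by the norms makes the score
    insensitive to a positive gain and an offset between the two views.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def match_scanline(left, right, half_window=3, max_disp=8):
    """Return one disparity per pixel of `left` (None near the borders).

    The patch left[x-h .. x+h] is compared with right[x-d-h .. x-d+h] for
    every d in 0..max_disp; the d with the highest NCC is kept.
    """
    h = half_window
    disparities = []
    for x in range(len(left)):
        if x - h < 0 or x + h >= len(left):
            disparities.append(None)
            continue
        patch_left = left[x - h:x + h + 1]
        best_d, best_score = None, -math.inf
        for d in range(max_disp + 1):
            lo = x - d - h
            if lo < 0:
                break  # this and all larger d would leave the right scanline
            score = ncc(patch_left, right[lo:lo + 2 * h + 1])
            if score > best_score:
                best_d, best_score = d, score
        disparities.append(best_d)
    return disparities


# Synthetic pair: right[i] = 0.5 * left[i + 5] + 10 (shift, gain and offset).
random.seed(0)
scene = [random.random() for _ in range(65)]
left = scene[:60]
right = [0.5 * v + 10.0 for v in scene[5:65]]
disp = match_scanline(left, right)
print(disp[11:57])  # interior pixels with a complete search range
```

Each pixel costs `max_disp + 1` patch comparisons, which shows why the search becomes expensive for large disparity ranges. Subtracting the patch means and normalizing makes the score tolerant to gain and offset differences between the two views, but not to occlusion or foreshortening.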
Classical area-based approaches minimize the deviation or maximize the correlation between patches of the left and right views. A sufficiently large patch size ensures stable performance (via the central limit theorem). Computing the correlation measure and subsequently maximizing it turns out to be computationally expensive, since an extensive search in configuration space is required. This problem is usually addressed with hierarchical approaches, in which disparity data obtained at coarse spatial scales is used to restrict the search at finer scales. However, for generic image data there is no guarantee that the disparity information obtained at the coarse scales is valid: the disparity estimate might be wrong, might differ from the value at finer scales, or might not be present at all. Hierarchical approaches will therefore fail under various circumstances.

A third way of calculating disparities is known as phase-based methods [4, 5]. These approaches derive Fourier-phase images from the raw intensity data. Extracting the Fourier phase can be considered a local contrast equalization, which reduces the effects of many intensity variations between the two stereo views. Algorithmically, these approaches are in fact gradient-based optical flow methods, with the time derivative approximated by the difference between the left and right Fourier-phase images [6]. The Fourier phase exhibits wrap-around, which again makes it necessary to employ hierarchical methods, with the drawbacks already discussed. Furthermore, additional steps have to be taken to exclude regions with ill-defined Fourier phase [7].

The new approach to stereo vision presented in this paper rests on simple disparity estimators that also employ classical optical flow estimation techniques. Essential to the new approach, however, are the aliasing effects of these disparity units, in combination with a simple coherence detection scheme.
The new algorithm is able to calculate dense disparity maps, verification counts and the cyclopean view of the scene within a single computational structure.
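The wrap-around problem of phase-based methods can be illustrated with a minimal Python sketch. It assumes a complex Gabor filter (a Gaussian-windowed complex exponential with its DC response removed) as the local Fourier-phase extractor and estimates disparity in gradient form, d = wrap(φ_R - φ_L) / φ_x, where the left/right phase difference stands in for the time derivative and the spatial phase derivative φ_x is taken by a central difference. This is a generic variant written for this example, not the specific method of references [4, 5] nor of the present paper; the names `gabor_kernel`, `local_phase`, `wrap`, `phase_disparity` and their parameters are hypothetical.

```python
import cmath
import math


def gabor_kernel(wavelength, sigma, radius):
    """Complex Gabor taps g(u) = G(u) * (exp(i*k*u) - kappa), u = -radius..radius.

    kappa is chosen so that sum(g) == 0 (up to rounding); a constant brightness
    offset between the views then has no effect on the filter output.
    """
    k = 2 * math.pi / wavelength
    us = range(-radius, radius + 1)
    gauss = [math.exp(-u * u / (2 * sigma * sigma)) for u in us]
    carrier = [cmath.exp(1j * k * u) for u in us]
    kappa = sum(g * c for g, c in zip(gauss, carrier)) / sum(gauss)
    return [g * (c - kappa) for g, c in zip(gauss, carrier)]


def local_phase(signal, kernel):
    """Phase of sum_u g(u) * signal[x - u]; None where the kernel leaves the signal."""
    radius = len(kernel) // 2
    phases = []
    for x in range(len(signal)):
        if x - radius < 0 or x + radius >= len(signal):
            phases.append(None)
            continue
        r = sum(kernel[j] * signal[x - (j - radius)] for j in range(len(kernel)))
        phases.append(cmath.phase(r))
    return phases


def wrap(angle):
    """Map an angle (difference) into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def phase_disparity(left, right, wavelength=16, sigma=8, radius=32):
    """Estimate d with right[x] ~ left[x + d] as wrap(phi_R - phi_L) / phi_x."""
    kernel = gabor_kernel(wavelength, sigma, radius)
    phi_l = local_phase(left, kernel)
    phi_r = local_phase(right, kernel)
    n = len(left)
    disparities = [None] * n
    for x in range(1, n - 1):
        needed = (phi_l[x - 1], phi_l[x], phi_l[x + 1], phi_r[x])
        if any(p is None for p in needed):
            continue
        # Spatial phase derivative by central difference (the gradient term).
        grad = wrap(phi_l[x + 1] - phi_l[x - 1]) / 2.0
        if abs(grad) < 1e-6:
            continue  # phase gradient vanishes: no usable estimate here
        # The left/right phase difference plays the role of the time derivative.
        disparities[x] = wrap(phi_r[x] - phi_l[x]) / grad
    return disparities


# Synthetic scanlines: the right view is shifted, has doubled contrast and an offset.
k = 2 * math.pi / 16
left = [math.cos(k * x) for x in range(128)]
for true_d in (3, 11):
    right = [2.0 * math.cos(k * (x + true_d)) + 5.0 for x in range(128)]
    print(true_d, round(phase_disparity(left, right)[64], 2))
```

With a filter wavelength of 16 pixels, a true disparity of 3 is recovered despite the contrast and offset change, while a true disparity of 11 exceeds half a wavelength and aliases to about -5 (that is, 11 - 16). This ambiguity is what pushes phase-based methods toward coarse-to-fine schemes. The `abs(grad) < 1e-6` check only prevents division by a vanishing phase gradient; it is a crude placeholder for, not an implementation of, the exclusion of regions with ill-defined Fourier phase.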
Comments are welcome! © 1997 by Rolf Henkel, all rights reserved.