One approach to obtaining a 3D description of a scene is to use 2 cameras in a binocular system, somewhat similar to that used by the human visual system. These lectures look at the geometry and features used for stereo matching. We look at both edge features, which introduces the Canny edge detector, and point features, which introduces the SIFT features. We use the RANSAC algorithm to find straight 2D lines, match them using a set of stereo correspondence constraints, and then use epipolar geometry to compute the 3D position of the lines. Another set of least-squares algorithms estimates the pose. Finally, we introduce one of the early approaches to computing a dense depth map by stereo matching of intensity values.
We introduce the core idea of recovering 3D information from a pair of slightly displaced images. Given a matched pair of points or other structure, computing the 3D positions is simple geometry, so this lecture set focusses on what features to match. This video introduces the 4 main types of feature (patch, point, edge, structure) and summarises the order of topics needed to build the stereo-based object recognition system.
Using points for matching requires points that lie on the same image structure in the 2 images. SIFT points are commonly used because they are invariant to translation, rotation, and scale. The lecture gives the theory behind SIFT points: how they are defined, how they are located, and the 128-dimensional vector that describes the neighbourhood around the point.
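As a concrete illustration (not part of the lecture itself), here is a minimal sketch of detecting SIFT keypoints and their 128-dimensional descriptors, assuming OpenCV's SIFT implementation and a hypothetical image filename:

```python
# Minimal sketch: SIFT keypoints and descriptors via OpenCV (opencv-python).
# The lecture covers the underlying theory, not this particular library call.
import cv2

img = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)   # hypothetical filename
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint stores position, scale and orientation; each descriptor row
# is the 128-dimensional vector describing the neighbourhood of the point.
print(len(keypoints), descriptors.shape)              # e.g. (N, 128)
```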
This short video presents 2 examples of SIFT feature detections, one on a stereo pair, and the other on a translated, rotated, and scaled image.
The matching of points from 2 images depends on the relation between scene and image points, and on the relation between the corresponding points in the 2 images. This section introduces the basics of the pinhole camera, projection and epipolar geometry, including the Fundamental matrix and its estimation.
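For illustration, a rough sketch of a linear (8-point style) estimate of the Fundamental matrix from matched points follows; it omits the coordinate normalisation a practical estimator would use, so treat it as a sketch of the idea rather than the lecture's algorithm:

```python
# Unnormalised 8-point sketch: each match contributes one row of the
# epipolar constraint x'^T F x = 0, and F is the null vector of the system.
import numpy as np

def eight_point(pts_l, pts_r):
    """pts_l, pts_r: (N, 2) arrays of matched pixel coordinates, N >= 8."""
    A = []
    for (x, y), (xp, yp) in zip(pts_l, pts_r):
        A.append([xp*x, xp*y, xp, yp*x, yp*y, yp, x, y, 1.0])
    A = np.asarray(A)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A, reshaped to 3x3
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```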
Many historical stereo algorithms are based on edge fragment correspondence, and many other image description and matching algorithms also use edges. This lecture gives some ideas of what edges are and how one of the best traditional edge detectors (Canny) works. It also introduces 2D convolution and spot noise removal methods.
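A minimal sketch of the smoothing-plus-Canny pipeline, assuming OpenCV and illustrative threshold values (not ones from the lecture):

```python
# Gaussian smoothing (a 2D convolution) suppresses noise before cv2.Canny
# performs gradient computation, non-maximum suppression and hysteresis
# thresholding. Thresholds 50/150 are placeholder values.
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical filename
smoothed = cv2.GaussianBlur(img, (5, 5), 1.0)
edges = cv2.Canny(smoothed, 50, 150)
cv2.imwrite("edges.png", edges)
```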
RANSAC is a general model-based shape matcher, which we use here to find the long straight lines used in the stereo matching. RANSAC can be used for other shapes, such as circles and even arbitrary parameterised shapes. It is particularly useful for finding shapes in a lot of clutter, and has a tunable failure rate.
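The following is a minimal RANSAC sketch for fitting a single 2D line to cluttered edge points; the tolerance and iteration count are illustrative assumptions, not values from the lecture:

```python
# RANSAC line fitting: repeatedly sample 2 points, form the line through
# them, and keep the line with the most points within a distance tolerance.
import numpy as np

def ransac_line(points, n_iters=500, tol=2.0, seed=0):
    """points: (N, 2) array. Returns ((a, b, c), inlier_count) with a*x+b*y+c=0."""
    rng = np.random.default_rng(seed)
    best_count, best_line = 0, None
    for _ in range(n_iters):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        d = p2 - p1
        n = np.array([-d[1], d[0]], dtype=float)      # normal to the line
        norm = np.linalg.norm(n)
        if norm == 0:
            continue
        n /= norm
        c = -n @ p1
        dist = np.abs(points @ n + c)                 # point-to-line distances
        count = np.count_nonzero(dist < tol)
        if count > best_count:
            best_count, best_line = count, (n[0], n[1], c)
    return best_line, best_count
```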
To find 3D line segments by triangulation, we need to find matching segments in the left and right images. This talk describes how to use the Fundamental matrix to find overlapping segments between all possible pairs of lines. Pairs that don't have sufficient overlap are ignored, as are pairs that do not have similar contrasts across the edges, and pairs whose disparities are too large or small.
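One plausible way to implement such an overlap test (a sketch under assumptions, not necessarily the lecture's exact formulation) is to map the left segment's endpoints to their epipolar lines in the right image and measure how much of the right segment lies between them:

```python
# Crude epipolar overlap measure for a candidate left/right segment pair.
import numpy as np

def epipolar_band_overlap(F, seg_l, seg_r):
    """F: 3x3 fundamental matrix; seg_l, seg_r: 2x2 arrays of segment endpoints."""
    def signed_dist(line, p):
        line = line / np.linalg.norm(line[:2])
        return line @ np.array([p[0], p[1], 1.0])

    # Epipolar lines in the right image for the two left endpoints (l' = F x).
    l1 = F @ np.array([*seg_l[0], 1.0])
    l2 = F @ np.array([*seg_l[1], 1.0])

    # Fraction of right endpoints lying between the two epipolar lines
    # (opposite signed distances) gives a rough overlap score in [0, 1].
    between = [signed_dist(l1, p) * signed_dist(l2, p) <= 0 for p in seg_r]
    return sum(between) / len(between)
```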
Traditional stereo matching algorithms have used a variety of constraints to limit the possible matches between features in the left and right images. Here we look at several, including the orientation, contrast, shape and epipolar constraints.
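As a sketch of how such constraints might be combined into a single filter on candidate pairs (with placeholder thresholds and a hypothetical feature representation, not the lecture's):

```python
def plausible_match(feat_l, feat_r, max_angle_deg=15, max_contrast_ratio=1.5):
    """feat_l, feat_r: dicts with 'theta' (degrees) and signed 'contrast'."""
    # Orientation constraint: edge directions should be similar (mod 180).
    dtheta = abs(feat_l["theta"] - feat_r["theta"]) % 180
    if min(dtheta, 180 - dtheta) > max_angle_deg:
        return False
    # Contrast constraint: the sign of the contrast across the edge must agree...
    if feat_l["contrast"] * feat_r["contrast"] < 0:
        return False
    # ...and its magnitude should be roughly similar.
    ratio = abs(feat_l["contrast"]) / max(abs(feat_r["contrast"]), 1e-6)
    return 1.0 / max_contrast_ratio <= ratio <= max_contrast_ratio
```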
Once lines from the left and right images are paired together, we use their geometric relationship to compute the 3D line that was projected onto the two 2D image lines, by intersecting the back-projections of the 2D lines. We test how well this has worked by examining the angles between the reconstructed lines.
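In outline (a sketch assuming known 3x4 projection matrices from calibration): the back-projection of a homogeneous 2D line l through a camera with projection matrix P is the plane pi = P^T l, and the 3D line is the intersection of the two planes:

```python
# Triangulating a 3D line from a matched pair of 2D image lines.
import numpy as np

def backproject_line(P, line_2d):
    """P: 3x4 projection matrix; line_2d: homogeneous line (a, b, c)."""
    return P.T @ line_2d                      # plane as 4-vector [n | d]

def intersect_planes(pi1, pi2):
    """Return a point on the 3D intersection line and its unit direction."""
    n1, d1 = pi1[:3], pi1[3]
    n2, d2 = pi2[:3], pi2[3]
    direction = np.cross(n1, n2)              # line direction
    # One point satisfying both plane equations (minimum-norm solution).
    A = np.vstack([n1, n2])
    b = -np.array([d1, d2])
    point, *_ = np.linalg.lstsq(A, b, rcond=None)
    return point, direction / np.linalg.norm(direction)
```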
Given a set of 3D model lines and 3D scene lines, the Interpretation Tree is used to pair them, hypothesising matches of the model in the image. A 3D version of the 2D pose estimation and verification algorithms is also given. We show the results of the matching algorithm overlaid on the initial image.
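A bare-bones sketch of the Interpretation Tree search follows, with a stand-in consistent() predicate for whatever unary/binary tests are used (e.g. comparing the angle between two scene lines with the angle between the corresponding model lines); this simplified version allows wildcard assignments and does not forbid reusing a model line:

```python
def interpretation_tree(scene, model, consistent, pairing=None):
    """Yield complete pairings of scene lines to model lines (or None = wildcard)."""
    pairing = [] if pairing is None else pairing
    if len(pairing) == len(scene):
        yield list(pairing)                    # one complete hypothesis
        return
    s = scene[len(pairing)]                    # next scene line to assign
    for m in model + [None]:
        # Prune: the new pair must be pairwise consistent with all earlier pairs.
        if m is None or all(consistent(s, m, s2, m2)
                            for s2, m2 in pairing if m2 is not None):
            pairing.append((s, m))
            yield from interpretation_tree(scene, model, consistent, pairing)
            pairing.pop()
```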
The rest of this lecture set computes 3D positions only for the matched features, whether SIFT points, edge fragments, or whole lines. But this does not give depth data at every location in the image. Here we look at a pioneering method for computing depth at every point along each rectified scan line independently.
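For flavour, a simple block-matching sketch over rectified scan lines is shown below; the window size and disparity range are arbitrary choices, and this is a generic sum-of-absolute-differences matcher rather than the specific pioneering method the lecture discusses:

```python
# Dense disparity by block matching on rectified images: for each pixel,
# slide a small window along the same scan line in the other image and keep
# the disparity with the lowest SAD cost.
import numpy as np

def block_match(left, right, max_disp=32, half=3):
    """left, right: rectified greyscale images as 2D arrays of equal shape."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.float32)
            best_cost, best_d = np.inf, 0
            for d in range(max_disp):
                cand = right[y-half:y+half+1, x-d-half:x-d+half+1].astype(np.float32)
                cost = np.abs(patch - cand).sum()       # SAD cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```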