An overview of shape-from-motion
Helmut Cantzler

In this section we are interested in extracting the shape of a scene from the spatial and temporal changes in an image sequence. The technique exploits the relative motion between the camera and the scene. As in stereo, the process can be divided into two subproblems: finding correspondences between consecutive frames and reconstructing the scene. There are, however, some important differences. Because image sequences are sampled at high rates, the differences between consecutive frames are, on average, much smaller than those between typical stereo pairs. Moreover, unlike stereo, the relative 3D displacement between the viewing camera and the scene is not necessarily caused by a single 3D transformation; several objects in the scene may move independently.

Regarding correspondence, the fact that motion sequences provide many closely sampled frames for analysis is an advantage. Firstly, tracking techniques, which exploit the past history of the motion to predict disparities in the next frame, can be used. Secondly, the correspondence problem can be cast as the problem of estimating the apparent motion of the image brightness pattern, the optical flow. Two kinds of methods are commonly used to compute correspondences. Differential methods use estimates of time derivatives and therefore require closely sampled image sequences; they are computed at every image pixel and lead to dense measurements. Matching methods use Kalman filtering to match and track sparse image features efficiently over time; they are computed only at a subset of image points and produce sparse measurements. Both strategies are sketched in the example below.
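As a concrete illustration, the following Python sketch computes both kinds of correspondences using OpenCV. The library choice, the file names and the parameter values are illustrative assumptions, not part of the method described above; in particular, the Kalman-filter prediction step mentioned for matching methods is only indicated in a comment, not implemented.

# Minimal sketch of the two correspondence strategies (OpenCV assumed).
import cv2

prev_gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Differential method: dense optical flow from spatial and temporal image
# derivatives (Farneback). Returns a flow vector (u, v) for every pixel.
dense_flow = cv2.calcOpticalFlowFarneback(
    prev_gray, next_gray, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Matching method: track a sparse set of corner features from frame to frame
# (pyramidal Lucas-Kanade). A Kalman filter could additionally predict each
# feature's position in the next frame to narrow the search.
features = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                   qualityLevel=0.01, minDistance=7)
tracked, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                features, None)
sparse_matches = [(p0.ravel(), p1.ravel())
                  for p0, p1, ok in zip(features, tracked, status) if ok]

The dense result gives a flow vector at every pixel, whereas the sparse result gives reliable matches only at well-textured corner points; this mirrors the dense/sparse trade-off described above.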

Unlike correspondence, reconstruction is more difficult in motion than in stereo. Frame-by-frame recovery of motion and structure turns out to be more sensitive to noise, because the baseline between consecutive frames is very small. For reconstruction we can use the motion field of the image sequence, i.e. the projection of the 3D velocity field of the scene onto the image plane. One way to recover the 3D structure is to first determine the direction of translation through approximate motion parallax, then determine a least-squares approximation of the rotational component of the optical flow, and finally use both in the motion field equations to compute depth, as sketched below.
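The following sketch illustrates the final depth-computation step. It assumes the translation direction T = (Tx, Ty, Tz), the rotation omega = (wx, wy, wz) and the focal length f have already been estimated by the preceding steps; the function name, the variable names and the sign convention (the standard rigid-motion field equations, as in Trucco and Verri) are assumptions made for this example.

# Sketch: depth from the motion field, given estimated T, omega and f.
import numpy as np

def depth_from_flow(x, y, u, v, T, omega, f):
    """Estimate depth Z at image point (x, y) with measured flow (u, v)."""
    Tx, Ty, Tz = T
    wx, wy, wz = omega

    # Rotational part of the motion field; it does not depend on depth.
    u_rot = wx * x * y / f - wy * (f + x**2 / f) + wz * y
    v_rot = wx * (f + y**2 / f) - wy * x * y / f - wz * x

    # Subtracting it leaves the translational (parallax) component,
    # which scales inversely with depth: u_t = (Tz*x - Tx*f) / Z.
    u_t = u - u_rot
    v_t = v - v_rot

    # Least-squares depth estimate from the two translational components.
    num = (Tz * x - Tx * f) * u_t + (Tz * y - Ty * f) * v_t
    den = u_t**2 + v_t**2
    return num / den if den > 1e-12 else np.inf

Because the translation direction can only be recovered up to scale, the depths obtained this way are relative: the scene is reconstructed up to an unknown global scale factor.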