Next: Optical flow techniques Up: Computer Vision IT412 Previous: Optical flow

Structure from motion

Determining structure from motion is effectively equivalent to stereo with a single camera. The difficulty is that one must first deduce the three-dimensional motion of the camera between times $t$ and $t + \delta t$. Once this is known, we can solve for the three-dimensional positions of the matched points using the standard stereo equations.
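As a minimal sketch of that last step, suppose the camera motion has already been recovered and happens to be a pure sideways translation of length `baseline` with no rotation. This is an assumption made only for illustration, since it reduces the problem to the familiar rectified stereo case, where depth is $Z = f b / d$ for focal length $f$, baseline $b$ and disparity $d$. The function names and the sample numbers below are hypothetical.

```python
def depth_from_disparity(focal_length, baseline, x_first, x_second):
    """Depth Z of a point imaged at column x_first before the camera moves
    and at x_second afterwards, assuming a pure sideways translation."""
    disparity = x_first - x_second
    if disparity == 0:
        raise ValueError("zero disparity: the point is effectively at infinity")
    return focal_length * baseline / disparity


def point_from_match(focal_length, baseline, first, second):
    """Recover (X, Y, Z) in the first camera's frame from one matched pair.

    `first` and `second` are (x, y) image coordinates measured from the
    principal point, in the same units as focal_length.
    """
    x1, y1 = first
    x2, _ = second
    z = depth_from_disparity(focal_length, baseline, x1, x2)
    # Invert the pinhole projection x = f X / Z, y = f Y / Z.
    return (x1 * z / focal_length, y1 * z / focal_length, z)


# Hypothetical match: focal length 500 pixels, camera moved 0.1 units sideways.
recovered = point_from_match(500.0, 0.1, (50.0, -25.0), (25.0, -25.0))
```

For a general motion with rotation, the same idea applies, but the two viewing rays must be intersected in three dimensions rather than compared along a single image row.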

If both the viewing geometry and the scene geometry are unknown, but the scene structure remains rigid between views, then it is possible to deduce the viewing geometry (up to a scale factor) and hence to solve for the scene structure. In such a scenario, all we have is a number of matched points in two images. The viewing geometry is specified by six parameters: three for the translation and three for the rotation of the camera.

Each observation of a point in the two images gives us four pieces of information: the row and column pixel coordinates in each of the two images. However, each observation also introduces three unknowns, namely the 3D coordinates of the observed point.

Thus, if $n$ points are observed, we have $6 + 3n$ unknowns and $4n$ observations. However, it is only possible to deduce camera translation up to a scale factor, as illustrated in figure 5, so we really have only $5 + 3n$ solvable unknowns. The system can be solved only if there are at least as many observations as unknowns, that is $4n \geq 5 + 3n$, so we need at least $n = 5$ matched points. In practice many more than 5 points are used to reduce the influence of noise.

**Figure 5:** Camera translation can only be solved up to a scale factor. As the image point moves, this can correspond to a point far away and a large camera movement, or a point close by and a small camera movement.
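The scale ambiguity can be checked numerically. The sketch below assumes a simple pinhole camera with unit focal length and a rotation about the vertical axis only. The helper names and the sample point and motion are hypothetical, chosen purely for illustration. Scaling both the scene point and the camera translation by the same factor leaves the image coordinates in both views unchanged.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the camera's vertical (Y) axis."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, focal_length=1.0):
    """Pinhole projection of a point given in camera coordinates."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def observe(point, angle, translation):
    """Image coordinates of a scene point before and after the camera moves.

    The point is expressed in the first camera's frame; in the second
    frame it becomes R (X - t).
    """
    shifted = tuple(p - t for p, t in zip(point, translation))
    return project(point), project(rotate_y(shifted, angle))


def scale(vector, factor):
    """Multiply every component of a vector by the same factor."""
    return tuple(factor * v for v in vector)


point = (0.3, -0.2, 4.0)        # a nearby scene point
translation = (0.5, 0.0, 0.1)   # a small camera movement
angle = math.radians(5.0)

near = observe(point, angle, translation)
# Same rotation, but the point is ten times further away and the
# camera moves ten times as far.
far = observe(scale(point, 10.0), angle, scale(translation, 10.0))

# Compare the two sets of image coordinates component by component.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9)
    for view_near, view_far in zip(near, far)
    for a, b in zip(view_near, view_far)
)
```

Because the common factor cancels in the division by depth, no amount of image data can separate the two cases, which is why only the direction of the translation counts among the solvable unknowns.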


Robyn Owens
10/29/1997