Camera model and epipolar geometry

This section briefly recalls the mathematical background on perspective projection needed for our purposes. For more details see [4,15].

Camera model

A pinhole camera is modeled by its optical center C and its retinal plane (or image plane) $\cal R$. A 3-D point W is projected onto an image point M given by the intersection of $\cal R$ with the line containing C and W. The line containing C and orthogonal to $\cal R$ is called the optical axis, and its intersection with $\cal R$ is the principal point. The distance between C and $\cal R$ is the focal length.

Let ${\bf w} = [x\; y\; z]^\top$ be the coordinates of W in the world reference frame (fixed arbitrarily) and ${\bf m}= [u \; v]^\top$ the coordinates of M in the image plane (in pixels). The mapping from 3-D coordinates to 2-D coordinates is the perspective projection, which is represented by a linear transformation in homogeneous coordinates. Let $\tilde {\bf m} = [u\; v\; 1 ]^\top$ and $\tilde{\bf w} = [ x\; y \; z\; 1] ^\top$ be the homogeneous coordinates of M and W respectively; then the perspective transformation is given by the matrix $\tilde {\bf P}$:

$\begin{displaymath}\tilde {\bf m} \simeq\tilde {\bf P} \tilde {\bf w}, \end{displaymath}$

(1)

where $\simeq$ means equal up to an arbitrary scale factor. The camera is therefore modeled by its perspective projection matrix (henceforth PPM) $\tilde {\bf P}$, which can be decomposed, using the QR factorization, into the product

$\begin{displaymath}\tilde {\bf P}= {\bf A} [{\bf R}\;\vert\;{\bf t}] . \end{displaymath}$

(2)

The matrix ${\bf A}$ depends on the intrinsic parameters only, and has the following form:

$\begin{displaymath}{\bf A} = \left [ \begin{array}{c c c } \alpha_u & \gamma & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \\ \end{array}\right ] , \end{displaymath}$

(3)

where $\alpha_u = -fk_u$, $\alpha_v = -fk_v$ are the focal lengths in horizontal and vertical pixels, respectively ($f$ is the focal length in millimeters, $k_u$ and $k_v$ are the effective number of pixels per millimeter along the $u$ and $v$ axes), $(u_0, v_0)$ are the coordinates of the principal point, given by the intersection of the optical axis with the retinal plane, and $\gamma$ is the skew factor that models non-orthogonal $u$-$v$ axes.

The camera position and orientation (extrinsic parameters) are encoded by the $3\times3$ rotation matrix ${\bf R}$ and the translation vector ${\bf t}$, representing the rigid transformation that brings the camera reference frame onto the world reference frame.
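As an illustration of Eq. (2), the following minimal NumPy sketch assembles a PPM from ${\bf A}$, ${\bf R}$ and ${\bf t}$ and then recovers the three factors again. The recovery relies on the fact that the inverse of the left $3\times3$ block is the product of an orthogonal and an upper triangular matrix, which is what a QR factorization returns. The function names `build_ppm` and `decompose_ppm`, the numerical values, and the sign convention (positive diagonal of ${\bf A}$, overall sign of the PPM such that ${\bf R}$ is a proper rotation) are assumptions made for this example only; with $\alpha_u = -fk_u$ a different sign choice may be appropriate.

```python
import numpy as np

def build_ppm(A, R, t):
    """Assemble the 3x4 PPM P = A [R | t] of Eq. (2)."""
    return A @ np.hstack([R, np.reshape(t, (3, 1))])

def decompose_ppm(P):
    """Split P into (A, R, t), assuming diag(A) > 0 and A[2, 2] = 1."""
    Q, q = P[:, :3], P[:, 3]
    # Q = A R implies Q^-1 = R^T A^-1, i.e. orthogonal times upper
    # triangular, which is the form a QR factorization produces.
    U, T = np.linalg.qr(np.linalg.inv(Q))
    A, R = np.linalg.inv(T), U.T
    # QR leaves the signs free; flip them so that diag(A) is positive.
    D = np.diag(np.sign(np.diag(A)))
    A, R = A @ D, D @ R
    t = np.linalg.solve(A, q)
    # P is only defined up to scale, so normalise A[2, 2] to 1.
    return A / A[2, 2], R, t

# Illustrative values (not taken from the text).
A = np.array([[800.0, 0.5, 320.0],
              [0.0, 820.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = 0.3  # rotation about the y axis, in radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, -0.2, 2.0])

P = build_ppm(A, R, t)
A_rec, R_rec, t_rec = decompose_ppm(3.0 * P)  # arbitrary positive scale
print(np.allclose(A_rec, A), np.allclose(R_rec, R), np.allclose(t_rec, t))
```

The PPM is rescaled before decomposition to show that an arbitrary positive scale factor does not affect the recovered parameters.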

Let us write the PPM as

$\begin{displaymath}\tilde{\bf P}= \left[ \begin{array}{c|c} {\bf q}_1^{\top} & q_{1 4} \\ {\bf q}_2^{\top} & q_{2 4} \\ {\bf q}_3^{\top} & q_{3 4} \\ \end{array}\right ] = [{\bf Q} \vert \tilde {\bf q}]. \end{displaymath}$

(4)

In Cartesian coordinates, the projection (1) can be written as

$\begin{displaymath}\left \{ \begin{aligned}[l] u &= \dfrac{{\bf q}_1^{\top}{\bf w}+q_{1 4}}{{\bf q}_3^{\top}{\bf w}+q_{3 4}} \\ v &= \dfrac{{\bf q}_2^{\top}{\bf w}+q_{2 4}}{{\bf q}_3^{\top}{\bf w}+q_{3 4}}. \end{aligned}\right . \end{displaymath}$

(5)
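The equivalence between the homogeneous form (1) and the Cartesian form (5) can be checked numerically. The sketch below is only illustrative: the function names and the entries of the matrix `P` are made up for the example and do not come from a calibrated camera.

```python
import numpy as np

def project_homogeneous(P, w):
    """Eq. (1): form P w~ and divide by the third coordinate."""
    m = P @ np.append(w, 1.0)
    return m[:2] / m[2]

def project_cartesian(P, w):
    """Eq. (5): the same projection written row by row."""
    q1, q2, q3 = P[:, :3]      # rows q_i^T of Q
    q14, q24, q34 = P[:, 3]    # entries of the fourth column
    denom = q3 @ w + q34
    return np.array([(q1 @ w + q14) / denom, (q2 @ w + q24) / denom])

# Illustrative PPM and 3-D point (assumed values).
P = np.array([[800.0, 0.0, 320.0, 100.0],
              [0.0, 800.0, 240.0, 50.0],
              [0.0, 0.0, 1.0, 2.0]])
w = np.array([1.0, 2.0, 3.0])

print(project_homogeneous(P, w))         # u = 1860/5, v = 2370/5
print(project_cartesian(P, w))           # same pixel
print(project_homogeneous(-2.5 * P, w))  # unchanged: P is defined up to scale
```

Multiplying `P` by any non-zero factor leaves the pixel coordinates unchanged, which is the meaning of $\simeq$ in Eq. (1).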

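The focal plane is the plane parallel to the retinal plane that contains the optical center C. Since C is the only point whose projection is undefined, its coordinates ${\bf c}$ satisfy ${\bf Q}{\bf c} + \tilde{\bf q} = {\bf 0}$, and are therefore given by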

$\begin{displaymath}{\bf c} = -{\bf Q}^{-1} \tilde{\bf q} . \end{displaymath}$

(6)

Therefore $\tilde {\bf P}$ can be written as

$\begin{displaymath}\tilde{\bf P} = [{\bf Q} \vert - {\bf Q}{\bf c}]. \end{displaymath}$

(7)
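Equations (6) and (7) are easy to verify numerically. In the sketch below, `optical_center` is a name chosen for this example and the PPM entries are illustrative; the code solves the linear system rather than forming ${\bf Q}^{-1}$ explicitly, which is a common numerical preference and not something prescribed by the text.

```python
import numpy as np

def optical_center(P):
    """Eq. (6): c = -Q^-1 q~, computed by solving Q c = -q~."""
    Q, q = P[:, :3], P[:, 3]
    return -np.linalg.solve(Q, q)

# Illustrative PPM (assumed values).
P = np.array([[800.0, 0.0, 320.0, 100.0],
              [0.0, 800.0, 240.0, 50.0],
              [0.0, 0.0, 1.0, 2.0]])

c = optical_center(P)
print(c)

# The optical center is mapped to the zero vector, so it has no image.
print(P @ np.append(c, 1.0))

# Eq. (7): rebuilding P from Q and c gives back the original matrix.
Q = P[:, :3]
P_rebuilt = np.hstack([Q, (-Q @ c).reshape(3, 1)])
print(np.allclose(P_rebuilt, P))
```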

The optical ray associated with an image point M is the line M C, i.e. the set of 3-D points $\{ {\bf w}: \tilde{\bf m} \simeq \tilde {\bf P} \tilde{\bf w} \}$. In Cartesian coordinates, its parametric equation is:

$\begin{displaymath}{\bf w} = {\bf c}+ \lambda {\bf Q}^{-1}\tilde{\bf m}, \;\;\;\; \lambda \in \mathbb{R} . \end{displaymath}$

(8)
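The parametric form (8) can be checked by sampling a few points along the ray and projecting them back: substituting (8) into (1) gives $\tilde{\bf P}\tilde{\bf w} = \lambda \tilde{\bf m}$, so every sample with $\lambda \neq 0$ should land on the same pixel. The following sketch does this with an illustrative PPM and pixel; the function names are assumptions of the example.

```python
import numpy as np

def optical_ray_point(P, m, lam):
    """Eq. (8): the point c + lam * Q^-1 m~ on the optical ray of m."""
    Q, q = P[:, :3], P[:, 3]
    c = -np.linalg.solve(Q, q)                          # Eq. (6)
    direction = np.linalg.solve(Q, np.append(m, 1.0))   # Q^-1 m~
    return c + lam * direction

def project(P, w):
    """Eq. (1) followed by division by the third coordinate."""
    m = P @ np.append(w, 1.0)
    return m[:2] / m[2]

# Illustrative PPM and pixel (assumed values).
P = np.array([[800.0, 0.0, 320.0, 100.0],
              [0.0, 800.0, 240.0, 50.0],
              [0.0, 0.0, 1.0, 2.0]])
m = np.array([372.0, 474.0])

for lam in (0.5, 1.0, 3.0):
    w = optical_ray_point(P, m, lam)
    print(lam, w, project(P, w))  # the projection is m for every lam != 0
```

The value $\lambda = 0$ is excluded from the check because it corresponds to the optical center itself, which has no image.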

Epipolar geometry

Let us consider a stereo rig composed of two pinhole cameras (Fig. 1). Let ${\sf C_1}$ and ${\sf C_2}$ be the optical centers of the left and right cameras respectively. A 3-D point ${\sf W}$ is projected onto both image planes, to points ${\sf M_1}$ and ${\sf M_2}$, which constitute a conjugate pair. Given a point ${\sf M_1}$ in the left image plane, its conjugate point in the right image is constrained to lie on a line called the epipolar line (of ${\sf M_1}$). Since ${\sf M_1}$ may be the projection of an arbitrary point on its optical ray, the epipolar line is the projection through ${\sf C_2}$ of the optical ray of ${\sf M_1}$. All the epipolar lines in one image plane pass through a common point called the epipole (${\sf E_1}$ in the left image and ${\sf E_2}$ in the right), which is the projection of the optical center of the other camera.

**Figure 1:** Epipolar geometry.
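The construction just described can be sketched numerically: the right epipole is the image of ${\sf C_1}$ in the right camera, and the epipolar line of ${\sf M_1}$ joins the epipole to the image of any other point of the optical ray of ${\sf M_1}$ (here its point at infinity). The two cameras, the 3-D point and all function names below are assumptions chosen for illustration, not values from the text.

```python
import numpy as np

def optical_center(P):
    """Eq. (6): c = -Q^-1 q~."""
    return -np.linalg.solve(P[:, :3], P[:, 3])

def project(P, w):
    """Eq. (1) followed by division by the third coordinate."""
    m = P @ np.append(w, 1.0)
    return m[:2] / m[2]

def right_epipole(P1, P2):
    """Homogeneous image of C1 in the right camera."""
    return P2 @ np.append(optical_center(P1), 1.0)

def epipolar_line(P1, P2, m1):
    """Line (a, b, c), with a u + b v + c = 0, in the right image."""
    # Direction of the optical ray of m1, from Eq. (8).
    direction = np.linalg.solve(P1[:, :3], np.append(m1, 1.0))
    # Image of the ray's point at infinity in the right camera.
    far_point = P2[:, :3] @ direction
    # The line joining two homogeneous points is their cross product.
    return np.cross(right_epipole(P1, P2), far_point)

# Illustrative stereo rig (assumed values).
A = np.array([[800.0, 0.5, 320.0],
              [0.0, 820.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = 0.1
R2 = np.array([[np.cos(angle), 0.0, np.sin(angle)],
               [0.0, 1.0, 0.0],
               [-np.sin(angle), 0.0, np.cos(angle)]])
c2 = np.array([1.0, 0.1, 0.2])
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([R2, (-R2 @ c2).reshape(3, 1)])

W = np.array([0.3, -0.2, 5.0])
m1, m2 = project(P1, W), project(P2, W)   # a conjugate pair
line = epipolar_line(P1, P2, m1)

# Distance (in pixels) of the conjugate point m2 from the epipolar line.
distance = abs(line @ np.append(m2, 1.0)) / np.hypot(line[0], line[1])
print(distance)
```

The printed distance should vanish up to rounding error, since the conjugate point of ${\sf M_1}$ must lie on its epipolar line.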

When ${\sf C_1}$ is in the focal plane of the right camera, the right epipole is at infinity, and the epipolar lines form a bundle of parallel lines in the right image. A very special case is when both epipoles are at infinity, which happens when the line ${\sf C_1 C_2}$ (the baseline) is contained in both focal planes, i.e., the retinal planes are parallel to the baseline. The epipolar lines then form a bundle of parallel lines in both images. Any pair of images can be transformed so that epipolar lines are parallel and horizontal in each image. This procedure is called rectification.
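The special case can be illustrated with a small numerical sketch. It assumes two cameras with identical intrinsic parameters and orientation whose optical centers differ only along the world $x$ axis; this configuration and all values are chosen for the example. A point lies in the focal plane of a camera when the third homogeneous coordinate of its projection is zero, which is the test used below.

```python
import numpy as np

# Illustrative intrinsics and baseline length (assumed values).
A = np.array([[800.0, 0.5, 320.0],
              [0.0, 820.0, 240.0],
              [0.0, 0.0, 1.0]])
b = 0.5

# Left camera at the origin, right camera at (b, 0, 0), same orientation.
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([np.eye(3), np.array([[-b], [0.0], [0.0]])])

c1 = -np.linalg.solve(P1[:, :3], P1[:, 3])   # Eq. (6)

# Right epipole: image of C1. A zero third coordinate means that C1 lies
# in the focal plane of the right camera, so the epipole is at infinity.
e2 = P2 @ np.append(c1, 1.0)
print(e2)

# Epipolar line in the right image of a left pixel m1.
m1 = np.array([400.0, 300.0])
direction = np.linalg.solve(P1[:, :3], np.append(m1, 1.0))
line = np.cross(e2, P2[:, :3] @ direction)   # (a, b, c): a u + b v + c = 0
print(line)
print(-line[2] / line[1])                    # the row on which the line lies
```

In this configuration the coefficient of $u$ in the line is zero, so the epipolar line is horizontal and lies on the same row as `m1`, which is the situation that rectification aims to produce.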


Andrea Fusiello
2000-03-17