Camera model and epipolar geometry

This section briefly recalls the mathematical background on perspective projection needed in what follows. For more details see [4,15].

Camera model

A pinhole camera is modeled by its optical center C and its retinal plane (or image plane) $\cal R$. A 3-D point W is projected into an image point M given by the intersection of $\cal R$ with the line containing C and W. The line containing C and orthogonal to $\cal R$ is called the optical axis and its intersection with $\cal R$ is the principal point. The distance between C and $\cal R$ is the focal length.

Let ${\bf w} = [x\; y\; z]^\top$ be the coordinates of W in the world reference frame (fixed arbitrarily) and ${\bf m}= [u \; v]^\top$ the coordinates of M in the image plane (pixels). The mapping from 3-D coordinates to 2-D coordinates is the perspective projection, which is represented by a linear transformation in homogeneous coordinates. Let $\tilde {\bf m} = [u\; v\; 1]^\top$ and $\tilde{\bf w} = [ x\; y \; z\; 1] ^\top$ be the homogeneous coordinates of M and W respectively; then the perspective transformation is given by the $3\times4$ matrix $\tilde {\bf P}$:

 \begin{displaymath}\tilde {\bf m} \simeq\tilde {\bf P} \tilde {\bf w},
\end{displaymath} (1)

where $\simeq$ means equal up to an arbitrary scale factor. The camera is therefore modeled by its perspective projection matrix (henceforth PPM) $\tilde {\bf P}$, which can be decomposed, using the QR factorization, into the product

 \begin{displaymath}\tilde {\bf P}= {\bf A} [{\bf R}\;\vert\;{\bf t}] .
\end{displaymath} (2)

The matrix ${\bf A}$ depends on the intrinsic parameters only, and has the following form:

\begin{displaymath}{\bf A} =
\left [
\begin{array}{c c c }
\alpha_u & \gamma & u_0 \\
0 & \alpha_v & v_0 \\
0 & 0 & 1 \\
\end{array}\right ] ,
\end{displaymath} (3)

where $\alpha_u = -fk_u$ and $\alpha_v = -fk_v$ are the focal lengths in horizontal and vertical pixels, respectively ($f$ is the focal length in millimeters, $k_u$ and $k_v$ are the effective number of pixels per millimeter along the $u$ and $v$ axes), $(u_0, v_0)$ are the coordinates of the principal point, given by the intersection of the optical axis with the retinal plane, and $\gamma$ is the skew factor that models non-orthogonal $u$-$v$ axes.
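
As a small illustration of equation (3), the sketch below assembles ${\bf A}$ from physical parameters. The function name `intrinsic_matrix` and the numeric values are hypothetical, chosen only to show how millimeters and pixels per millimeter combine into pixel units.

```python
import numpy as np

def intrinsic_matrix(f_mm, k_u, k_v, u0, v0, gamma=0.0):
    """Build the matrix A of equation (3).

    f_mm is the focal length in millimeters; k_u and k_v are pixels per
    millimeter along the u and v axes; (u0, v0) is the principal point.
    """
    alpha_u = -f_mm * k_u  # focal length in horizontal pixels (sign as in the text)
    alpha_v = -f_mm * k_v  # focal length in vertical pixels
    return np.array([[alpha_u, gamma, u0],
                     [0.0, alpha_v, v0],
                     [0.0, 0.0, 1.0]])

# Hypothetical values: 8 mm lens, 100 pixels/mm, principal point at (320, 240).
A = intrinsic_matrix(f_mm=8.0, k_u=100.0, k_v=100.0, u0=320.0, v0=240.0)
```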

The camera position and orientation (extrinsic parameters) are encoded by the $3\times3$ rotation matrix ${\bf R}$ and the translation vector ${\bf t}$, representing the rigid transformation that brings the camera reference frame onto the world reference frame.
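
The factorization (2) can be sketched numerically as follows. This is one possible approach, not necessarily the authors' procedure: it applies NumPy's QR factorization to the inverse of the left $3\times3$ block of the PPM, since the inverse of ${\bf A}{\bf R}$ is ${\bf R}^\top {\bf A}^{-1}$ (orthogonal times upper triangular). The names `compose_ppm` and `decompose_ppm` and all numeric values are hypothetical. The sketch assumes a normalization in which ${\bf A}$ has a positive diagonal and ${\bf A}_{33} = 1$; the convention with negative $\alpha_u$, $\alpha_v$ differs from it by a rotation of 180 degrees about the optical axis.

```python
import numpy as np

def compose_ppm(A, R, t):
    """Return the 3x4 matrix A [R | t] of equation (2)."""
    return A @ np.hstack([R, np.reshape(t, (3, 1))])

def decompose_ppm(P):
    """Recover (A, R, t) from a PPM known up to scale.

    Assumption: A is normalized to have a positive diagonal and A[2, 2] = 1.
    """
    P = np.asarray(P, dtype=float)
    Q, q = P[:, :3], P[:, 3]
    # inv(Q) = U T with U orthogonal and T upper triangular.
    U, T = np.linalg.qr(np.linalg.inv(Q))
    # QR is unique only up to signs: make the diagonal of T positive.
    D = np.diag(np.sign(np.diag(T)))
    U, T = U @ D, D @ T
    A = np.linalg.inv(T)      # upper triangular, positive diagonal
    R = U.T
    s = A[2, 2]               # overall scale of P (in absolute value)
    A = A / s
    t = np.linalg.solve(A, q / s)
    # If P carried a negative scale factor, R comes out with det = -1.
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return A, R, t

# Hypothetical camera: positive focal lengths, rotation about the y axis.
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, -0.2, 1.5])

P = 2.5 * compose_ppm(A, R, t)   # arbitrary scale factor
A_rec, R_rec, t_rec = decompose_ppm(P)
```

Because the PPM is defined only up to scale, the recovered factors match the originals regardless of the factor 2.5 applied above.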

Let us write the PPM as

\begin{displaymath}\tilde{\bf P}=
\left [
\begin{array}{c|c}
{\bf q}_1^{\top} & q_{1 4} \\
{\bf q}_2^{\top} & q_{2 4} \\
{\bf q}_3^{\top} & q_{3 4} \\
\end{array}\right ] = [{\bf Q} \vert \tilde {\bf q}].
\end{displaymath} (4)

In Cartesian coordinates, the projection (1) reads

 \begin{displaymath}\left \{
\begin{aligned}
u &= \dfrac{{\bf q}_1^{\top}{\bf w}+q_{1 4}}{{\bf q}_3^{\top}{\bf w}+q_{3 4}} \\
v &= \dfrac{{\bf q}_2^{\top}{\bf w}+q_{2 4}}{{\bf q}_3^{\top}{\bf w}+q_{3 4}}.
\end{aligned}\right .
\end{displaymath} (5)
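
The projection can be sketched in a few lines: multiply by the PPM in homogeneous coordinates as in (1), then divide by the third component to obtain the Cartesian form (5). The function name `project` and the matrix values are hypothetical.

```python
import numpy as np

def project(P, w):
    """Project the 3-D point w with the 3x4 PPM P and return (u, v)."""
    P = np.asarray(P, dtype=float)
    w_h = np.append(np.asarray(w, dtype=float), 1.0)  # homogeneous coordinates
    m_h = P @ w_h                                     # equation (1), up to scale
    if np.isclose(m_h[2], 0.0):
        raise ValueError("point lies in the focal plane and has no finite image")
    return m_h[:2] / m_h[2]                           # equation (5)

# Hypothetical PPM: camera frame aligned with the world frame.
P = np.array([[800.0, 0.0, 320.0, 0.0],
              [0.0, 800.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
w = np.array([0.5, -0.25, 2.0])
m = project(P, w)
```

Scaling `P` by any nonzero factor leaves `m` unchanged, which is the meaning of $\simeq$ in (1).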

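The focal plane is the plane parallel to the retinal plane that contains the optical center C; it has equation ${\bf q}_3^{\top}{\bf w}+q_{3 4} = 0$, where the denominators in (5) vanish. Since C is the only point whose projection is undefined, $\tilde{\bf P}\,[{\bf c}^\top\; 1]^\top = {\bf 0}$, that is, ${\bf Q}{\bf c} + \tilde{\bf q} = {\bf 0}$, so the coordinates ${\bf c}$ of C are given by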

 \begin{displaymath}{\bf c} = -{\bf Q}^{-1} \tilde{\bf q} .
\end{displaymath} (6)

Therefore $\tilde {\bf P}$ can be written:

 \begin{displaymath}\tilde{\bf P} = [{\bf Q} \vert - {\bf Q}{\bf c}].
\end{displaymath} (7)
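
Equations (6) and (7) are easy to check numerically. The sketch below computes the optical center from a PPM and verifies that its projection vanishes; the name `optical_center` and the camera parameters are hypothetical, and ${\bf Q}$ is assumed to be invertible.

```python
import numpy as np

def optical_center(P):
    """Return c = -Q^{-1} q for the PPM P = [Q | q], as in equation (6)."""
    P = np.asarray(P, dtype=float)
    Q, q = P[:, :3], P[:, 3]
    return -np.linalg.solve(Q, q)  # solves Q c = -q without forming the inverse

# Hypothetical camera P = A [R | t].
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.5, -1.0, 2.0])
P = A @ np.hstack([R, t.reshape(3, 1)])

c = optical_center(P)
residual = P @ np.append(c, 1.0)                               # should be the zero vector
P_rebuilt = np.hstack([P[:, :3], (-P[:, :3] @ c).reshape(3, 1)])  # equation (7)
```

With the factorization (2), the same point is $-{\bf R}^\top {\bf t}$, which gives an independent check on the result.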

The optical ray associated with an image point M is the line M C, i.e. the set of 3-D points $\{ {\bf w}: \tilde{\bf m} \simeq \tilde {\bf P} \tilde{\bf w} \}$. Its parametric equation in Cartesian coordinates is:

 \begin{displaymath}{\bf w} = {\bf c}+ \lambda {\bf Q}^{-1}\tilde{\bf m}, \;\;\;\;
\lambda \in \mathbb{R} .
\end{displaymath} (8)
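
A minimal sketch of the back-projection (8): the ray is returned as an origin ${\bf c}$ and a direction ${\bf Q}^{-1}\tilde{\bf m}$, and every point on it with $\lambda \neq 0$ should project back onto the same pixel. The names `optical_ray` and `project` and the numeric values are hypothetical.

```python
import numpy as np

def optical_ray(P, m):
    """Return (c, d) such that w = c + lam * d is the optical ray of pixel m."""
    P = np.asarray(P, dtype=float)
    Q, q = P[:, :3], P[:, 3]
    c = -np.linalg.solve(Q, q)                  # optical center, equation (6)
    d = np.linalg.solve(Q, np.append(m, 1.0))   # direction Q^{-1} m~, equation (8)
    return c, d

def project(P, w):
    """Project a 3-D point w with the PPM P and return pixel coordinates."""
    m_h = P @ np.append(w, 1.0)
    return m_h[:2] / m_h[2]

# Hypothetical PPM with a nonzero translation.
P = np.array([[800.0, 0.0, 320.0, 100.0],
              [0.0, 800.0, 240.0, -50.0],
              [0.0, 0.0, 1.0, 2.0]])
m = np.array([400.0, 300.0])
c, d = optical_ray(P, m)

# Points sampled along the ray (lam != 0) all reproject onto m.
reprojections = [project(P, c + lam * d) for lam in (0.5, 1.0, 3.0, -2.0)]
```

The check works because $\tilde{\bf P}\tilde{\bf w} = {\bf Q}{\bf c} + \tilde{\bf q} + \lambda \tilde{\bf m} = \lambda \tilde{\bf m}$ for every point of the ray.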

Epipolar geometry

Let us consider a stereo rig composed of two pinhole cameras (Fig. 1). Let ${\sf C_1}$ and ${\sf C_2}$ be the optical centers of the left and right cameras respectively. A 3-D point ${\sf W}$ is projected onto both image planes, to points ${\sf M_1}$ and ${\sf M_2}$, which constitute a conjugate pair. Given a point ${\sf M_1}$ in the left image plane, its conjugate point in the right image is constrained to lie on a line called the epipolar line (of ${\sf M_1}$). Since ${\sf M_1}$ may be the projection of an arbitrary point on its optical ray, the epipolar line is the projection through ${\sf C_2}$ of the optical ray of ${\sf M_1}$. All the epipolar lines in one image plane pass through a common point (${\sf E_1}$ and ${\sf E_2}$ respectively) called the epipole, which is the projection of the optical center of the other camera.
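
The construction above can be sketched directly from the two PPMs. Projecting the optical ray of ${\sf M_1}$ with the right camera gives the right epipole (the image of ${\sf C_1}$) plus $\lambda$ times the image of the ray direction, so the epipolar line is the cross product of those two homogeneous points. The function names and the stereo rig below are hypothetical.

```python
import numpy as np

def project(P, w):
    """Project a 3-D point w with the PPM P and return pixel coordinates."""
    m_h = P @ np.append(w, 1.0)
    return m_h[:2] / m_h[2]

def epipole_and_line(P1, P2, m1):
    """Return the right epipole and the epipolar line of m1, both homogeneous."""
    Q1, q1 = P1[:, :3], P1[:, 3]
    c1 = -np.linalg.solve(Q1, q1)                           # left optical center
    e2 = P2 @ np.append(c1, 1.0)                            # right epipole
    d2 = P2[:, :3] @ np.linalg.solve(Q1, np.append(m1, 1.0))  # image of the ray direction
    return e2, np.cross(e2, d2)                             # line through both points

# Hypothetical rig: same intrinsics, right camera rotated and translated.
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.0, 0.1])
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([R, t.reshape(3, 1)])

W = np.array([0.3, 0.2, 4.0])          # a 3-D point seen by both cameras
m1, m2 = project(P1, W), project(P2, W)  # conjugate pair
e2, line = epipole_and_line(P1, P2, m1)

# Distance in pixels from m2 to the epipolar line of m1 (should be ~0).
dist = abs(line @ np.append(m2, 1.0)) / np.hypot(line[0], line[1])
```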

Figure 1: Epipolar geometry.

When ${\sf C_1}$ is in the focal plane of the right camera, the right epipole is at infinity, and the epipolar lines form a bundle of parallel lines in the right image. A very special case is when both epipoles are at infinity, which happens when the line ${\sf C_1 C_2}$ (the baseline) is contained in both focal planes, i.e., the retinal planes are parallel to the baseline. Epipolar lines then form a bundle of parallel lines in both images. Any pair of images can be transformed so that epipolar lines are parallel and horizontal in each image. This procedure is called rectification.
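
The special case can be illustrated with a hypothetical rig that is already in the rectified configuration: two cameras with identical intrinsics and orientation, displaced along the horizontal axis of the camera frame. In this sketch (all names and values are assumptions), both epipoles have a zero third homogeneous coordinate, i.e. they lie at infinity, and conjugate points share the same $v$ coordinate.

```python
import numpy as np

def project(P, w):
    """Project a 3-D point w with the PPM P and return pixel coordinates."""
    m_h = P @ np.append(w, 1.0)
    return m_h[:2] / m_h[2]

def optical_center(P):
    """Optical center c = -Q^{-1} q of the PPM P = [Q | q]."""
    return -np.linalg.solve(P[:, :3], P[:, 3])

# Hypothetical intrinsics shared by both cameras.
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
t = np.array([-0.2, 0.0, 0.0])   # baseline along the horizontal camera axis
P1 = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = A @ np.hstack([np.eye(3), t.reshape(3, 1)])

c1, c2 = optical_center(P1), optical_center(P2)
e2 = P2 @ np.append(c1, 1.0)     # right epipole (homogeneous)
e1 = P1 @ np.append(c2, 1.0)     # left epipole (homogeneous)

# C1 lies in the right focal plane when the third row of P2 annihilates it.
c1_in_right_focal_plane = np.isclose(P2[2] @ np.append(c1, 1.0), 0.0)

W = np.array([0.3, 0.2, 4.0])
m1, m2 = project(P1, W), project(P2, W)
disparity = m1[0] - m2[0]        # horizontal offset between conjugate points
```

For a general pair of cameras this configuration does not hold, and rectification is the transformation that brings the images into it.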

