Firstly, a word on notation. Points, as entities in their own right, will be denoted in italics. When such points are expressed in Euclidean coordinates, we will use bold notation, and when they are expressed in projective coordinates, they will be bold with a tilde. Thus a point $M$ in three-space might be imaged at $m$, and $m$ might have coordinates $\mathbf{m} = (u, v)^\top$ or $\tilde{\mathbf{m}} = (u, v, 1)^\top$. Of course, image points can also be expressed in the camera coordinate system. When we do this we will write $m$ as $\mathbf{m}_c$ or $\tilde{\mathbf{m}}_c$.
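As a concrete illustration of this notation, here is a minimal sketch (in plain Python; the function names and the particular coordinates are ours, not the lecture's) of moving between Euclidean and projective coordinates:

```python
# Converting between Euclidean image coordinates m = (u, v) and
# projective (homogeneous) coordinates m~ = (u, v, 1).

def to_projective(m):
    """Append a 1 to Euclidean coordinates: (u, v) -> (u, v, 1)."""
    return (*m, 1.0)

def to_euclidean(m_tilde):
    """Divide through by the last coordinate: (x, y, w) -> (x/w, y/w)."""
    *xy, w = m_tilde
    return tuple(c / w for c in xy)

m = (320.0, 240.0)
m_tilde = to_projective(m)                      # (320.0, 240.0, 1.0)

# Any non-zero scalar multiple represents the same projective point:
m_scaled = tuple(2.5 * c for c in m_tilde)
assert to_euclidean(m_scaled) == m
```

The key projective fact shown is the last line: scaling a homogeneous vector does not change the point it represents.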
Moreover, as in the last lecture, much of the development in this lecture is done in the setting of projective geometry, which was first introduced in Lecture 1. There is one result that we will use constantly, so it is important to have it clearly understood.
Result: A line going through two points $\tilde{\mathbf{m}}_1$ and $\tilde{\mathbf{m}}_2$ is represented by the cross product $\tilde{\mathbf{l}} = \tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_2$.

Proof. A point $\tilde{\mathbf{m}}$ on the line is given by $\lambda \tilde{\mathbf{m}}_1 + \mu \tilde{\mathbf{m}}_2$, for arbitrary values of the scalars $\lambda$ and $\mu$. This is equivalent to writing that the determinant $\det[\tilde{\mathbf{m}} \;\; \tilde{\mathbf{m}}_1 \;\; \tilde{\mathbf{m}}_2] = 0$. But this determinant can also be written as the triple product $\tilde{\mathbf{m}}^\top (\tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_2) = 0$, which says exactly that $\tilde{\mathbf{m}}$ lies on the line represented by $\tilde{\mathbf{l}} = \tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_2$.
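The result is easy to check numerically. A small sketch (plain Python, helper names ours) constructs the line through two projective points and verifies that the triple product of the proof vanishes:

```python
# The projective line through two image points is their cross product,
# and a point m lies on line l iff m . l == 0.

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

m1 = (1.0, 2.0, 1.0)        # two points in projective coordinates
m2 = (4.0, 3.0, 1.0)
l = cross(m1, m2)           # the line through them

# Both defining points satisfy m . l = 0 ...
assert abs(dot(m1, l)) < 1e-12 and abs(dot(m2, l)) < 1e-12
# ... and so does any combination lambda*m1 + mu*m2:
m = tuple(0.3*a + 0.7*b for a, b in zip(m1, m2))
assert abs(dot(m, l)) < 1e-12
```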
In the last lecture, we considered in detail the geometry of a single camera. We will now introduce a second view and study the geometric properties of the set of two views. The main new geometric property is known in computer vision as the epipolar constraint.
There are two ways of extracting three-dimensional structure from a pair of images. In the first, classical method, known as the calibrated route, we first calibrate both cameras (or viewpoints) with respect to some world coordinate system, compute the so-called epipolar geometry by extracting the essential matrix of the system, and from this recover the three-dimensional Euclidean structure of the imaged scene.
However, it is the second, or uncalibrated, route that more likely corresponds to the way in which biological systems determine three-dimensional structure from vision. In an uncalibrated system, a quantity known as the fundamental matrix is calculated from image correspondences, and this is then used to determine the projective three-dimensional structure of the imaged scene.
In both approaches the underlying principle of binocular vision is that of triangulation. Given a single image, the three-dimensional location of any visible object point must lie on the straight line that passes through the centre of projection and the image of the object point (see figure 1). The determination of the intersection of two such lines generated from two independent images is called triangulation.
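The triangulation step can be sketched in code. With noisy data the two rays will not meet exactly, so one common choice (an assumption here, not prescribed by the lecture) is the midpoint of the shortest segment between the two rays:

```python
# Each view constrains the 3D point to a ray through its centre of
# projection; triangulation recovers the point where the rays meet.
# With noise the rays are skew, so we take the least-squares midpoint.

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    r = tuple(b - a for a, b in zip(c1, c2))         # c2 - c1
    a11, a12, a22 = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    b1, b2 = dot(r, d1), dot(r, d2)
    det = a11*a22 - a12*a12                          # zero iff rays parallel
    s = (a22*b1 - a12*b2) / det
    t = (a12*b1 - a11*b2) / det
    p = tuple(c + s*d for c, d in zip(c1, d1))       # closest point on ray 1
    q = tuple(c + t*d for c, d in zip(c2, d2))       # closest point on ray 2
    return tuple((u + v) / 2 for u, v in zip(p, q))

# Two camera centres viewing the point M = (1, 2, 5):
M = (1.0, 2.0, 5.0)
c1, c2 = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
d1 = tuple(m - c for m, c in zip(M, c1))   # ray directions towards M
d2 = tuple(m - c for m, c in zip(M, c2))
est = triangulate(c1, d1, c2, d2)
assert all(abs(a - b) < 1e-9 for a, b in zip(est, M))
```

The 2×2 system solved here comes from setting the derivatives of the squared distance between the two ray points to zero.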
Clearly, the determination of the scene position of an object point through triangulation depends upon matching the image location of the object point in one image to the location of the same object point in the other image. The process of establishing such matches between points in a pair of images is called correspondence, and will be dealt with at length in the next lecture.
At first it might seem that correspondence requires a search through the whole image, but the epipolar constraint reduces this search to a single line. To see this, we consider figure 2.
The **epipole** is the point of intersection of the line joining the optical centres, that is the baseline, with the image plane. Thus the epipole is the image, in one camera, of the optical centre of the other camera.

The **epipolar plane** is the plane defined by a 3D point $M$ and the optical centres $C$ and $C'$.

The **epipolar line** is the straight line of intersection of the epipolar plane with the image plane. It is the image in one camera of a ray through the optical centre and image point in the other camera. All epipolar lines intersect at the epipole.
Thus, a point $m$ in one image generates a line in the other image on which its corresponding point $m'$ must lie. The search for correspondences is thereby reduced from a region to a line. This is illustrated in figure 3.
To calculate depth information from a pair of images we need to compute the epipolar geometry. In the calibrated environment we capture this geometric constraint in an algebraic representation known as the essential matrix. In the uncalibrated environment, it is captured in the fundamental matrix.
With two views, the two camera coordinate systems are related by a rotation $\mathbf{R}$ and a translation $\mathbf{t}$ (see figure 4), so that a point with coordinates $\mathbf{M}_c$ in the first camera frame has coordinates $\mathbf{M}'_c = \mathbf{R}\mathbf{M}_c + \mathbf{t}$ in the second. The two optical rays and the baseline are then coplanar, which gives the epipolar constraint

$$\tilde{\mathbf{m}}'^{\top}_{c} \, (\mathbf{t} \times \mathbf{R}\,\tilde{\mathbf{m}}_c) = 0 \tag{1}$$

Writing $[\mathbf{t}]_\times$ for the skew-symmetric matrix such that $[\mathbf{t}]_\times \mathbf{v} = \mathbf{t} \times \mathbf{v}$, this becomes

$$\tilde{\mathbf{m}}'^{\top}_{c} \, \mathbf{E} \, \tilde{\mathbf{m}}_c = 0, \qquad \mathbf{E} = [\mathbf{t}]_\times \mathbf{R} \tag{2}$$

where $\mathbf{E}$ is the essential matrix.
Notice that equation (1) is homogeneous with respect to $\mathbf{t}$: scaling $\mathbf{t}$ leaves it unchanged. This reflects the fact that scale is undetermined and we cannot recover the absolute scale of the scene without some extra information, such as knowing the distance in space between two points. Thus $\mathbf{E}$ is a $3 \times 3$ matrix but it only depends on five parameters: three for the rotation and two for the direction of translation. Note also that the case $\mathbf{t} = \mathbf{0}$ is a trivial solution, but one from which we are unable to calculate any information about the depth of points in space; for this reason, it is usually excluded.
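A numerical sanity check of the epipolar constraint, sketched in plain Python: we pick an illustrative rotation and translation (all specific values below are assumptions for illustration only), build the essential matrix, and verify that the constraint vanishes for a synthetic point.

```python
from math import cos, sin

def matvec(A, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(A[i][j] * v[j] for j in range(3)) for i in range(3))

def skew(t):
    """The matrix [t]x such that [t]x v = t x v."""
    return ((0.0, -t[2], t[1]),
            (t[2], 0.0, -t[0]),
            (-t[1], t[0], 0.0))

a = 0.3                                    # rotation angle about the y axis
R = ((cos(a), 0.0, sin(a)),
     (0.0,    1.0, 0.0),
     (-sin(a), 0.0, cos(a)))
t = (1.0, 0.2, 0.0)                        # baseline; its scale is arbitrary

# Image a point M (first camera frame) in both normalised cameras,
# using the convention M' = R M + t for the change of frame:
M = (0.5, -0.4, 4.0)
M2 = tuple(rm + tt for rm, tt in zip(matvec(R, M), t))
m = tuple(x / M[2] for x in M)             # projective image point (z = 1)
m2 = tuple(x / M2[2] for x in M2)

# E = [t]x R; the constraint m'^T E m = 0 holds up to round-off:
Tx = skew(t)
E = tuple(tuple(sum(Tx[i][k] * R[k][j] for k in range(3)) for j in range(3))
          for i in range(3))
val = sum(m2[i] * matvec(E, m)[i] for i in range(3))
assert abs(val) < 1e-12
```

Scaling `t` by any non-zero factor scales `val` by the same factor and so leaves the zero constraint unchanged, which is the homogeneity noted above.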
In the uncalibrated case, we do not know $\mathbf{R}$ and $\mathbf{t}$; all we have are image coordinates in the image plane. Nevertheless, the correspondence between an image point $m$ and its epipolar line $\tilde{\mathbf{l}}_m$ in the other image is very simple; in fact it is linear in projective coordinates, as the following analysis shows.
Suppose we have two views of a point $M$ in three-dimensional space, with $M$ imaged at $m$ in view 1 and at $m'$ in view 2. From the last lecture we know that there are projection matrices $\tilde{\mathbf{P}}$ and $\tilde{\mathbf{P}}'$ such that, in projective coordinates,

$$\tilde{\mathbf{m}} \simeq \tilde{\mathbf{P}}\,\tilde{\mathbf{M}}, \qquad \tilde{\mathbf{m}}' \simeq \tilde{\mathbf{P}}'\,\tilde{\mathbf{M}} \tag{3}$$

where $\simeq$ denotes equality up to a non-zero scale factor.
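Equation (3) can be illustrated with toy projection matrices; the particular matrices below are assumptions chosen for concreteness, not values from the lecture.

```python
# Two 3x4 projection matrices map the homogeneous scene point M~ to the
# homogeneous image points m~ and m~', each only up to scale.

def project(P, M):
    """Apply a 3x4 projection matrix to a homogeneous 4-vector."""
    return tuple(sum(P[i][j] * M[j] for j in range(4)) for i in range(3))

P  = ((1.0, 0.0, 0.0, 0.0),    # first camera: identity pose
      (0.0, 1.0, 0.0, 0.0),
      (0.0, 0.0, 1.0, 0.0))
P2 = ((1.0, 0.0, 0.0, 0.5),    # second camera: translated by (0.5, 0, 0)
      (0.0, 1.0, 0.0, 0.0),
      (0.0, 0.0, 1.0, 0.0))

M = (1.0, 2.0, 4.0, 1.0)       # homogeneous scene point
m, m2 = project(P, M), project(P2, M)

# Scaling M~ changes m~ only by an overall scale (projective equality):
m_scaled = project(P, tuple(3.0 * x for x in M))
assert all(abs(3.0 * a - b) < 1e-12 for a, b in zip(m, m_scaled))
```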
Now, image points and rays in Euclidean 3-space are related by
The essential and the fundamental matrices have the following properties: