We now return to image formation and camera geometry in a bit more detail, to see how one calibrates a camera, that is, how one determines the relationship between what appears on the image (or retinal) plane and where it is located in the 3D world.
Imagine we have a three-dimensional coordinate system whose origin is at the centre of projection and whose Z axis lies along the optical axis, as shown in figure 1. This coordinate system is called the standard coordinate system of the camera. A point M on an object with coordinates (X,Y,Z) will be imaged at some point m = (x, y) in the image plane. These image coordinates are measured in a coordinate system whose origin is at the intersection of the optical axis and the image plane, and whose x and y axes are parallel to the X and Y axes. The relationship between the two coordinate systems (c,x,y) and (C,X,Y,Z) is given by
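$$
x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z} \tag{1}
$$

The pixel coordinates (u, v) of the image point are obtained by dividing by the pixel size and shifting the origin to (uc, vc), the pixel coordinates of the optical centre:

$$
u = u_c + \frac{x}{\text{pixel width}}, \qquad v = v_c + \frac{y}{\text{pixel height}} \tag{2}
$$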
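The two steps can be sketched in a few lines of Python. This is only an illustration. It assumes that equation (1) is the usual perspective projection x = fX/Z, y = fY/Z, and that equation (2) divides by the pixel size and adds the offset (uc, vc). The function name `project_to_pixels` and the camera values are hypothetical.

```python
def project_to_pixels(point, f, pixel_width, pixel_height, uc, vc):
    """Map a 3D point in the camera's standard frame to pixel coordinates."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective projection onto the image plane (assumed form of equation 1).
    x = f * X / Z
    y = f * Y / Z
    # Convert metric image coordinates to pixels (assumed form of equation 2).
    u = uc + x / pixel_width
    v = vc + y / pixel_height
    return u, v


# Hypothetical camera: 8 mm focal length, 0.01 mm square pixels,
# optical centre at pixel (320, 240). All lengths are in metres.
u, v = project_to_pixels((0.5, 0.25, 2.0), f=0.008,
                         pixel_width=0.00001, pixel_height=0.00001,
                         uc=320.0, vc=240.0)
print(u, v)  # approximately 520 and 340
```

The point lies 0.5 m to the side of the optical axis and 2 m away, so it lands about 200 pixels from the optical centre in u and about 100 pixels in v.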
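We can express the transformation from three-dimensional world coordinates to image pixel coordinates using a matrix. This is done by substituting equation (1) into equation (2) and multiplying through by Z to obtain

$$
Zu = Z u_c + \frac{fX}{\text{pixel width}}, \qquad Zv = Z v_c + \frac{fY}{\text{pixel height}}
$$

or, in homogeneous coordinates with scale factor $s = Z$,

$$
\begin{pmatrix} su \\ sv \\ s \end{pmatrix} =
\begin{pmatrix}
f/\text{pixel width} & 0 & u_c & 0 \\
0 & f/\text{pixel height} & v_c & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$

The matrix in this expression is the camera matrix P.
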
There are five camera parameters: the focal length f, the pixel width, the pixel height, the parameter uc, which is the u pixel coordinate at the optical centre, and the parameter vc, which is the v pixel coordinate at the optical centre. However, only four separable parameters can be solved for, as there is an arbitrary scale factor involved in f and in the pixel size. Thus we can only solve for the ratios $\alpha_u = f/\text{pixel width}$ and $\alpha_v = f/\text{pixel height}$. The parameters $\alpha_u$, $\alpha_v$, uc and vc do not depend on the position and orientation of the camera in space, and are thus called the intrinsic parameters.
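The scale ambiguity can be checked numerically. The sketch below is an illustration under assumptions, not a calibration routine: it takes the projection to be u = uc + fX/(Z · pixel width), v = vc + fY/(Z · pixel height), and the function names and camera values are hypothetical. Doubling both f and the pixel size leaves the pixel coordinates unchanged, so only the ratios of f to the pixel dimensions matter.

```python
import math


def pixel_coords(point, f, pixel_width, pixel_height, uc, vc):
    """Project using the five raw parameters (assumed projection model)."""
    X, Y, Z = point
    return (uc + f * X / (Z * pixel_width),
            vc + f * Y / (Z * pixel_height))


def pixel_coords_from_ratios(point, alpha_u, alpha_v, uc, vc):
    """Project using only the four separable intrinsic parameters."""
    X, Y, Z = point
    return (uc + alpha_u * X / Z, vc + alpha_v * Y / Z)


point = (0.3, -0.1, 1.5)

# Two hypothetical cameras whose f and pixel size differ by a factor of 2.
a = pixel_coords(point, f=0.008, pixel_width=1e-5, pixel_height=1e-5,
                 uc=320.0, vc=240.0)
b = pixel_coords(point, f=0.016, pixel_width=2e-5, pixel_height=2e-5,
                 uc=320.0, vc=240.0)

# The same camera described only by the ratios f / pixel size = 800.
c = pixel_coords_from_ratios(point, alpha_u=800.0, alpha_v=800.0,
                             uc=320.0, vc=240.0)

same = (all(math.isclose(p, q) for p, q in zip(a, b))
        and all(math.isclose(p, q) for p, q in zip(a, c)))
print(same)  # True
```

Because the two raw parameter sets give the same image, no set of image measurements can tell them apart, which is why a calibration procedure can recover only the two ratios together with uc and vc.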
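In general, the three-dimensional world coordinates of a point will not be specified in a frame whose origin is at the centre of projection and whose Z axis lies along the optical axis. Some other, more convenient frame will more likely be specified, and then we have to include a change of coordinates from this other frame to the standard coordinate system. Thus we have

$$
\begin{pmatrix} su \\ sv \\ s \end{pmatrix} = P K \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$

where (X, Y, Z) are now coordinates in the chosen frame and K is the homogeneous transform, made up of a rotation and a translation, that takes them to the standard coordinate system. The rotation and translation depend on the position and orientation of the camera in space, and are called the extrinsic parameters.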
The 3×4 camera matrix P and the 4×4 homogeneous transform K combine to form a single 3×4 matrix C = PK, called the camera calibration matrix. We can write the general form of C as a function of the intrinsic and extrinsic parameters:
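$$
C = \begin{pmatrix}
\alpha_u \mathbf{r}_1^T + u_c \mathbf{r}_3^T & \alpha_u t_x + u_c t_z \\
\alpha_v \mathbf{r}_2^T + v_c \mathbf{r}_3^T & \alpha_v t_y + v_c t_z \\
\mathbf{r}_3^T & t_z
\end{pmatrix} \tag{3}
$$

Here $\mathbf{r}_i^T$ is the $i$-th row of the rotation part of K, $(t_x, t_y, t_z)$ is its translation part, and $\alpha_u$, $\alpha_v$ denote the ratios of f to the pixel width and pixel height.
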
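As a rough sketch of how the two matrices combine, the Python below builds a hypothetical P and K and multiplies them. It rests on several assumptions: P has the form [[αu, 0, uc, 0], [0, αv, vc, 0], [0, 0, 1, 0]], where αu and αv are the ratios of f to the pixel width and height, K is a rotation about the Z axis followed by a translation, and all numeric values are made up for illustration. The helper names are not from any library.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def camera_matrix(alpha_u, alpha_v, uc, vc):
    """3x4 matrix P built from the four intrinsic parameters (assumed form)."""
    return [[alpha_u, 0.0, uc, 0.0],
            [0.0, alpha_v, vc, 0.0],
            [0.0, 0.0, 1.0, 0.0]]


def rigid_transform_z(theta, t):
    """4x4 matrix K: rotation by theta about Z, then translation by t."""
    c, s = math.cos(theta), math.sin(theta)
    tx, ty, tz = t
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def project(C, world_point):
    """Apply a 3x4 calibration matrix and divide out the scale factor s."""
    M = [[world_point[0]], [world_point[1]], [world_point[2]], [1.0]]
    su, sv, s = (row[0] for row in matmul(C, M))
    return su / s, sv / s


P = camera_matrix(800.0, 800.0, 320.0, 240.0)
K = rigid_transform_z(math.radians(30.0), (0.1, -0.2, 2.0))
C = matmul(P, K)  # single 3x4 camera calibration matrix

u, v = project(C, (0.5, 0.25, 1.0))
print(len(C), len(C[0]))  # 3 4
```

The last row of C is the last row of the rotation followed by the Z translation, and the last column mixes the translation with the intrinsic parameters, which is the structure the general form of C describes.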
Consider a translation of -f along the Z axis of the standard coordinate frame, so that the focal plane and the image plane are now coincident. Since there is no rotation involved in this transformation, it is easy to see that the camera calibration matrix is just