There are three coordinate systems involved: camera, image, and world.
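The projection can be written as a linear mapping between homogeneous coordinates (the equation holds only up to a scale factor):
\[
  \mathbf{x} \simeq P\,\mathbf{X},
\]
where the $3 \times 4$ projection matrix $P$ represents a map from 3D to 2D, $\mathbf{X}$ is a homogeneous 4-vector for the scene point, and $\mathbf{x}$ is a homogeneous 3-vector for its image.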
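$K$ is a $3 \times 3$ upper triangular matrix, called the camera calibration matrix:
\[
  K = \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix},
\]
where $\alpha_x = f k_x$, $\alpha_y = f k_y$. Here $f$ is the focal length, $k_x$ and $k_y$ are the pixel densities along the two image axes, and $(x_0, y_0)$ is the principal point.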
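A small sketch of how such a calibration matrix might be assembled and applied. It assumes the common convention that `f` is the focal length in metric units and `k_x`, `k_y` are pixels per unit length, so that the diagonal entries are `f * k_x` and `f * k_y`. The principal point `(x0, y0)`, the function names, and the numeric values are hypothetical and chosen only for illustration.

```python
def calibration_matrix(f, k_x, k_y, x0, y0):
    """Upper triangular 3x3 calibration matrix (zero skew assumed)."""
    return [[f * k_x, 0.0, x0],
            [0.0, f * k_y, y0],
            [0.0, 0.0, 1.0]]


def to_pixels(K, X_cam):
    """Map a point given in the camera frame to pixel coordinates."""
    X, Y, Z = X_cam
    # Apply K to the homogeneous image point (X, Y, Z).
    x = K[0][0] * X + K[0][1] * Y + K[0][2] * Z
    y = K[1][1] * Y + K[1][2] * Z
    w = K[2][2] * Z
    # Dividing by the third component removes the arbitrary scale factor.
    return (x / w, y / w)


# Hypothetical camera: 4 mm focal length, 200 pixels per mm,
# principal point at the centre of a 640x480 image.
K = calibration_matrix(f=4.0, k_x=200.0, k_y=200.0, x0=320.0, y0=240.0)
u, v = to_pixels(K, (0.1, -0.05, 2.0))
```

Because the matrix is upper triangular with a final row of `(0, 0, 1)`, the third homogeneous component is simply the depth `Z`, so the division at the end is the usual perspective divide.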
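Finally, concatenating the three matrices,
\[
  \mathbf{x} \simeq K \,[\, I \mid \mathbf{0} \,]
  \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \mathbf{X}
  = K \,[\, R \mid \mathbf{t} \,]\, \mathbf{X},
\]
which defines the $3 \times 4$ projection matrix from Euclidean 3-space to an image:
\[
  P = K \,[\, R \mid \mathbf{t} \,],
\]
where $K$ is the camera calibration matrix and $R$, $\mathbf{t}$ are the rotation and translation taking world coordinates to camera coordinates.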
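The concatenation described above can be sketched in a few lines of plain Python: a calibration matrix is multiplied by a world-to-camera rigid transform to give a single 3x4 matrix, which is then applied to a homogeneous world point. The names `matmul`, `projection_matrix`, and `project`, the numeric values, and the convention that `R` and `t` map world coordinates into the camera frame are assumptions made for this illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def projection_matrix(K, R, t):
    """Return the 3x4 matrix P = K [R | t]."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]  # 3x4 block [R | t]
    return matmul(K, Rt)


def project(P, X_world):
    """Project a 3D world point to inhomogeneous pixel coordinates."""
    X_h = [[X_world[0]], [X_world[1]], [X_world[2]], [1.0]]  # homogeneous column
    x, y, w = (row[0] for row in matmul(P, X_h))
    return (x / w, y / w)  # divide out the arbitrary scale factor


# Hypothetical values: focal scale of 800 pixels on both axes, principal
# point (320, 240), camera axes aligned with the world axes, and the world
# origin 5 units in front of the camera.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]

P = projection_matrix(K, R, t)
u, v = project(P, (1.0, 0.5, 5.0))
```

Multiplying `P` by any nonzero constant leaves `u` and `v` unchanged, which is the sense in which the projection equation holds only up to a scale factor.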