Next: References Up: Computer Vision IT412 Previous: Computer Vision Representations


Object recognition

The problem in object recognition is to determine which, if any, of a given set of objects appear in a given image or image sequence. Thus object recognition is a problem of matching models from a database with representations of those models extracted from the image luminance data. Early work involved the extraction of three-dimensional models from stereo data, but more recent work has concentrated on recognizing objects from geometric invariants extracted from the two-dimensional luminance data.

Of course, the representation of the object model is extremely important. Clearly, it is impossible to keep a database that has examples of every view of an object under every possible lighting condition. Thus, object views will be subject to certain transformations; certainly perspective transformations depending on the viewpoint, but also transformations related to the lighting conditions and other possible factors.

Two approaches have been developed to deal with the many possible transformations that an object may undergo in the imaging process. The first is to determine the transformation in question and then try to undo its effects. The second is to find measurements of the object that are invariant to these types of transformations.

There are two stages to any recognition system. The first is the acquisition stage, where a model library is constructed from certain descriptions of the objects. The second is recognition, where the system is presented with a perspective image and determines the location and identity of any library objects in the image.

Generally, the most reliable type of object information that is available from an image is geometric information. So object recognition systems draw upon a library of geometric models, containing information about the shape of known objects. Usually, recognition is considered successful if the geometric configuration of an object can be explained as a perspective projection of a geometric model of the object.

Model-based recognition

All object recognition systems contain the following modules to some extent:

Thus object recognition is a process of hypothesizing an object-to-model correspondence and then verifying that the hypothesis is correct. Generally, a hypothesis is accepted if the error between the projected model features and the corresponding image features is below some threshold, and a reasonable fraction of the object outline is covered by the image features.
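The verification test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the point-to-nearest-feature error measure, and the three threshold values are hypothetical choices, not taken from any particular recognition system.

```python
import math

def verify_hypothesis(projected, image_points,
                      match_radius=3.0, max_error=1.5, min_coverage=0.6):
    """Accept or reject a recognition hypothesis (illustrative sketch only).

    projected    -- model outline points projected into the image, as (x, y)
    image_points -- feature points extracted from the image, as (x, y)
    match_radius -- assumed radius within which an outline point counts as covered
    max_error    -- assumed threshold on the mean error of the covered points
    min_coverage -- assumed minimum fraction of the outline that must be covered
    """
    if not projected or not image_points:
        return False
    errors = []
    for p in projected:
        # Distance from this projected model point to its nearest image feature.
        nearest = min(math.dist(p, q) for q in image_points)
        if nearest <= match_radius:
            errors.append(nearest)
    coverage = len(errors) / len(projected)
    if coverage < min_coverage:
        return False
    mean_error = sum(errors) / len(errors)
    return mean_error < max_error

# Hypothetical data: three of the four outline points have a nearby image feature.
outline = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
edges = [(0.0, 1.0), (10.0, -1.0), (20.0, 0.5)]
accepted = verify_hypothesis(outline, edges)
```

With the sample data, three quarters of the outline is covered and the mean error of the covered points is below the threshold, so the hypothesis is accepted.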

Of the two approaches mentioned above, the one that estimates the transformation undergone in the imaging process has complexity $O(\lambda i^k m^k)$, where $\lambda$ is the number of models, $i$ is the number of image features, $m$ is the number of features per model, and $k$ is the number of features needed to determine the object-image transformation. Typically, $k$ is about 4.

The approach that uses transformation-invariant measurements of the object in the image for recognition has complexity $O(i^k)$, where $k$ is the number of features required to form the index. In this case, recognition time need not grow in proportion to the number of models in the library. This can be a considerable advantage when the number of models is large.
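To get a feel for the difference between the two growth rates, the following sketch evaluates both expressions for one set of sample sizes. The function names and the sample values (100 models, 50 image features, 10 features per model) are hypothetical, and the results are order-of-growth counts with constant factors ignored, not measured running times.

```python
def alignment_count(num_models, image_features, model_features, k):
    """Order-of-growth count for the transformation-estimation approach."""
    return num_models * image_features ** k * model_features ** k

def indexing_count(image_features, k):
    """Order-of-growth count for the invariant-indexing approach."""
    return image_features ** k

# Hypothetical sizes, with k = 4 as suggested in the text.
alignment = alignment_count(num_models=100, image_features=50, model_features=10, k=4)
indexing = indexing_count(image_features=50, k=4)
ratio = alignment // indexing   # equals num_models * model_features ** k

print(alignment, indexing, ratio)
```

For these sizes the alignment count exceeds the indexing count by a factor of one million, and that factor grows with the size of the library.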

Geometric invariants

An invariant of a geometric configuration is a function of the configuration whose value is unchanged by a particular transformation. For example, the distance between two points is unchanged by a Euclidean transformation (a translation, a rotation, or a combination of the two).
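The distance example can be checked numerically. The sketch below applies a planar rotation followed by a translation to two points and compares the distance before and after. The helper name and the sample values are hypothetical.

```python
import math

def euclidean_transform(point, angle, tx, ty):
    """Rotate a 2D point about the origin by `angle` radians, then translate it."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + tx, s * x + c * y + ty)

p, q = (1.0, 2.0), (4.0, 6.0)
p_moved = euclidean_transform(p, angle=0.7, tx=3.0, ty=-1.0)
q_moved = euclidean_transform(q, angle=0.7, tx=3.0, ty=-1.0)

distance_before = math.dist(p, q)
distance_after = math.dist(p_moved, q_moved)
```

The two distances agree up to floating-point rounding, whereas the coordinates of the individual points do not.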

There are a number of geometric invariants for perspective transformations. Here we will illustrate just one of them, the cross-ratio of four points on a line.

Suppose we are given a configuration of four points on a line, as shown in Figure 6.

Figure 6: A one-dimensional construction of perspective viewing. The optical centre of the camera is O. Under perspective projection, the length, and ratios of lengths, on a line are not invariant, but ratios of ratios are.

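The ratio of ratios of lengths on the line, called the cross-ratio, is given (in one common ordering of the points) by

\begin{displaymath}
\tau = \frac{(X_3' - X_1')(X_4' - X_2')}{(X_3' - X_2')(X_4' - X_1')},
\end{displaymath}

where $X_1'$, $X_2'$, $X_3'$, and $X_4'$ represent the corresponding positions of each point along the line.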

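The perspective transformation between the lines X and X' is given by

\begin{displaymath}
\left( \begin{array}{c} X_i' \\ 1 \end{array} \right)
= M \left( \begin{array}{c} k_i X_i \\ k_i \end{array} \right),
\end{displaymath}

where $M$ is a nonsingular $2 \times 2$ matrix and $k_i$ is a scale factor, in general different for each point, chosen so that the second component of the left-hand side equals 1.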

Now to see why the cross-ratio of four points on a line is preserved under such a transformation we note that the distance $X_i' - X_j'$ can be written as a determinant:

\begin{displaymath}
X_i' - X_j' = \vert S(X_i', X_j')\vert
= \left\vert \begin{array}{cc}
 X_i' & X_j' \\  1 & 1
 \end{array} \right\vert.
\end{displaymath}

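Under the projective transformation above, the matrix $S(X_i', X_j')$ transforms as follows:

\begin{displaymath}
S(X_i', X_j') =
\left( \begin{array}{cc}
 X_i' & X_j' \\  1 & 1
 \end{array} \right)
= M \left( \begin{array}{cc}
 k_i X_i & k_j X_j \\  k_i & k_j
 \end{array} \right),
\end{displaymath}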

and taking the determinant of both sides gives

\begin{displaymath}
\vert S(X_i', X_j')\vert = k_i k_j \, \vert M\vert \, \vert S(X_i, X_j)\vert.
\end{displaymath}

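Substituting this relation into the expression for the cross-ratio, each scale factor $k_i$ appears once in the numerator and once in the denominator, and $\vert M\vert$ appears twice in each, so all of them cancel:

\begin{displaymath}
\begin{array}{rcl}
\frac{(X_3' - X_1')(X_4' - X_2')}{(X_3' - X_2')(X_4' - X_1')}
& = & \frac{k_3 k_1 \vert M\vert \, \vert S(X_3, X_1)\vert \; k_4 k_2 \vert M\vert \, \vert S(X_4, X_2)\vert}
           {k_3 k_2 \vert M\vert \, \vert S(X_3, X_2)\vert \; k_4 k_1 \vert M\vert \, \vert S(X_4, X_1)\vert} \\[2ex]
& = & \frac{(X_3 - X_1)(X_4 - X_2)}{(X_3 - X_2)(X_4 - X_1)}.
\end{array}
\end{displaymath}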

In summary, the cross-ratio is an invariant of any set of four collinear points in projective correspondence. It is unaffected by the relative position of the line or the position of the optical centre, as shown in Figure 7.

Figure 7: The cross-ratio of every set of four collinear points shown in this figure has the same value.
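The invariance of the cross-ratio can also be checked numerically. The sketch below rests on two assumptions that are not fixed by the text: the cross-ratio is taken with the ordering (X3 - X1)(X4 - X2) / ((X3 - X2)(X4 - X1)), and the map between the two lines is written as x' = (m11 x + m12) / (m21 x + m22) for a nonsingular 2 x 2 matrix M. The matrix entries and point positions are hypothetical sample values.

```python
def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio of four collinear points given by their positions on the line."""
    return ((x3 - x1) * (x4 - x2)) / ((x3 - x2) * (x4 - x1))

def project(M, x):
    """One-dimensional projective map defined by the 2x2 matrix M (assumed form)."""
    (m11, m12), (m21, m22) = M
    return (m11 * x + m12) / (m21 * x + m22)

points = [0.0, 1.0, 3.0, 7.0]       # positions on the first line
M = ((2.0, 1.0), (0.5, 3.0))        # hypothetical nonsingular matrix
projected = [project(M, x) for x in points]

ratio_before = cross_ratio(*points)
ratio_after = cross_ratio(*projected)

# A plain ratio of lengths, by contrast, is not preserved by the map.
length_ratio_before = (points[2] - points[0]) / (points[1] - points[0])
length_ratio_after = (projected[2] - projected[0]) / (projected[1] - projected[0])
```

The two cross-ratios agree up to rounding, while the simple length ratios differ, which matches the statement in the caption of Figure 6.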

Recognition using invariants

There are two stages to model-based recognition using invariants:

Model acquisition. Models of objects to be recognised are acquired directly from images. For planar objects, this involves computing their plane projective invariants. Their outline is also stored for the verification process.
Recognition. Invariants are computed for geometric configurations in the target image. If the invariant value corresponds to one in the model library, a recognition hypothesis is generated. This hypothesis is either confirmed or rejected by verification: the model outline from the acquisition image is projected onto the target image. If the projected edges overlap image edges sufficiently, then the hypothesis is verified.
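The two stages above can be sketched as a small index table keyed on quantised invariant values. This is a simplified illustration, not the method of any particular system: a single scalar stands in for the plane projective invariants of a real object, the class name, model names, and bin width are hypothetical, and the verification step is left out.

```python
class ModelLibrary:
    """Index from quantised invariant values to model names (illustrative sketch)."""

    def __init__(self, bin_width=0.05):
        self.bin_width = bin_width   # assumed tolerance on measured invariants
        self.table = {}              # quantised invariant -> list of model names

    def _key(self, invariant):
        return round(invariant / self.bin_width)

    def acquire(self, name, invariant):
        """Acquisition stage: store a model under its invariant value."""
        self.table.setdefault(self._key(invariant), []).append(name)

    def hypotheses(self, invariant):
        """Recognition stage: return models whose stored invariant is close.

        Neighbouring bins are also searched so that a value near a bin
        boundary is not missed. Each returned name is only a hypothesis
        and would still have to be verified against the image.
        """
        key = self._key(invariant)
        names = []
        for neighbour in (key - 1, key, key + 1):
            names.extend(self.table.get(neighbour, []))
        return names

library = ModelLibrary()
library.acquire("bracket", 1.2857)   # hypothetical model and invariant value
library.acquire("spanner", 2.5)      # hypothetical model and invariant value

candidates = library.hypotheses(1.29)   # invariant measured in a target image
no_match = library.hypotheses(4.0)
```

The lookup cost depends on the number of models stored near the measured value rather than on the size of the whole library, which is the advantage of indexing noted earlier.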

Robyn Owens