A Description of the
Scene Labelling Problem

Figure 1, below, illustrates schematically a room interior (from Ballard &Brown, 1982). The set of objects are denoted by regional boundaries and the set of possible labels is {Door(D), Wall(W), Ceiling(C), Floor(F), Bin(B) }.



Figure 1: Labelling the regions of a room consistently

A set of informal constraints might be,

Unary Constraints

N-ary constraints

Inititially we start with the complete set of possible labels for all the regions (2D), or surfaces (3D), in the image. Applying the unary constraints leads to a reduction in the number of possible labellings, but there is still ambiguity in at least 4 cases. These can be resolved by the N-ary constraints.

In the above example of a discrete labelling regions can only be labelled in a definite manner, either a region is or is not a door. Scene labellings as a whole may be consistent or inconsistent. If a scene cannot be consistently labelled then we might define that scene as impossible, as for example in the case of impossible objects which are represented as line drawings in the AI textbooks. That is all very well, but in an imperfect world it is somewhat unrealistic to ask for certainty in a labelled scene. This leads to the idea of a probabilistic labelling, in which scene elements may be labelled with probability in a continuum (0.0 to 1.0); for example the set of probabilities for a given region in Figure 1, above, might be { Door=0.1, Bin=0.3, Floor=0.1, Wall=0.2, Ceiling=0.3 }. An algorithm for labelling the scene consistently should update these probabilities to obtain an optimum global labelling satisfying as far as possible the unary characteristics of the regions, and the N-ary characteristics of their relationships. We shall consider this further with the specific examples.xs of subsequent sections.

Before considering a number of approaches to object recognition by scene labelling, we introduce a few basic definitions. In general, the set of object models is defined,

Each of the n object models may be represented by a set of m(i) primitives, where the parameter, i, denotes the model number (1<=i<=n). These primitives might be surfaces within a B-Rep model, space curves within a wire frame or B-Rep model, volumetric components in a CSG structure and so on.

Similarly, the segmented scene consists of a set of m(s) primitives, arising from segmentation of 3D range data or 2D intensity data for example, such that

The problem is to find the optimum injection mapping, , from the set of scene features defined by Equation 3 to the set of model features defined by Equation 2 so that the defined metric for the mapping is a maximum,


[ Contents: Matching Models to Scene Descriptions | Interpretation Tree Search ]

Comments to: Sarah Price at ICBL.