Dynamic Face Recognition Using Identity Surfaces

Yongmin Li, Shaogang Gong and Heather Liddell
Department of Computer Science
Queen Mary, University of London

1. Introduction

Recognising faces with large pose variation is a challenging problem owing to the severe nonlinearity caused by rotation in depth, self-shading and self-occlusion. Traditional techniques such as probability density estimation or template matching usually cannot solve this problem satisfactorily. However, the problem becomes considerably more tractable if pose information is used explicitly. Based on this idea, we have developed a method for multi-view face recognition using identity surfaces. The basic idea of identity surfaces is similar to that of the parametric eigenspace method presented by Murase and Nayar [4,5].
 

2. Identity Surfaces

The identity surface of a face class (all face patterns belonging to one subject) is defined as a unique hypersurface in a feature space. If we focus only on the variation of facial appearance caused by pose change, the feature space can be constructed as a pose-parameterised feature space. As shown in Figure 1, two basis coordinates represent the head pose (tilt and yaw), while the remaining coordinates represent the discriminating features. For each pair of tilt and yaw values there is a unique "point" for a face class. The distribution of all such "points" of the same face class forms a hypersurface in this feature space, which we call an identity surface.
Figure 1: Identity surfaces for dynamic face recognition.
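
To make the representation concrete, the following sketch (in Python/NumPy, not part of the original work) treats one identity surface as a mapping from head pose (tilt, yaw) to a discriminating feature vector, approximated here by generic piecewise-linear interpolation over a handful of prototype patterns. The class name, the interpolation scheme and the random example data are illustrative assumptions; the construction actually used in this work is described in Section 3.

    import numpy as np
    from scipy.interpolate import LinearNDInterpolator

    # A minimal stand-in for one identity surface (illustrative only): the
    # mapping from head pose (tilt, yaw) to the discriminating feature vector
    # of a single face class, approximated here by piecewise-linear
    # interpolation over that class's prototype patterns.
    class IdentitySurface:
        def __init__(self, poses, features):
            # poses: (n, 2) array of (tilt, yaw); features: (n, d) vectors
            self._interp = LinearNDInterpolator(poses, features)

        def point(self, tilt, yaw):
            """Feature "point" of this face class at the given pose."""
            return self._interp(np.array([[tilt, yaw]]))[0]

    # Illustrative example: 15 prototype patterns with 10-dimensional features.
    rng = np.random.default_rng(0)
    poses = rng.uniform(low=[-30.0, -45.0], high=[30.0, 45.0], size=(15, 2))
    features = rng.normal(size=(15, 10))
    surface = IdentitySurface(poses, features)
    tilt0, yaw0 = poses.mean(axis=0)          # a pose inside the sampled range
    print(surface.point(tilt0, yaw0).shape)   # (10,)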

Using identity surfaces, face recognition can be performed dynamically from a video input rather than statically on individual images. As shown in Figure 1, when a face is detected and tracked in an input video stream, one obtains the object trajectory of the face in the feature space. Its projection onto each of the identity surfaces, using the same pose information and temporal order, forms a model trajectory of the corresponding face class. This model trajectory can be regarded as the ideal trajectory of that face class encoded by the same spatio-temporal information (pose and temporal order from the video sequence) as the tracked face. Face recognition can then be carried out by matching the object trajectory against the set of model trajectories. Compared to face recognition on static images, this approach can be more reliable and accurate. For example, from a single pattern it is difficult to decide whether pattern X in Figure 1 belongs to subject A or subject B. However, if we know that X is tracked along the object trajectory (red curve), it is more likely to belong to subject A than to subject B [3].
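
A minimal sketch of this matching step is given below, assuming each subject's identity surface is available as a callable mapping (tilt, yaw) to a feature vector, for instance the point method of the IdentitySurface sketch above. The function names and the Euclidean pattern distance are assumptions, not necessarily the distance used in this work.

    import numpy as np

    def model_trajectory(surface, poses):
        """Project a tracked pose sequence onto one identity surface.

        poses: (T, 2) array of (tilt, yaw) per frame.  The result is a (T, d)
        model trajectory sharing the pose and temporal order of the tracked
        face.
        """
        return np.stack([surface(tilt, yaw) for tilt, yaw in poses])

    def recognise(surfaces, poses, object_trajectory):
        """Match the object trajectory against every model trajectory.

        surfaces: dict mapping subject -> callable (tilt, yaw) -> feature.
        object_trajectory: (T, d) features of the tracked face over time.
        """
        distances = {
            subject: np.linalg.norm(
                object_trajectory - model_trajectory(s, poses), axis=1).sum()
            for subject, s in surfaces.items()
        }
        # The subject whose model trajectory lies closest to the object one.
        return min(distances, key=distances.get), distances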
 

3. Constructing Identity Surfaces

If sufficient patterns of a face class across different views are available, the identity surface of this face class can be constructed precisely. However, we do not presume such a strong condition. In this work, we develop a method to synthesise the identity surface of a face class from a small sample of face patterns which sparsely cover the view sphere. These face patterns can be conveniently acquired, for example, by recording a short video sequence of the subject to be recognised. The basic idea is to approximate the identity surface by a set of Np planes separated by Nv predefined views. The problem can then be formulated as a quadratic optimisation problem which can be solved using an interior point method.
Figure 2: The identity surface synthesised from 15 prototype patterns. Only its projections in the first three dimensions of the discriminating feature are shown here. The two basis axes are pose in tilt and yaw, and the vertical axis is the discriminating feature.

Figure 2 shows an identity surface synthesised from only 15 prototype views. A ten-dimensional discriminating feature vector is used in this example; for clarity, only the first three dimensions are shown.
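
As a rough illustration of this construction (not the formulation used in this work), the sketch below parameterises the surface by its feature values at a small grid of predefined views, interpolates each cell bilinearly so that neighbouring patches agree on their shared edges by construction, and fits the node values to the sparse prototypes by linear least squares, which is itself an unconstrained quadratic optimisation. Bilinear patches stand in for the planes, and the grid layout and all names are assumptions.

    import numpy as np

    def bilinear_weights(pose, tilts, yaws):
        """Row of interpolation weights over the grid nodes for one pose."""
        t, y = pose
        i = np.clip(np.searchsorted(tilts, t) - 1, 0, len(tilts) - 2)
        j = np.clip(np.searchsorted(yaws, y) - 1, 0, len(yaws) - 2)
        a = (t - tilts[i]) / (tilts[i + 1] - tilts[i])
        b = (y - yaws[j]) / (yaws[j + 1] - yaws[j])
        w = np.zeros((len(tilts), len(yaws)))
        w[i, j], w[i + 1, j] = (1 - a) * (1 - b), a * (1 - b)
        w[i, j + 1], w[i + 1, j + 1] = (1 - a) * b, a * b
        return w.ravel()

    def fit_identity_surface(proto_poses, proto_features, tilts, yaws):
        """Least-squares fit of the grid-node feature values to the prototypes."""
        W = np.stack([bilinear_weights(p, tilts, yaws) for p in proto_poses])
        V, *_ = np.linalg.lstsq(W, proto_features, rcond=None)
        return V        # (len(tilts) * len(yaws), d) node feature values

    def evaluate_surface(V, pose, tilts, yaws):
        """Feature vector of the fitted surface at an arbitrary pose."""
        return bilinear_weights(pose, tilts, yaws) @ V

    # Example: fit to 15 prototypes using a 3x3 grid of predefined views.
    tilts = np.array([-30.0, 0.0, 30.0])
    yaws = np.array([-45.0, 0.0, 45.0])
    rng = np.random.default_rng(1)
    proto_poses = rng.uniform([-30.0, -45.0], [30.0, 45.0], size=(15, 2))
    proto_features = rng.normal(size=(15, 10))
    V = fit_identity_surface(proto_poses, proto_features, tilts, yaws)
    print(evaluate_surface(V, (5.0, -10.0), tilts, yaws).shape)   # (10,)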

4. Dynamic Face Recognition

We demonstrate the performance of this approach on a small-scale multi-view face recognition problem. Twelve sequences, one from each of 12 subjects, were used as training sequences to construct the identity surfaces. The number of frames in each sequence varies from 40 to 140. A multi-view face model [2] was fitted to these sequences to obtain the normalised facial texture patterns. Kernel Discriminant Analysis (KDA) [1] was adopted to extract the discriminating features. The identity surfaces were then constructed from these KDA features and their corresponding poses.
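
KDA itself is described in [1]. Purely as an illustration of the kind of feature extractor involved, the sketch below implements a compact multi-class kernel Fisher discriminant with an RBF kernel, defaulting to ten components to match the ten-dimensional feature vector used above; the kernel choice, regularisation and all parameter values are assumptions, and this is not the implementation used in the experiments.

    import numpy as np
    from scipy.linalg import eigh

    def rbf_kernel(A, B, gamma):
        # Pairwise RBF kernel values between rows of A and rows of B.
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)

    class KernelDiscriminant:
        """Multi-class kernel Fisher discriminant (illustrative sketch)."""

        def __init__(self, gamma=1e-3, reg=1e-3, n_components=10):
            self.gamma, self.reg, self.n_components = gamma, reg, n_components

        def fit(self, X, y):
            X, y = np.asarray(X), np.asarray(y)
            self.X_ = X
            n = X.shape[0]
            K = rbf_kernel(X, X, self.gamma)
            m_all = K.mean(axis=1)
            M = np.zeros((n, n))
            N = np.zeros((n, n))
            for c in np.unique(y):
                idx = np.where(y == c)[0]
                Kc = K[:, idx]                      # n x n_c
                d = (Kc.mean(axis=1) - m_all)[:, None]
                M += len(idx) * (d @ d.T)           # between-class scatter
                Jc = np.eye(len(idx)) - np.full((len(idx), len(idx)),
                                                1.0 / len(idx))
                N += Kc @ Jc @ Kc.T                 # within-class scatter
            N += self.reg * np.eye(n)               # regularise for stability
            # Generalised eigenproblem M a = lambda N a; keep leading directions.
            w, A = eigh(M, N)
            self.A_ = A[:, ::-1][:, :self.n_components]
            return self

        def transform(self, Xnew):
            # Project new patterns onto the discriminant directions.
            return rbf_kernel(np.asarray(Xnew), self.X_, self.gamma) @ self.A_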

Recognition was then performed on new test sequences of the same subjects, using the same face model and KDA features. Figure 3 shows sample images fitted by the multi-view face model and the normalised facial texture patterns from a test sequence.

Figure 3: Sample frames, fitted 3D shape patterns, and the normalised facial texture patterns from a test sequence.

Figure 4 shows the estimated pose in tilt and yaw.

Figure 4: Pose in tilt (blue) and yaw (red).

Figure 5 shows the object and model trajectories in the first two dimensions of the discriminating feature.

Figure 5: The object trajectory (red) and model trajectories in the first KDA dimension (solid line for the one from the ground-truth subject).

Figure 6 compares the performance of dynamic face recognition using the trajectory distance with that of static face recognition using the pattern distance on individual frames. The pattern distance is computed between an object pattern and its corresponding point on each of the identity surfaces, while the trajectory distance is the accumulation of the pattern distance over time. The trajectory distance provides more reliable recognition, particularly as evidence accumulates over the sequence.

Figure 6: Recognition results using trajectory distance (right) compared with those using pattern distance on individual frames (left).
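
The two decision rules compared in Figure 6 can be contrasted with a short sketch, assuming a precomputed matrix D of pattern distances with one row per frame and one column per identity surface (an assumed layout, not taken from the experiments):

    import numpy as np

    def framewise_decisions(D):
        # Static recognition: pick the nearest identity surface at every
        # frame independently, using only that frame's pattern distance.
        return np.argmin(D, axis=1)

    def trajectory_decisions(D):
        # Dynamic recognition: the trajectory distance up to frame t is the
        # accumulated pattern distance, so the decision tends to stabilise
        # as evidence builds up over the sequence.
        return np.argmin(np.cumsum(D, axis=0), axis=1)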



