Face Recognition - Optimal Basis Encoding

Encoding natural scenes takes advantage of intrinsic image statistics and also seeks to derive a universal ('natural') basis (Hancock, Baddeley, and Smith, 1992; Olshausen and Field, 1996). The derived basis functions have been found to closely approximate the receptive fields of simple cells in the mammalian primary visual cortex. These receptive fields resemble various oriented derivative-of-Gaussian (DOG) functions, which are spatially localized, oriented and bandpass (Olshausen and Field, 1996). Barlow (1989) argues that such receptive fields might arise from unsupervised learning subject to redundancy reduction or minimum entropy coding. Olshausen and Field (1996) derive localized oriented receptive fields based on a criterion of sparseness, while Bell and Sejnowski (1995) use an independence criterion to derive qualitatively similar results.

The rationale behind a natural basis is that the basis should be complete and allow for the derivation of unique image representations suitable for extracting the intrinsic structure of sensory signals. Such intrinsic structure is essential for processes such as image retrieval and object recognition. Specifically, once the natural basis has been derived, no additional training is necessary: both the training images and the novel images encountered in future tasks are represented in terms of the already available natural basis. The natural basis, however, also has a drawback: it is too general to represent any specific task well. Since the class of objects to be represented in face recognition is specific (human face images, possibly indexed by gender, ethnicity and age), one should seek a face basis rather than a `universal' all-encompassing natural basis. This observation also fits with the knowledge that the "bias/variance dilemma may be circumvented if we are willing to purposely introduce bias, which then makes it possible to eliminate the variance or reduce it significantly" (Haykin, 1994). The bias then allows for the inclusion of the discrimination (scatter) index, a constraint known to reduce the guaranteed risk, in the derivation of the face basis. Learning low-dimensional representations of visual objects with extensive use of prior knowledge has also been suggested recently by Edelman and Intrator (1997), who claim that "perceptual tasks such as similarity judgment tend to be performed on a low-dimensional representation of the sensory data. Low dimensionality is especially important for learning, as the number of examples required for attaining a given level of performance grows exponentially with the dimensionality of the underlying representation space."

Liu and Wechsler (1998) developed Evolutionary Pursuit (EP) (http://chagall.gmu.edu) as a novel and adaptive dictionary method for image encoding and classification. In analogy to pursuit methods, EP seeks to learn an optimal basis for the dual purpose of data compression and pattern classification. The challenge for EP is to increase the generalization ability of the learning machine by seeking a trade-off between minimizing the empirical risk encountered during training and narrowing the confidence interval, which reduces the guaranteed risk during future testing on unseen images. Towards that end, EP implements strategies characteristic of genetic algorithms (GAs) for searching the space of possible solutions to determine the optimal basis. EP starts by projecting the original data into a lower-dimensional whitened Principal Component Analysis (PCA) subspace. Directed but random rotations of the basis vectors in this subspace are then searched by GAs, where evolution is driven by a fitness function defined in terms of performance accuracy (`empirical risk') and class separation (`confidence interval'). Accuracy indicates the extent to which learning has been successful so far, while separation gives an indication of the expected fitness on future trials. The feasibility of the new EP method has been successfully tested on face recognition, where the large number of possible basis functions requires some type of greedy search algorithm. The particular face recognition task involves 1,107 FERET frontal face images corresponding to 369 subjects. To assess both the accuracy and the generalization capability of EP, the facial data includes, for each subject, images acquired at different times or under different illumination conditions.
The experimental results show that EP improves on face recognition performance when compared against PCA (`eigenfaces') (Turk and Pentland, 1991) and displays better generalization abilities than the Fisher linear discriminant (`Fisherfaces') (Belhumeur, Hespanha and Kriegman, 1997).
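
The EP pipeline described above (whitened PCA projection, rotation of the basis, and a fitness that combines accuracy with class separation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: the function names, the nearest-class-mean classifier used to measure accuracy, the between-class scatter used as the separation term, the additive `weight` that combines the two, and the choice to keep only the first `k` rotated axes (without some such selection, a rotation of a whitened space would leave Euclidean distances unchanged). A simple random hill-climbing search stands in for the genetic algorithm.

```python
import numpy as np

def whitened_pca(X, m):
    """Project rows of X onto the top-m principal components and whiten them."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; covariance eigenvalues are S**2 / (n - 1).
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = S[:m] ** 2 / (X.shape[0] - 1)
    W = Vt[:m].T / np.sqrt(eigvals)  # (d, m) whitening projection
    return Xc @ W, W, mean

def givens_rotation(m, i, j, theta):
    """m x m rotation by angle theta in the plane of axes i and j."""
    Q = np.eye(m)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i] = c
    Q[j, j] = c
    Q[i, j] = -s
    Q[j, i] = s
    return Q

def fitness(Y, labels, weight=1.0):
    """Training accuracy plus weighted between-class scatter (assumed form)."""
    classes = np.unique(labels)
    means = np.stack([Y[labels == c].mean(axis=0) for c in classes])
    # Accuracy of a nearest-class-mean classifier on the training data.
    dist = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    accuracy = (classes[dist.argmin(axis=1)] == labels).mean()
    # Between-class scatter: prior-weighted squared distance to the grand mean.
    grand = Y.mean(axis=0)
    scatter = sum((labels == c).mean() * np.sum((mu - grand) ** 2)
                  for c, mu in zip(classes, means))
    return accuracy + weight * scatter

def random_search(Z, labels, k, trials=200, seed=0):
    """Hill-climbing over random plane rotations (a stand-in for the GA)."""
    rng = np.random.default_rng(seed)
    m = Z.shape[1]
    best_Q = np.eye(m)
    best_fit = fitness(Z[:, :k], labels)
    for _ in range(trials):
        i, j = rng.choice(m, size=2, replace=False)
        Q = best_Q @ givens_rotation(m, i, j, rng.uniform(0, np.pi / 2))
        f = fitness((Z @ Q)[:, :k], labels)  # keep the first k rotated axes
        if f > best_fit:
            best_Q, best_fit = Q, f
    return best_Q, best_fit
```

With image vectors stacked as rows of `X` and subject identities in `labels`, one would call `Z, W, mean = whitened_pca(X, m)` followed by `Q, fit = random_search(Z, labels, k)`. A new image `x` would then be encoded as `((x - mean) @ W @ Q)[:k]`. A real GA would maintain a population of candidate rotations and apply crossover and mutation rather than accepting single improving steps.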