Kernel Discriminant Analysis
Yongmin Li, Shaogang Gong and Heather Liddell
Department of Computer Science
Queen Mary, University of London
1. Introduction
For most pattern recognition problems, selecting
an appropriate representation that extracts the most significant features
is crucially important. Principal Component Analysis (PCA) [8,10] has been widely adopted to
extract abstract features and to reduce dimensionality in many pattern
recognition problems. However, the features extracted by PCA are
"global" features over all pattern classes, and are therefore not
necessarily representative for discriminating one class from
another. Linear Discriminant Analysis (LDA) [2,3,9], which seeks a linear
transformation that maximises the between-class variance while minimising
the within-class variance, has proved to be a suitable technique for
discriminating between pattern classes. However, both PCA and
LDA are linear techniques, which can be inadequate when severe
nonlinearity is involved. To extract nonlinear principal
components, Kernel PCA (KPCA) [7] was developed using the
popular kernel technique [11,12]. However, like linear
PCA, KPCA captures the overall variance of all patterns, which is not
necessarily significant for discrimination. To extract
nonlinear discriminant features, Kernel Discriminant Analysis (KDA)
[6,1,4], a nonlinear discriminant method
based on kernel techniques [11,12], was developed.
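The kernel technique referred to above rests on the fact that certain kernel functions evaluated in the input space equal inner products in a nonlinear feature space. As a minimal numerical check (the test vectors and the degree-2 polynomial kernel are illustrative choices, not from the original text), the kernel k(x, y) = (x · y)^2 in two dimensions corresponds to an explicit three-dimensional feature map:

```python
import numpy as np

def phi(x):
    # Explicit feature map whose inner product equals the degree-2
    # polynomial kernel k(x, y) = (x . y)**2 for 2-D inputs
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_input = x.dot(y) ** 2            # evaluated entirely in the input space
k_feature = phi(x).dot(phi(y))     # the same quantity in the feature space
```

The two values agree, so any algorithm that touches the feature space only through inner products never needs to evaluate the mapping explicitly.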
2. Kernel Discriminant Analysis
The principle of KDA is
illustrated in Figure 1. Owing
to the severe nonlinearity, it is difficult to compute the
discriminating features directly between the two classes of patterns in the
original input space (left). By defining a nonlinear mapping from the
input space to a high-dimensional feature space (right), we expect
to obtain a linearly separable distribution in the feature
space. LDA, the linear technique, can then be performed in the feature
space to extract the most significant discriminating
features. However, the computation may be problematic, or even
impossible, in the feature space owing to its high dimensionality. By
introducing a kernel function that corresponds to the nonlinear
mapping, all of the computation can conveniently be carried out in the
input space. The problem finally reduces to an
eigen-decomposition problem, as with PCA, LDA and KPCA.

Figure 1: Kernel Discriminant Analysis.
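The procedure described above can be sketched for the two-class case. This is a minimal kernel Fisher discriminant in the spirit of [6], not necessarily the authors' exact formulation: the RBF kernel, the regularisation term and all parameter values are illustrative assumptions. The class means and the within-class scatter are expressed in dual form through the kernel matrix, so no computation is ever performed in the high-dimensional feature space:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kda_two_class(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel Fisher discriminant (dual form, illustrative sketch).

    Returns a function projecting new points onto the discriminant direction."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    N = reg * np.eye(n)                        # regularised within-class scatter
    M = []
    for c in (0, 1):
        idx = np.where(y == c)[0]
        Kc = K[:, idx]                         # n x n_c block of the kernel matrix
        M.append(Kc.mean(axis=1))              # dual form of the class mean
        nc = len(idx)
        H = np.eye(nc) - np.full((nc, nc), 1.0 / nc)   # centering matrix
        N += Kc @ H @ Kc.T                     # dual form of within-class scatter
    alpha = np.linalg.solve(N, M[0] - M[1])    # Fisher direction (dual coefficients)
    return lambda Xnew: rbf_kernel(Xnew, X, gamma) @ alpha

# Demo on two nonlinearly separated classes: an inner disc and an outer ring.
rng = np.random.default_rng(0)
n = 40
theta = rng.uniform(0, 2 * np.pi, 2 * n)
r = np.r_[rng.uniform(0.0, 0.5, n), rng.uniform(1.8, 2.2, n)]
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.r_[np.zeros(n, int), np.ones(n, int)]

project = kda_two_class(X, y, gamma=1.0)
p = project(X)                                 # one-dimensional KDA feature
thr = 0.5 * (p[y == 0].mean() + p[y == 1].mean())
acc = ((p > thr) == (y == 0)).mean()           # class 0 projects above the threshold
```

For C classes the same dual quantities enter a generalised eigenproblem, and the leading C - 1 eigenvectors give the discriminant directions, matching the eigen-decomposition view above.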
3. A Toy Problem
We use a toy problem, shown in Figure 2, to illustrate the
characteristics of KDA. Two classes of patterns,
denoted by circles and crosses respectively, have a significantly
nonlinear distribution. To make the results comparable across
different representations (PCA, LDA, KPCA and KDA), we try to separate
the classes with a one-dimensional feature, i.e. the most significant
mode of PCA, LDA, KPCA or KDA. The first row shows the patterns and
the discriminating boundaries computed by each of the four
methods. The second row illustrates the intensity of the
one-dimensional features given by PCA, LDA, KPCA and KDA over the region
covered by the training patterns.

Figure 2: Solving a nonlinear
classification problem with, from left to right, PCA, LDA, KPCA and
KDA. The first row shows the patterns and the discriminating
boundaries computed by each of the four methods. The second row
illustrates the intensity of the computed one-dimensional features over
the region covered by the training patterns.
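The failure of the linear "global" feature in this kind of experiment is easy to reproduce. In the sketch below (the concentric-rings data is a stand-in for the patterns of Figure 2, and all parameters are illustrative), projecting two concentric classes onto the first principal component leaves the two classes thoroughly mixed, which is exactly the behaviour that motivates KDA:

```python
import numpy as np

# Two concentric classes: an inner disc (class 0) and an outer ring (class 1)
rng = np.random.default_rng(0)
n = 100
theta = rng.uniform(0, 2 * np.pi, 2 * n)
r = np.r_[rng.uniform(0.0, 0.5, n),
          rng.uniform(1.8, 2.2, n)]
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.r_[np.zeros(n, int), np.ones(n, int)]

# First principal component: leading eigenvector of the covariance matrix
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / len(X)
w, V = np.linalg.eigh(cov)                 # eigenvalues in ascending order
pc1 = Xc @ V[:, -1]                        # the one-dimensional PCA feature

# The class means along PC1 nearly coincide: the most significant PCA
# mode carries almost no discriminating information for this data.
gap = abs(pc1[y == 0].mean() - pc1[y == 1].mean())
```

By symmetry of the two classes, the gap between the class means along the first mode is small compared with the spread of the feature, so no one-dimensional threshold on it can separate the classes well.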
4. Representing Multi-View Faces Using KDA
Face images with large pose variation exhibit significant
nonlinearity, which makes face recognition a challenging problem. To
address this problem, we apply KDA to represent multi-view face
patterns.
In the following example (Figure 3), 540 multi-view
face patterns from 12 subjects, 45 from each, were used to train
KDA. For the sake of clarity, only the patterns of the first four face
classes are shown here. For comparison, we also
applied PCA, LDA and KPCA to the same set of face patterns.

Figure 3: Distribution of multi-view
face patterns in the PCA, LDA, KPCA and KDA spaces (only the first two
dimensions are shown).
It is noted that

the pattern distributions given by PCA and KPCA are not satisfactorily separable;
more precisely, the variation between subjects is overshadowed
by variation from other sources, since these two techniques are not designed for
discrimination;

LDA performs better than PCA and KPCA, but not as well as KDA;

KDA provides the best discriminating performance of the four
methods.
Yongmin Li 20011014