
Introduction

Kernel Principal Component Analysis (Kernel PCA) [5,6] is a method of non-linear feature extraction, closely related to methods applied in Support Vector Machines (SVMs) [3].
Suppose we have an input data set $\{\vec{x}_{i} \in \mathbb{R}^{n} : i = 1, \ldots, N\}$, where the distribution of the data is non-linear. One way to deal with such a distribution is to attempt to linearise it, by non-linearly mapping the data from the input space, $\vec{x} \in \mathbb{R}^{n}$, to a new feature space, $\Phi(\vec{x}) \in \mathcal{F}$. Kernel PCA and SVMs both use such a mapping. The mapping $\Phi$ is defined implicitly, by specifying the form of the dot product in the feature space. So, for an arbitrary pair of mapped data points, the dot product is defined in terms of some kernel function $\mathcal{K}$ thus:
\begin{displaymath}\Phi(\vec{x})\bullet\Phi(\vec{y}) \equiv \mathcal{K}(\vec{x},\vec{y}).\end{displaymath}
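As a minimal sketch of this identity (not part of the original text), the following Python fragment checks that the homogeneous polynomial kernel of degree 2, $\mathcal{K}(\vec{x},\vec{y}) = (\vec{x}\bullet\vec{y})^{2}$, reproduces the dot product of an explicit degree-2 monomial map on $\mathbb{R}^{2}$; the function names are illustrative only.

\begin{verbatim}
import numpy as np

def phi(x):
    # Explicit degree-2 monomial feature map on R^2:
    # Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
    return np.array([x[0]**2, np.sqrt(2.0)*x[0]*x[1], x[1]**2])

def poly2_kernel(x, y):
    # Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2.
    return np.dot(x, y)**2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both expressions give the same value (here 1.0): the kernel computes
# the feature-space dot product without ever forming Phi explicitly.
print(np.dot(phi(x), phi(y)), poly2_kernel(x, y))
\end{verbatim}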


Some commonly used kernels are the polynomial kernel
\begin{displaymath}\mathcal{K}(\vec{x},\vec{y}) = \left(\vec{x}\bullet\vec{y} + c\right)^{d},\end{displaymath}
the Gaussian radial basis function kernel
\begin{displaymath}\mathcal{K}(\vec{x},\vec{y}) = \exp\left(-\frac{\Vert\vec{x}-\vec{y}\Vert^{2}}{2\sigma^{2}}\right),\end{displaymath}
and the sigmoid kernel
\begin{displaymath}\mathcal{K}(\vec{x},\vec{y}) = \tanh\left(\kappa\left(\vec{x}\bullet\vec{y}\right) + \theta\right).\end{displaymath}
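A short, hedged sketch of these kernels in Python (not from the original text; the parameter names c, d, sigma, kappa and theta simply mirror the symbols above):

\begin{verbatim}
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=2):
    # K(x, y) = (x . y + c)^d
    return (np.dot(x, y) + c)**d

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    diff = x - y
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    # K(x, y) = tanh(kappa * (x . y) + theta)
    return np.tanh(kappa * np.dot(x, y) + theta)
\end{verbatim}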

The key point to note is that any algorithm which can be expressed solely in terms of dot products can then be implemented in the feature space $\mathcal{F}$. Applying a hyperplane classifier or linear function fitting in $\mathcal{F}$ leads to the usual SVM classifiers and SVM regression respectively, whilst applying linear PCA in $\mathcal{F}$ yields Kernel PCA, which corresponds to a non-linear form of PCA in the input space.
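To make this last point concrete, here is a minimal, hedged sketch of Kernel PCA in Python/NumPy (not taken from the original text, and not the author's implementation, which is given in the following section). It forms the $N \times N$ kernel matrix, centres it in feature space using only dot products, eigendecomposes it, and projects the training data onto the leading components; the function and parameter names (kernel_pca, n_components, and so on) are assumptions chosen for illustration.

\begin{verbatim}
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    # X: (N, n) array of input vectors x_i.
    # kernel: function k(x, y) giving the feature-space dot product.
    N = X.shape[0]

    # Kernel (Gram) matrix: K_ij = k(x_i, x_j) = Phi(x_i) . Phi(x_j).
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])

    # Centre the mapped data in feature space, using only dot products.
    ones = np.ones((N, N)) / N
    Kc = K - ones @ K - K @ ones + ones @ K @ ones

    # Eigendecomposition of the centred, symmetric kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Rescale the expansion coefficients alpha so that the corresponding
    # feature-space principal axes have unit length:
    # lambda * (alpha . alpha) = 1.
    alphas = eigvecs[:, :n_components] / np.sqrt(eigvals[:n_components])

    # Projections of the training points onto the principal components.
    return Kc @ alphas

# Usage (illustrative): Gaussian kernel, random 3-D data.
rbf = lambda x, y, sigma=1.0: np.exp(-np.sum((x - y)**2) / (2.0 * sigma**2))
X = np.random.default_rng(0).normal(size=(50, 3))
Z = kernel_pca(X, rbf)   # (50, 2) array of non-linear feature values
\end{verbatim}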
Carole Twining

2001-10-02