Bob Fisher
Principal component analysis (PCA) can be used to analyze the structure of a data set or to represent the data in a lower-dimensional space (as well as in many other applications).
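Let $\{\vec{x}_i\}_{i=1}^{N}$ be a set of N column vectors of dimension D.
Define the scatter matrix $S$ of the data set as
$$S = \sum_{i=1}^{N} (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T$$
where $\vec{\mu}$ is the mean of the data set:
$$\vec{\mu} = \frac{1}{N} \sum_{i=1}^{N} \vec{x}_i$$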
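A minimal NumPy sketch of the two definitions, assuming the data are stored as the columns of a D × N array `X` (the function name `scatter_matrix` is hypothetical, chosen for illustration). Note that $S$ is N times the biased sample covariance matrix.

```python
import numpy as np

def scatter_matrix(X):
    """Return (mu, S) for data X of shape (D, N), one sample per column."""
    mu = X.mean(axis=1, keepdims=True)   # D x 1 mean vector
    Xc = X - mu                          # mean-centred data, D x N
    S = Xc @ Xc.T                        # sum over i of (x_i - mu)(x_i - mu)^T
    return mu, S

# Three 2-D points stored as columns
X = np.array([[0.0, 2.0, 4.0],
              [1.0, 1.0, 4.0]])
mu, S = scatter_matrix(X)
```

Here the mean is (2, 2) and the scatter matrix is [[8, 6], [6, 6]].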
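The d largest principal components are the eigenvectors $\vec{w}_1, \dots, \vec{w}_d$ of S
corresponding to the d largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$.
The number of components d can be chosen arbitrarily, subject to d < D.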
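A sketch of selecting the d principal components from a given scatter matrix, assuming NumPy's `eigh` (which returns eigenvalues of a symmetric matrix in ascending order); the helper name `principal_components` is hypothetical.

```python
import numpy as np

def principal_components(S, d):
    """Return the d largest eigenvalues of symmetric S and their eigenvectors (as columns)."""
    if not 0 < d < S.shape[0]:
        raise ValueError("d must satisfy 0 < d < D")
    evals, evecs = np.linalg.eigh(S)          # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(evals)[::-1][:d]       # indices of the d largest eigenvalues
    return evals[order], evecs[:, order]

S = np.array([[8.0, 6.0],
              [6.0, 6.0]])
lam, W = principal_components(S, 1)
```

For this S the largest eigenvalue is $7 + \sqrt{37} \approx 13.08$.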
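The eigenvectors of S are usually computed with the singular value decomposition (SVD).
If the mean-centred vectors $\vec{x}_i - \vec{\mu}$ are stacked as the columns of a D × N matrix $A$, then $S = AA^T$, so the left singular vectors of $A$ are the eigenvectors of S and the squared singular values are the corresponding eigenvalues.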
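A small check of the SVD route, using random data purely for illustration and the same column-per-sample layout: the left singular vectors of the centred data matrix diagonalize S, and the squared singular values match the eigenvalues computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))                  # D = 3, N = 50, samples as columns
Xc = X - X.mean(axis=1, keepdims=True)        # centred data matrix A
S = Xc @ Xc.T

# A = U diag(s) V^T, hence S = A A^T = U diag(s**2) U^T
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues_from_svd = s**2                   # already in decreasing order

eigenvalues_direct = np.sort(np.linalg.eigvalsh(S))[::-1]
```

Working from the centred data directly avoids forming S at all, which is why the SVD is the usual choice in practice.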
The dominant eigenvectors describe the main directions of variation of the data. For example, if a data set has 2 large eigenvalues and the remaining eigenvalues are small, then the variation of the data is described largely by linear combinations of the 2 corresponding eigenvectors (i.e. the data lie close to a plane through the mean).
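A synthetic illustration of the coplanar case, assuming 3-D points spread widely within a plane plus a little isotropic noise (the plane, noise level and sample count are made up): two eigenvalues come out large, the third is tiny, and the eigenvector of the smallest eigenvalue lies close to the plane's normal.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
a = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)    # orthonormal vectors spanning the plane
b = np.array([0.0, 1.0, 0.0])
n_true = np.cross(a, b)                       # unit normal of that plane

coeffs = rng.normal(scale=5.0, size=(2, N))   # large spread within the plane
X = np.outer(a, coeffs[0]) + np.outer(b, coeffs[1]) + rng.normal(scale=0.05, size=(3, N))

Xc = X - X.mean(axis=1, keepdims=True)
evals, evecs = np.linalg.eigh(Xc @ Xc.T)
evals, evecs = evals[::-1], evecs[:, ::-1]    # largest eigenvalue first
```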
The d eigenvectors can also be used to project the data into a d-dimensional space.
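Define $W = [\vec{w}_1, \vec{w}_2, \dots, \vec{w}_d]$, the D × d matrix whose columns are the d principal eigenvectors.
The projection of a vector $\vec{x}$ is
$$\vec{y} = W^T (\vec{x} - \vec{\mu}).$$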
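A sketch of the projection step, assuming samples as columns and random data for illustration; `pca_project` is a hypothetical helper that builds W from the top-d eigenvectors and applies $\vec{y} = W^T(\vec{x} - \vec{\mu})$ to every column at once.

```python
import numpy as np

def pca_project(X, d):
    """Project the columns of X (D x N) onto the d principal eigenvectors."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)
    W = evecs[:, np.argsort(evals)[::-1][:d]]  # D x d, columns are the top-d eigenvectors
    Y = W.T @ Xc                               # d x N, one projected vector per column
    return Y, W, mu

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 30))
Y, W, mu = pca_project(X, 2)
```

Because the columns of W are orthonormal, $\vec{\mu} + W\vec{y}$ is the orthogonal projection of $\vec{x}$ onto the d-dimensional subspace through $\vec{\mu}$ spanned by the columns of W, which serves as an approximate reconstruction.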
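The corresponding scatter matrix $S_y$ of the projected vectors $\vec{y}_i = W^T(\vec{x}_i - \vec{\mu})$ is:
$$S_y = \sum_{i=1}^{N} \vec{y}_i \vec{y}_i^T = W^T S W$$
Because the columns of W are eigenvectors of S, $S_y$ is diagonal, with the d largest eigenvalues of S on its diagonal; the projected components are therefore uncorrelated.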
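A numerical check of $S_y = W^T S W$, with random data scaled differently along each axis (an arbitrary choice for illustration): summing the outer products of the projected vectors gives the same matrix as the formula, and that matrix is diagonal with the top eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
scales = np.array([[5.0], [3.0], [2.0], [1.0], [0.5]])   # unequal spread per axis
X = rng.normal(size=(5, 40)) * scales
Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T

evals, evecs = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]
d = 3
W = evecs[:, order[:d]]
Y = W.T @ Xc                                  # projected vectors have zero mean

S_y_direct = Y @ Y.T                          # sum of y_i y_i^T
S_y_formula = W.T @ S @ W
```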
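Among all D × d matrices with orthonormal columns, W maximizes the determinant of $S_y = W^T S W$ for a given d; the maximum value is the product of the d largest eigenvalues of S.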
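An empirical illustration (not a proof) of the maximization property, assuming random data: the PCA basis is compared against many random D × d matrices with orthonormal columns, generated here by QR factorization of Gaussian matrices. The orthonormality constraint matters, since scaling W up would otherwise make the determinant arbitrarily large. The formal result follows from the Poincaré separation (Cauchy interlacing) theorem.

```python
import numpy as np

def det_scatter(S, W):
    """det(W^T S W) for a D x d matrix W with orthonormal columns."""
    return np.linalg.det(W.T @ S @ W)

rng = np.random.default_rng(4)
D, d = 6, 2
A = rng.normal(size=(D, 100))
Ac = A - A.mean(axis=1, keepdims=True)
S = Ac @ Ac.T

evals, evecs = np.linalg.eigh(S)
W_pca = evecs[:, np.argsort(evals)[::-1][:d]]
best = det_scatter(S, W_pca)          # product of the d largest eigenvalues

# Random orthonormal bases for comparison
others = []
for _ in range(1000):
    Q, _r = np.linalg.qr(rng.normal(size=(D, d)))   # Q is D x d with orthonormal columns
    others.append(det_scatter(S, Q))
```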