Bob Fisher
Principal component analysis (PCA) can be used to analyze the structure of a data set or to represent the data in a lower-dimensional space (among many other applications).
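Let $\{\mathbf{x}_i\}$, $i = 1, \ldots, N$, be a set of N column vectors of dimension D.
Define the scatter matrix $S$ of the data set as
$$S = \sum_{i=1}^{N} (\mathbf{x}_i - \mathbf{m})(\mathbf{x}_i - \mathbf{m})^T$$
where $\mathbf{m} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i$ is the mean of the dataset.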
The d largest principal components are the eigenvectors of S corresponding to its d largest eigenvalues. d can be chosen arbitrarily with d < D. The eigenvectors of S can usually be found by using singular value decomposition.
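The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes the data are stored as a D x N array with one vector per column, takes the scatter matrix to be the sum of outer products of the mean-centered vectors, and uses a hypothetical helper name `principal_components` and synthetic data. It also cross-checks the eigen-decomposition against a singular value decomposition of the centered data.

```python
import numpy as np

def principal_components(X, d):
    """Return the d largest eigenvalues of the scatter matrix and their eigenvectors.

    X is a D x N array whose columns are the data vectors (an assumed layout).
    """
    m = X.mean(axis=1, keepdims=True)      # D x 1 mean vector
    Xc = X - m                             # mean-centered data
    S = Xc @ Xc.T                          # D x D scatter matrix
    evals, evecs = np.linalg.eigh(S)       # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d]    # indices of the d largest eigenvalues
    return evals[order], evecs[:, order]

# Synthetic data (illustrative only): 200 vectors in 5 dimensions with unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200)) * np.array([[5.0], [3.0], [1.0], [0.5], [0.1]])
evals, W = principal_components(X, d=2)

# Cross-check with SVD: the left singular vectors of the centered data are
# eigenvectors of S, and the squared singular values are its eigenvalues.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
```

The eigenvectors returned by the two routes agree only up to sign, so any comparison should use absolute values of their inner products.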
The dominant eigenvectors describe the main directions of variation of the data. For example, if a dataset has two large eigenvalues and the remaining ones are small, then the variation in the data is described largely by linear combinations of the two corresponding eigenvectors (i.e., the data is largely coplanar).
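The coplanar case can be illustrated with a small synthetic experiment. The sketch below assumes NumPy; the in-plane directions `a` and `b`, the noise level, and the sample size are all made-up values chosen for illustration. It builds 3-D points that lie close to a plane and inspects how the scatter is distributed across the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500

# Two hypothetical in-plane directions in 3-D, plus small out-of-plane noise.
a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])
coeffs = rng.normal(scale=4.0, size=(2, N))
X = (np.outer(a, coeffs[0]) + np.outer(b, coeffs[1])
     + rng.normal(scale=0.05, size=(3, N)))

Xc = X - X.mean(axis=1, keepdims=True)
S = Xc @ Xc.T                               # scatter matrix of the centered data
evals, evecs = np.linalg.eigh(S)            # ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending

# Fraction of the total scatter captured by the two dominant eigenvectors.
explained = evals[:2].sum() / evals.sum()

# The eigenvector with the smallest eigenvalue should approximate the plane normal.
normal = np.cross(a, b)
normal /= np.linalg.norm(normal)
alignment = abs(normal @ evecs[:, 2])
```

With data of this kind, `explained` is close to 1 and `alignment` is close to 1, which is the numerical counterpart of saying that the data are largely coplanar.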
The d eigenvectors can also be used to project the data into a d-dimensional space.
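Define $W = [\mathbf{e}_1, \ldots, \mathbf{e}_d]$, the D x d matrix whose columns are the d dominant eigenvectors of S.
The projection of a vector $\mathbf{x}$ is $\mathbf{y} = W^T (\mathbf{x} - \mathbf{m})$, where $\mathbf{m}$ is the mean of the dataset.
The corresponding scatter matrix of the $\mathbf{y}$ vectors is:
$$S_y = W^T S W$$
Among all D x d matrices with orthonormal columns, the matrix W maximizes the determinant of $S_y$ for a given d.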
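The projection and the determinant property can be checked numerically. The following sketch assumes NumPy, a D x N data layout with one vector per column, a projection of the form W^T (x - m) with m the dataset mean, and synthetic data; none of the variable names come from the original text. It compares the determinant obtained with the eigenvector basis against the one obtained with a randomly chosen orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(2)
D, N, d = 4, 300, 2
X = rng.normal(size=(D, N)) * np.array([[4.0], [2.0], [1.0], [0.5]])

m = X.mean(axis=1, keepdims=True)         # dataset mean
S = (X - m) @ (X - m).T                   # D x D scatter matrix
evals, evecs = np.linalg.eigh(S)          # ascending eigenvalues
order = np.argsort(evals)[::-1][:d]
W = evecs[:, order]                       # D x d, columns are the top-d eigenvectors

Y = W.T @ (X - m)                         # d x N projected vectors (zero mean)
S_y = Y @ Y.T                             # scatter matrix of the projections
det_pca = np.linalg.det(S_y)

# Compare against a random orthonormal D x d basis obtained from a QR factorization.
Q, _ = np.linalg.qr(rng.normal(size=(D, d)))
det_random = np.linalg.det(Q.T @ S @ Q)
```

Because the columns of W are eigenvectors of S, the projected scatter matrix is diagonal with the d largest eigenvalues on its diagonal, so its determinant is their product. A single random basis is only a spot check, not a proof of the maximization claim.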