One way ANOVA - Analysis of variance

Analysis of variance (ANOVA) is a general method for studying sampled-data relationships [1,2].
The method enables the differences between two or more sample means to be analysed, achieved by subdividing the total sum of squares.
One way ANOVA is the simplest case. The purpose is to test for significant differences between class means, and this is done by analysing the variances. Incidentally, if we are only comparing two means then the method is equivalent to the $t$-test for independent samples. The basis of ANOVA is the partitioning of the total sum of squares into a between-class ($SS_B$) and a within-class ($SS_W$) component. It enables all classes to be compared with each other simultaneously rather than individually; it assumes that the samples are normally distributed.
The one way analysis is calculated in three steps: first the sum of squares over all samples, then the within-class and between-class sums of squares. At each step the degrees of freedom are also determined, where the degrees of freedom are the number of independent `pieces of information' that go into the estimate of a parameter. These quantities are combined in the Fisher ($F$) statistic to test the null hypothesis. The null hypothesis states that there are no differences between the means of the different classes, implying that the within-class variance should be identical to the between-class variance (resulting in no between-class discrimination capability). It must however be noted that small sample sets will produce random fluctuations due to the assumption of a normal distribution.
If $x_{ci}$ is the sample value for class $c$ and data point $i$, with $C$ classes, $n_c$ samples in class $c$ and $N = \sum_{c=1}^{C} n_c$ samples in total, then
the total sum of squares is defined as:

$$SS_T = \sum_{c=1}^{C} \sum_{i=1}^{n_c} (x_{ci} - \bar{x})^2 \tag{1}$$

where $\bar{x}$ is the grand mean:

$$\bar{x} = \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{n_c} x_{ci} \tag{2}$$

The within-class sum of squares measures the spread of the samples about their own class means:

$$SS_W = \sum_{c=1}^{C} \sum_{i=1}^{n_c} (x_{ci} - \bar{x}_c)^2 \tag{3}$$

where $\bar{x}_c$ is the mean of class $c$:

$$\bar{x}_c = \frac{1}{n_c} \sum_{i=1}^{n_c} x_{ci} \tag{4}$$

The between-class sum of squares measures the spread of the class means about the grand mean:

$$SS_B = \sum_{c=1}^{C} n_c (\bar{x}_c - \bar{x})^2 \tag{5}$$

so that the total sum of squares partitions as:

$$SS_T = SS_B + SS_W \tag{6}$$

The corresponding degrees of freedom are:

$$df_T = N - 1 \tag{7}$$

$$df_B = C - 1 \tag{8}$$

$$df_W = N - C \tag{9}$$

Dividing each sum of squares by its degrees of freedom gives the mean squares:

$$MS_B = \frac{SS_B}{C - 1} \tag{10}$$

$$MS_W = \frac{SS_W}{N - C} \tag{11}$$

and the Fisher statistic is their ratio:

$$F = \frac{MS_B}{MS_W} \tag{12}$$
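The decomposition in equations (1)-(12) can be sketched directly in Python; the three classes below are made-up illustrative values, not data from the text:

```python
import numpy as np

def one_way_anova(classes):
    """Return the one-way ANOVA F statistic for a list of 1-D sample arrays."""
    classes = [np.asarray(c, dtype=float) for c in classes]
    C = len(classes)                                   # number of classes
    N = sum(len(c) for c in classes)                   # total number of samples
    grand_mean = sum(c.sum() for c in classes) / N     # Eq. (2)

    # Between-class sum of squares: spread of class means about the grand mean, Eq. (5)
    ss_b = sum(len(c) * (c.mean() - grand_mean) ** 2 for c in classes)
    # Within-class sum of squares: spread of samples about their class mean, Eq. (3)
    ss_w = sum(((c - c.mean()) ** 2).sum() for c in classes)

    ms_b = ss_b / (C - 1)                              # Eq. (10)
    ms_w = ss_w / (N - C)                              # Eq. (11)
    return ms_b / ms_w                                 # Eq. (12)

# Three made-up classes with well-separated means
data = [[4.1, 3.9, 4.3], [5.0, 5.2, 4.8], [6.1, 5.9, 6.2]]
print(one_way_anova(data))
```

A large $F$ here reflects between-class variance dominating within-class variance, i.e. evidence against the null hypothesis.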

The $F$ value gives a reliable test of the null hypothesis, but it cannot indicate which of the means is responsible for a significantly low probability. To investigate the cause of rejection of the null hypothesis, post-hoc or multiple-comparison tests can be used. These examine or compare more than one pair of means simultaneously.
Here we use the Scheffé post-hoc test. This tests all pairs of means, and all possible combinations of means, for differences. For classes $a$ and $b$ the Scheffé test statistic is:

$$F_s = \frac{(\bar{x}_a - \bar{x}_b)^2}{(C - 1)\, MS_W \left( \frac{1}{n_a} + \frac{1}{n_b} \right)} \tag{13}$$
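A minimal sketch of the pairwise Scheffé statistic of equation (13), again using made-up class data for illustration:

```python
import numpy as np
from itertools import combinations

def scheffe_pairwise(classes):
    """Scheffé statistic for every pair of classes, following Eq. (13)."""
    classes = [np.asarray(c, dtype=float) for c in classes]
    C = len(classes)
    N = sum(len(c) for c in classes)
    # Within-class mean square, Eq. (11)
    ss_w = sum(((c - c.mean()) ** 2).sum() for c in classes)
    ms_w = ss_w / (N - C)

    stats = {}
    for a, b in combinations(range(C), 2):
        n_a, n_b = len(classes[a]), len(classes[b])
        diff = classes[a].mean() - classes[b].mean()
        stats[(a, b)] = diff ** 2 / ((C - 1) * ms_w * (1 / n_a + 1 / n_b))
    return stats

data = [[4.1, 3.9, 4.3], [5.0, 5.2, 4.8], [6.1, 5.9, 6.2]]
for pair, f_s in scheffe_pairwise(data).items():
    print(pair, round(f_s, 2))
```

Each $F_s$ would then be compared with the critical value of the $F$ distribution to decide which pairs of class means differ significantly.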

In terms of classification, large $F$ statistic values do not necessarily indicate useful features. They only indicate a well-spread feature space, which for a large dataset is a positive attribute: it suggests that the feature has scope, or `room', for more classes to be added to the dataset. Equally, features with smaller $F$ values (but greater than the critical value $F_{critical}$) may separate a portion of the dataset that was previously `confused' with another portion. Adding such a feature, and so increasing the dimensionality of the feature space, may prove beneficial. In this manner, features which appear `less good' (i.e. have lower $F$ statistic values than alternative features) may, in fact, prove useful in terms of classification.
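This use of the $F$ statistic for comparing candidate features can be illustrated with SciPy's `scipy.stats.f_oneway` (assuming SciPy is available); the two "features" below are made-up values, one well spread across classes and one heavily overlapping:

```python
from scipy.stats import f_oneway

# Two hypothetical features measured for the same three classes
feature_a = [[4.1, 3.9, 4.3], [5.0, 5.2, 4.8], [6.1, 5.9, 6.2]]   # well separated
feature_b = [[4.1, 4.4, 4.0], [4.3, 4.5, 4.2], [4.6, 4.4, 4.7]]   # overlapping

for name, classes in [("feature_a", feature_a), ("feature_b", feature_b)]:
    f_stat, p_value = f_oneway(*classes)
    print(f"{name}: F = {f_stat:.2f}, p = {p_value:.4f}")
```

Ranking features by $F$ in this way is a common filter-style selection heuristic, subject to the caveats above about what a large $F$ does and does not imply.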