The basic logic of the orthodox statistical test involves defining a null hypothesis H0, deriving a p-value - the probability of obtaining data at least as extreme as that actually observed, assuming H0 is true - and rejecting H0 if the p-value is small.
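
This logic can be sketched in code. The function below is a hypothetical helper that computes a two-sided p-value for the null hypothesis that a population mean equals `mu0`, using a normal approximation rather than an exact t-test; the function name and the data values are illustrative assumptions, not part of the text above.

```python
import math
import statistics

def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: population mean == mu0.
    Illustrative sketch using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = (mean - mu0) / se
    # P(|Z| >= |z|) under H0: the probability of data at least
    # as extreme as that actually observed.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Made-up measurements; H0 says the true mean is 5.0.
data = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 5.1]
p = z_test_p_value(data, mu0=5.0)
```

Under the orthodox recipe, one would reject H0 only if `p` fell below the chosen alpha.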

There are a number of problems with this approach:

- What counts as a 'small probability' is a rather arbitrary judgement. The convention is 'less than .05', but there is no particular justification for that threshold.
- H0 is usually precisely specified (e.g. a parameter such as the difference between means is exactly zero) and as a consequence is almost certainly false. If H0 is false, the p-value will tend to decrease as more data are collected, so the same p-value at a larger n represents a weaker effect. The use of a fixed alpha (critical rejection value) assumes that n was fixed before the data were collected, but experimenters do not always adhere to this.
- If H0 is exactly true, the p-value is uniformly distributed whatever the value of n, so increasing n does not help. You can only accumulate evidence to reject H0, never to accept it.
- Showing no effect or zero difference, i.e. accepting H0, is often of scientific interest - the discovery of invariances is important - but the orthodox approach does not support it.
- The p-value depends, in a sense, on data that were not actually observed, i.e. the 'more extreme values' that could have occurred but did not.
- The p-value is P(data|H0) but is frequently (mis-)interpreted as P(H0|data).
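
The contrasting behaviour of p-values when H0 is false versus true can be checked with a small simulation. The sketch below uses a simple normal-approximation test and made-up parameters (the effect size 0.2, sample sizes, and function names are all illustrative assumptions): when the true mean differs from 0, p-values shrink as n grows; when H0 is exactly true, their distribution stays roughly uniform regardless of n.

```python
import math
import random
import statistics

def p_value(sample, mu0=0.0):
    """Two-sided normal-approximation p-value for H0: mean == mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def median_p(true_mean, n, trials=400, seed=0):
    """Median p-value over repeated experiments of size n,
    drawing from Normal(true_mean, 1)."""
    rng = random.Random(seed)
    ps = sorted(
        p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return ps[trials // 2]

# H0 false (true mean 0.2): p-values shrink as n grows.
false_small = median_p(0.2, n=20)
false_large = median_p(0.2, n=200)

# H0 true (true mean 0): the median p-value stays near 0.5 at any n.
true_small = median_p(0.0, n=20)
true_large = median_p(0.0, n=200)
```

With the larger n the same modest effect yields far smaller p-values, while under a true H0 no amount of extra data pushes the p-value distribution toward rejection.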

A solution to most of these problems is to actually calculate P(H0|data) using Bayes' theorem, i.e. P(H0|data) = P(data|H0)P(H0)/P(data). See the references linked from the main page for details of how this can be done in practice, and for expanded discussion of each of the points above.
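
For the simplest case of two competing hypotheses, the calculation is direct. The snippet below is a minimal sketch assuming illustrative likelihoods and an even prior; the numbers and the function name are made up for the example and do not come from the text above.

```python
def posterior_h0(p_data_given_h0, p_data_given_h1, prior_h0=0.5):
    """P(H0|data) via Bayes' theorem for two hypotheses H0 and H1.
    P(data) expands as P(data|H0)P(H0) + P(data|H1)P(H1)."""
    p_data = p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0)
    return p_data_given_h0 * prior_h0 / p_data

# Illustrative likelihoods: the observed data are 10x more likely
# under H1 than under H0, with an even prior on the two hypotheses.
post = posterior_h0(p_data_given_h0=0.05, p_data_given_h1=0.50)
```

Unlike the p-value, this quantity is directly interpretable as the probability of H0 given the data, and it can also grow toward 1 when the data genuinely favour H0.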