As mentioned previously, one way of de-correlating probabilities is to use a model. Take, for example, a set of data described by the function $f(a, Y_i)$, where $a$ is the set of free parameters defining $f$ and $Y_i$ is the generating data set. If we now define the variation of the observed measurements $X_i$ about the generating function as some random error, we can see that the probability $P(X_0 | X_1, X_2, \ldots, X_N, a, Y_0)$ is equivalent to $P(X_0 | a, Y_0)$, as the model and generation point completely define all but the random error.
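Choosing Gaussian random errors with a standard deviation of $\sigma_i$ for each observed measurement $X_i$ about the model prediction $f(a, Y_i)$ gives
$$P(X_i) = A_i \exp\left( -\frac{(X_i - f(a, Y_i))^2}{2 \sigma_i^2} \right)$$
where $A_i$ is a normalisation constant. We can now construct the maximum likelihood function
$$P(X) = \prod_i A_i \exp\left( -\frac{(X_i - f(a, Y_i))^2}{2 \sigma_i^2} \right)$$
which leads to the definition of log likelihood
$$\log P(X) = -\frac{1}{2} \sum_i \frac{(X_i - f(a, Y_i))^2}{\sigma_i^2} + \sum_i \log A_i$$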
This expression can be maximized as a function of the parameters $a$. Because the normalisation term does not depend on $a$, this is the same as minimising the weighted sum of squared residuals, and the process is generally called a least squares fit. Whenever you encounter least squares there is therefore a built-in assumption of independence and Gaussian distribution. In practical situations the validity of these assumptions should be checked by plotting the distribution of the residuals $X_i - f(a, Y_i)$ to make sure that it is Gaussian.
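The link between the Gaussian log likelihood and a least squares fit can be sketched in a few lines of Python. This is a minimal illustration only: the straight-line model and the names `line_model`, `log_likelihood`, `fit_line` and `normalised_residuals` are assumptions made for the example and do not come from the text. Following the notation above, `x_obs` holds the observed measurements X_i and `y_gen` the generating data Y_i.

```python
import math


def line_model(a, y_gen):
    """Hypothetical model f(a, Y_i): a straight line a[0] + a[1] * Y_i."""
    return a[0] + a[1] * y_gen


def log_likelihood(a, x_obs, y_gen, sigma, model=line_model):
    """Gaussian log likelihood: sum_i [log A_i - (X_i - f(a, Y_i))^2 / (2 sigma_i^2)]."""
    total = 0.0
    for x, y, s in zip(x_obs, y_gen, sigma):
        log_a_i = -math.log(s * math.sqrt(2.0 * math.pi))  # log of the normalisation constant
        total += log_a_i - 0.5 * ((x - model(a, y)) / s) ** 2
    return total


def fit_line(x_obs, y_gen, sigma):
    """Closed-form weighted least squares for the straight-line model.

    Minimising sum_i (X_i - a0 - a1 * Y_i)^2 / sigma_i^2 maximises the
    log likelihood above, because the log A_i terms do not depend on a.
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    sy = sum(wi * y for wi, y in zip(w, y_gen))
    sx = sum(wi * x for wi, x in zip(w, x_obs))
    syy = sum(wi * y * y for wi, y in zip(w, y_gen))
    sxy = sum(wi * x * y for wi, x, y in zip(w, x_obs, y_gen))
    det = sw * syy - sy * sy
    a1 = (sw * sxy - sy * sx) / det
    a0 = (sx - a1 * sy) / sw
    return a0, a1


def normalised_residuals(a, x_obs, y_gen, sigma, model=line_model):
    """Residuals (X_i - f(a, Y_i)) / sigma_i, to be histogrammed as a Gaussian check."""
    return [(x - model(a, y)) / s for x, y, s in zip(x_obs, y_gen, sigma)]


if __name__ == "__main__":
    y_gen = [0.0, 1.0, 2.0, 3.0, 4.0]
    x_obs = [1.1, 2.9, 5.2, 6.8, 9.1]
    sigma = [0.2] * 5
    a = fit_line(x_obs, y_gen, sigma)
    print("fitted parameters:", a)
    print("log likelihood at fit:", log_likelihood(a, x_obs, y_gen, sigma))
    print("normalised residuals:", normalised_residuals(a, x_obs, y_gen, sigma))
```

If the Gaussian assumption holds, the normalised residuals should scatter about zero with roughly unit spread; a strongly skewed or long-tailed histogram suggests the assumption is violated.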
The choice of a least squares error metric gives many advantages in terms of computational simplicity, and later we will see that it is also used extensively for definitions of error covariance and optimal combination of data. However, the distribution of random variation on the observed data $X$ is something over which we generally have no initial control, and it could well be arbitrary. This may initially be seen as an overwhelming problem, but in most circumstances it is possible to make distributions manageable (Gaussian) by transformation to $g(X_i)$ and $g(f(a, Y_i))$, where $g$ is chosen so that the initial distribution of $X_i$ maps to a Gaussian distribution in $g$.
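Written out under the Gaussian assumption, a fit in the transformed variable might take the following form. The symbol $\sigma_{g,i}$, standing for the standard deviation of the $i$-th measurement after transformation, is notation introduced for this sketch and does not appear in the original text; the constant collects the normalisation terms, which do not depend on $a$.

```latex
\log P = -\frac{1}{2} \sum_i
  \frac{\left( g(X_i) - g(f(a, Y_i)) \right)^2}{\sigma_{g,i}^2}
  + \mathrm{const}
```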
One good example of this is the location of a known object in 3D data derived from a stereo vision system. In the coordinate system where the viewing direction corresponds to the $z$ axis, the $x$ and $y$ measurements have errors determined by image plane measurement. However, the depth for a given point is given by
$$z = \frac{f I}{x_l - x_r}$$
where $I$ is the interocular separation, $f$ is the focal length and $x_l$ and $x_r$ are image plane measurements. Attempts to perform a least squares fit directly in $(x, y, z)$ space result in instability due to the non-Gaussian nature of the $z$ distribution. However, transformation to $(x, y, 1/z)$ yields Gaussian distributions and good results.
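The effect of the transformation can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the difference of the two image plane measurements (the disparity) carries a Gaussian error; the function names, the seed and the numerical values are hypothetical. It compares the skewness of the simulated depth with that of its reciprocal.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean of the cubed standardised deviations (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def simulate_depths(focal_times_separation, true_disparity, disparity_sigma, n, seed=0):
    """Simulate depths z = f * I / d, with Gaussian noise on the disparity d (an assumption)."""
    rng = random.Random(seed)
    depths = []
    for _ in range(n):
        d = rng.gauss(true_disparity, disparity_sigma)
        depths.append(focal_times_separation / d)
    return depths


if __name__ == "__main__":
    z = simulate_depths(1000.0, 10.0, 1.5, 20000, seed=1)
    inverse_z = [1.0 / v for v in z]
    print("skewness of z:  ", skewness(z))
    print("skewness of 1/z:", skewness(inverse_z))
```

Because 1/z is proportional to the disparity, it inherits the Gaussian shape of the measurement error, whereas z itself acquires a long tail towards large depths.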
Under many circumstances, even after taking care to obtain Gaussian variation on the fitted quantities, there is still one final problem which needs to be addressed: the problem of fliers, or outliers. Fliers is the name given to data generated by any real system which do not conform to the assumed statistical distribution. They are generally caused by complete failure of the data measurement system and lie well away from the expected mean of the distribution. If ignored, they can completely dominate the fitting process, giving meaningless results. For example, measurement of the distance to an object presupposes that we have selected the correct object; a measurement taken from the wrong object is a flier.
The correct way to deal with these measures is to modify the expected probability distribution to include the long tails from fliers; this leads to the branch of numerical methods known as robust statistics. The simplest way to do this, which allows us to continue to use standard methods for covariance estimation and optimal data combination (both of which assume a Gaussian distribution), is to limit the contribution to the log likelihood from any data point to some maximum value. This makes the assumption that the statistical distribution is constant for any gearing point beyond the corresponding cut-off distance from the expected position. Unfortunately this process precludes the use of standard least squares solution methods, and the solution must generally be iterative, as the gearing point will vary for each data point during parameter estimation. This process is efficiently executed by the probabilistic Hough transform for small numbers of parameters.
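One simple way to see why the capped fit has to be iterative is sketched below. This is not the probabilistic Hough transform mentioned in the text; it is a plain alternating scheme, assumed here for illustration, in which points beyond the cut-off contribute a constant (and so drop out of the fit) and the set of such points is re-evaluated after every refit. The straight-line model, the function names and the cut-off of 3 standard deviations are all hypothetical choices. As before, `x_obs` holds the observed measurements and `y_gen` the generating data.

```python
def weighted_line_fit(x_obs, y_gen, sigma, use):
    """Weighted least squares for X = a0 + a1 * Y over the points flagged in `use`."""
    pts = [(x, y, 1.0 / s ** 2) for x, y, s, u in zip(x_obs, y_gen, sigma, use) if u]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the cut-off to fit a line")
    sw = sum(w for _, _, w in pts)
    sx = sum(w * x for x, _, w in pts)
    sy = sum(w * y for _, y, w in pts)
    syy = sum(w * y * y for _, y, w in pts)
    sxy = sum(w * x * y for x, y, w in pts)
    a1 = (sw * sxy - sy * sx) / (sw * syy - sy * sy)
    a0 = (sx - a1 * sy) / sw
    return a0, a1


def capped_chi2(a, x_obs, y_gen, sigma, cutoff):
    """Sum of squared normalised residuals, each limited to at most cutoff**2."""
    return sum(
        min(((x - (a[0] + a[1] * y)) / s) ** 2, cutoff ** 2)
        for x, y, s in zip(x_obs, y_gen, sigma)
    )


def robust_line_fit(x_obs, y_gen, sigma, cutoff=3.0, max_iter=20):
    """Alternate between fitting the points inside the cut-off and re-flagging them.

    The result depends on the starting fit (here: all points), so this can
    settle on a poor solution when fliers dominate the initial estimate.
    """
    use = [True] * len(x_obs)
    a = weighted_line_fit(x_obs, y_gen, sigma, use)
    for _ in range(max_iter):
        new_use = [
            abs(x - (a[0] + a[1] * y)) / s <= cutoff
            for x, y, s in zip(x_obs, y_gen, sigma)
        ]
        if new_use == use:
            break  # the flagged set is stable, so refitting would change nothing
        use = new_use
        a = weighted_line_fit(x_obs, y_gen, sigma, use)
    return a, use


if __name__ == "__main__":
    y_gen = [float(i) for i in range(20)]
    x_obs = [2.0 * y + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, y in enumerate(y_gen)]
    x_obs[10] = 51.0  # a single flier, far from the true line
    sigma = [1.0] * 20
    print("plain fit: ", weighted_line_fit(x_obs, y_gen, sigma, [True] * 20))
    print("robust fit:", robust_line_fit(x_obs, y_gen, sigma)[0])
```

In this toy data set the plain fit is pulled towards the flier, while the alternating scheme excludes it after the first pass and settles close to the generating line.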