Information-theoretic criteria are mainly based on Bayes' rule, the Kullback-Leibler (K-L) distance, and the minimum description length (MDL). Criteria based on Bayes' rule choose the model that maximizes the probability of the data, $D$, given the model $m$ and prior information $I$. This probability is given by
$$P(D \mid m, I) = \int p(D \mid \theta_m, \sigma, m, I)\, p(\theta_m, \sigma \mid m, I)\, d\theta_m\, d\sigma ,$$
where $\theta_m$ is the parameter vector for model $m$ and $\sigma$ is the standard deviation of the sensor noise. The first term in the above integral is simply the likelihood $p(D \mid \theta_m, \sigma, m, I)$, and $p(\theta_m, \sigma \mid m, I)$ is the prior probability of $\theta_m$ and $\sigma$. Several Bayesian criteria have been derived depending on the choice of priors [10,11,13]. One such criterion chooses the model that maximizes the expression in Eq. (1) [6,5].
An advantage of these criteria is the ease with which they can be used: there is no need to specify empirical thresholds or significance levels, or to consult look-up tables. However, these criteria require fitting all the candidate models to the data, which may be expensive and unnecessary in many applications.
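As a concrete illustration of the Bayesian criterion, the sketch below (not from the paper) estimates the evidence P(D | m, I) for a family of polynomial models fitted to synthetic one-dimensional data by simple Monte Carlo: the likelihood is averaged over parameter and noise values drawn from broad priors, and the model with the largest evidence is kept without any thresholds or look-up tables. The data, priors, candidate models, and sample counts are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D "sensor" data: a quadratic corrupted by Gaussian noise.
x = np.linspace(-1.0, 1.0, 50)
y = 0.5 - 1.2 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

def log_evidence(y, x, order, n_samples=100_000, coef_scale=3.0, sigma_max=1.0):
    """Monte Carlo estimate of log P(D | m, I): average the Gaussian
    likelihood over parameters drawn from broad priors (zero-mean normal
    coefficients, uniform noise standard deviation)."""
    k = order + 1
    theta = rng.normal(scale=coef_scale, size=(n_samples, k))   # prior draws
    sigma = rng.uniform(1e-3, sigma_max, size=n_samples)
    design = np.vander(x, k, increasing=True)                    # 1, x, x^2, ...
    residual = y[None, :] - theta @ design.T                     # (samples, points)
    log_lik = (-0.5 * np.sum((residual / sigma[:, None]) ** 2, axis=1)
               - y.size * np.log(sigma * np.sqrt(2.0 * np.pi)))
    m = log_lik.max()
    return m + np.log(np.mean(np.exp(log_lik - m)))              # log-mean-exp

# Every candidate model (here, polynomial orders 1..4) must be evaluated;
# the model with the largest evidence is selected.
for order in range(1, 5):
    print(f"order {order}: log evidence ~ {log_evidence(y, x, order):.1f}")
```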
Although these criteria start from different premises, interestingly, they all take the form of a penalized likelihood: each optimizes a negative log-likelihood term plus a penalty on model complexity.
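Two standard instances of this penalized-likelihood form are the K-L-based AIC, whose penalty is proportional to the number of free parameters, and the BIC/MDL-style penalty, which grows with the logarithm of the number of data points. The sketch below computes both for a synthetic polynomial-fitting problem; the data and candidate models are illustrative assumptions, not drawn from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
y = 0.5 - 1.2 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

def penalized_scores(y, x, order):
    """Least-squares fit of a polynomial (the maximum-likelihood estimate
    under Gaussian noise); return the AIC and BIC scores."""
    n = y.size
    k = order + 2                       # polynomial coefficients + noise sigma
    coeffs = np.polyfit(x, y, order)
    residual = y - np.polyval(coeffs, x)
    sigma2 = np.mean(residual ** 2)     # ML estimate of the noise variance
    max_log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    aic = -2.0 * max_log_lik + 2.0 * k            # penalty proportional to k
    bic = -2.0 * max_log_lik + k * np.log(n)      # penalty grows with log(n)
    return aic, bic

# Each criterion is a negative log-likelihood plus a complexity penalty;
# the selected model is the one that minimizes the score.
for order in range(1, 5):
    aic, bic = penalized_scores(y, x, order)
    print(f"order {order}: AIC = {aic:.1f}, BIC = {bic:.1f}")
```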
Further, criteria formulated using one premise have often been derived later from other premises. However, all these criteria assume data contaminated only by small-scale random errors, whereas vision data are also contaminated with outliers. Several modifications to the above criteria have recently been proposed to accommodate outliers [3,8,14,15,16,17,18].