Estimating the Error on an Unseen Datapoint

We are interested in the ability of the model to predict unseen datapoints, so we need to adapt our error estimate to reflect this. The usual way to obtain an unbiased estimate is to divide each element of the covariance matrix by (N-1) instead of N. We have found that, although this method correctly predicts the probable error, it cannot be used to select the correct model complexity.
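The two normalisations differ only by a constant factor of N/(N-1). A minimal NumPy sketch of this, using synthetic data and illustrative variable names that are assumptions rather than anything taken from the original code:

```python
import numpy as np

# Synthetic data for illustration only: N = 10 datapoints in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
N = X.shape[0]

# bias=True divides by N; bias=False (the default) divides by N - 1.
cov_div_n = np.cov(X, rowvar=False, bias=True)
cov_div_n_minus_1 = np.cov(X, rowvar=False, bias=False)

# The (N-1) estimate is the N estimate rescaled by N / (N - 1).
assert np.allclose(cov_div_n_minus_1, cov_div_n * N / (N - 1))
```

Because the rescaling is a single constant applied to every element, it changes the size of the estimated error but not its structure.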

An alternative way to make tex2html_wrap_inline470 an unbiased estimate is to estimate each covariance matrix with one datapoint left out in turn:


 equation137

tex2html_wrap_inline472 is now given by:


equation151

We can significantly speed up the computation by using a recursive estimation of covariance [1].
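One way to obtain such a speed-up is a rank-one "downdate" of the full-data scatter matrix, sketched below in NumPy. This is an illustration under stated assumptions, not necessarily the scheme of [1]: the function names are hypothetical, and each leave-one-out covariance is assumed to be normalised by the number of remaining points (N-1).

```python
import numpy as np

def loo_covariances_naive(X):
    """Recompute the covariance from scratch with each datapoint left out."""
    N = X.shape[0]
    covs = []
    for k in range(N):
        rest = np.delete(X, k, axis=0)
        # bias=True divides by the number of remaining points, N - 1.
        covs.append(np.cov(rest, rowvar=False, bias=True))
    return np.array(covs)

def loo_covariances_downdate(X):
    """Same result from one full-data pass plus a rank-one correction per point."""
    N = X.shape[0]
    centred = X - X.mean(axis=0)
    scatter = centred.T @ centred  # sum of outer products about the full mean
    covs = []
    for k in range(N):
        d = centred[k]
        # Removing point k also shifts the mean, which gives the N / (N - 1) factor.
        scatter_k = scatter - (N / (N - 1)) * np.outer(d, d)
        covs.append(scatter_k / (N - 1))
    return np.array(covs)

# Synthetic check that both routes agree.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
assert np.allclose(loo_covariances_naive(X), loo_covariances_downdate(X))
```

The naive version recomputes a full covariance for every left-out point, whereas the downdate reuses a single scatter matrix and applies one outer-product correction per point.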

As we want to approximate the error on an unseen datapoint, we add a value of tex2html_wrap_inline436 to the tex2html_wrap_inline476 diagonal of tex2html_wrap_inline478:


 equation164

where tex2html_wrap_inline480 is the Kronecker delta (1 if tex2html_wrap_inline482, 0 otherwise).

Figure 5 shows the ability of the model to predict unseen data. The figure was generated using unseen_error_estimation.m.

Figure 5: Unseen error prediction on tex2html_wrap_inline462 with an error of tex2html_wrap_inline464 on each point