The method

If the two distributions are related functionally (in our case through a geometrical transformation), and we only have an estimate of this transformation, the mutual information depends on:
  • The first distribution.
  • The second distribution.
  • The estimated transformation that maps one onto the other.

    The exact transformation that maps the first image u (the model) onto the second image v (the image) should therefore give rise to the largest mutual information. Mutual information then becomes an optimisation criterion, optimised w.r.t. T:

        MI(T) = H( u(X) ) + H( v(T(X)) ) - H( u(X), v(T(X)) )
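
    To make the criterion concrete, here is a minimal sketch of evaluating MI(T): u_samples holds the intensities u(X) sampled in the model, and v_samples the intensities v(T(X)) sampled in the image at the transformed coordinates. For brevity the entropies are estimated from a joint histogram rather than from the Parzen-window densities described further down; the names u_samples, v_samples and bins are illustrative, not part of the original method.

        import numpy as np

        def entropy_from_counts(counts):
            """Entropy (in nats) of a histogram given as an array of counts."""
            p = counts[counts > 0] / counts.sum()
            return -np.sum(p * np.log(p))

        def mutual_information(u_samples, v_samples, bins=32):
            """MI(T) = H(u(X)) + H(v(T(X))) - H(u(X), v(T(X)))."""
            joint, _, _ = np.histogram2d(u_samples, v_samples, bins=bins)
            h_u  = entropy_from_counts(joint.sum(axis=1))   # H( u(X) )
            h_v  = entropy_from_counts(joint.sum(axis=0))   # H( v(T(X)) )
            h_uv = entropy_from_counts(joint.ravel())       # H( u(X), v(T(X)) )
            return h_u + h_v - h_uv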
     

    The method then proceeds by a classic gradient-based optimisation technique: it searches for the transformation that gives the largest mutual information by taking small steps in the "direction" of the derivative of the criterion. Since H( u(X) ) does not depend on T, its derivative vanishes and only two terms remain:

        d/dT[ MI(T) ] = d/dT[ H( v(T(X)) ) ] - d/dT[ H( u(X), v(T(X)) ) ]
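
    As an illustration of such an optimisation loop (not the exact scheme of the method), the sketch below parameterises T by a vector theta (e.g. rotation angle and translation), approximates the derivative of MI by central finite differences, and takes small steps uphill. The routine mi_of, the step size and the number of iterations are assumptions.

        import numpy as np

        def maximise_mi(mi_of, theta0, step=1e-2, eps=1e-3, n_iters=200):
            """Climb MI(T_theta) by small steps along a finite-difference gradient.

            mi_of(theta) is assumed to evaluate the mutual information for the
            transformation parameterised by theta (e.g. using the sketch above).
            """
            theta = np.asarray(theta0, dtype=float)
            for _ in range(n_iters):
                grad = np.zeros_like(theta)
                for i in range(theta.size):
                    d = np.zeros_like(theta)
                    d[i] = eps
                    # central finite difference for d/dtheta_i MI(T_theta)
                    grad[i] = (mi_of(theta + d) - mi_of(theta - d)) / (2.0 * eps)
                theta = theta + step * grad   # step "uphill": MI is maximised
            return theta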
    The probability densities of both images are estimated using the Parzen window technique. This is a classical technique, also used in neural networks, for estimating a probability density function (pdf) from a sample: the pdf is estimated as an average of radial basis functions centred on the sample points.
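
    A minimal sketch of such a Parzen estimate, assuming Gaussian radial basis functions of width sigma (the kernel shape and width are not specified in the text):

        import numpy as np

        def parzen_density(x, data_points, sigma=1.0):
            """Parzen-window estimate of the pdf at x from a 1-D sample of data points."""
            z = (x - np.asarray(data_points)) / sigma
            kernels = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
            return kernels.mean()   # average of the radial basis functions centred on the data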
     
    Since we need to estimate the probability density function at some point, say x, we draw A points at random in the model (the data points) and use them as the centres of the radial basis functions. We then evaluate the resulting Parzen density at a different point of the model, and repeat this evaluation at B different points, always using the same A data points.
    Taking minus the mean of the log of these B density values directly gives an estimate of the entropy of the model. To obtain the entropy of the image and the joint entropy of the model and the image, we apply the same evaluation scheme to v(T(X)) and to the pairs (u(X), v(T(X))), having first computed the transforms of the random points.
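
    Putting the two previous steps together, here is a hedged sketch of this entropy estimate: A randomly drawn intensities serve as kernel centres, the Parzen density is evaluated at B other randomly drawn intensities, and minus the mean log density is the entropy estimate. The sample sizes A and B, the Gaussian kernel and its width sigma are all assumptions.

        import numpy as np

        def parzen_entropy(samples, A=50, B=50, sigma=1.0, seed=None):
            """Estimate H(X) = -E[ log p(X) ] with a Parzen-window density."""
            rng = np.random.default_rng(seed)
            centres  = rng.choice(samples, size=A, replace=False)   # the A data points (kernel centres)
            eval_pts = rng.choice(samples, size=B, replace=False)   # the B evaluation points
            z = (eval_pts[:, None] - centres[None, :]) / sigma
            kernels = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
            densities = kernels.mean(axis=1)          # Parzen estimate at each of the B points
            return -np.log(densities).mean()          # entropy = minus the mean log density

    The joint entropy H( u(X), v(T(X)) ) is obtained in the same way, with two-dimensional kernels over the pairs of intensities, once the random points have been mapped through the current estimate of T.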