The method
If the two distributions are related functionally
(in our case through a geometrical transformation), and we only have an
estimate of this transformation, then the mutual information depends on
that estimate. The exact transformation that maps the first
image u (the model) onto the second image v (the image) should
give rise to the largest mutual information. Mutual information thus becomes
an optimisation criterion, to be maximised with respect to T:
MI(T) = H( u(X) ) + H( v(T(X)) ) - H( u(X), v(T(X)) )
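As an illustration, the mutual information between the model and the transformed image can be estimated from a joint histogram of intensities. This is a minimal numpy sketch; the function name is ours, and the histogram estimator is a common alternative to the Parzen estimator used later in the text:

```python
import numpy as np

def mutual_information(u, v, bins=32):
    """Estimate MI(u, v) = H(u) + H(v) - H(u, v) from a joint
    intensity histogram.

    u, v : flattened intensity arrays of equal length (the model
    and the transformed image sampled at the same points X).
    """
    joint, _, _ = np.histogram2d(u, v, bins=bins)
    p_uv = joint / joint.sum()      # joint pdf estimate
    p_u = p_uv.sum(axis=1)          # marginal pdf of u
    p_v = p_uv.sum(axis=0)          # marginal pdf of v

    # Entropies, skipping empty histogram cells (0 * log 0 = 0)
    h_u = -np.sum(p_u[p_u > 0] * np.log(p_u[p_u > 0]))
    h_v = -np.sum(p_v[p_v > 0] * np.log(p_v[p_v > 0]))
    h_uv = -np.sum(p_uv[p_uv > 0] * np.log(p_uv[p_uv > 0]))
    return h_u + h_v - h_uv
```

With perfectly registered (identical) inputs the criterion is maximal; with independent inputs it drops towards zero.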
The method then proceeds with a classic gradient-ascent
optimisation technique: it seeks the transformation
that gives the largest mutual information by taking small steps in
the "direction" of the derivative of the criterion. Since
H( u(X) ) does not depend on T, its derivative vanishes, and
d/dT[MI(T)] = d/dT[ H( v(T(X)) ) ] - d/dT[ H( u(X), v(T(X)) )]
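A hypothetical sketch of such an optimisation loop, using a central finite-difference approximation of d/dT[MI(T)] in place of the analytic entropy derivatives (the names, the step sizes, and the flat parameter vector for T are illustrative assumptions):

```python
import numpy as np

def gradient_ascent(mi_of, t0, step=0.1, eps=1e-2, iters=100):
    """Maximise a criterion mi_of(t) by finite-difference
    gradient ascent.

    mi_of : callable mapping transformation parameters t to MI(T)
    t0    : initial parameter vector (e.g. a 2-D translation)
    """
    t = np.asarray(t0, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(t)
        for i in range(t.size):
            dt = np.zeros_like(t)
            dt[i] = eps
            # central-difference approximation of d MI / d t_i
            grad[i] = (mi_of(t + dt) - mi_of(t - dt)) / (2 * eps)
        t += step * grad    # small step along the derivative
    return t
```

The loop is generic: any smooth criterion can stand in for MI, which makes the sketch easy to check on a toy function with a known maximum.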
The probability
densities of both images are estimated using the
Parzen
Window technique. This is a classical technique, also used in neural
networks, for estimating a probability density function (pdf) from a sample.
It estimates the pdf as a sum of radial basis functions centred on the
sample points.
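For one-dimensional intensity samples and a Gaussian kernel (the kernel choice and the names are illustrative assumptions), the Parzen estimate can be sketched as:

```python
import numpy as np

def parzen_density(x, centres, sigma=1.0):
    """Parzen-window estimate of a pdf, evaluated at points x.

    Gaussian radial basis functions of width sigma are centred on
    the sample points 'centres'; the estimate at each x is the
    mean of the kernel responses.
    """
    x = np.atleast_1d(x)[:, None]        # shape (n_eval, 1)
    c = np.atleast_1d(centres)[None, :]  # shape (1, n_centres)
    k = np.exp(-0.5 * ((x - c) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return k.mean(axis=1)                # average over the centres
```

Because each Gaussian integrates to one, the estimate itself integrates to one, whatever the sample.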
Since we need to estimate
the probability density function at some point, say x,
we draw A points at random in the model (the data points) and compute
the Parzen density estimate from them; this set of A points is
the set of the centres of the radial basis functions. We then evaluate
the estimate at a different point of the model, using the same data
points, and repeat this evaluation at B different points.
Taking the negative of the mean of the log of these
B density estimates leads directly to an estimate of the entropy of the
model. To obtain the entropy of the image and of the joint
realisation of the model and the image, we just
have to apply the same evaluation scheme (having first computed
the transforms of the random points).
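The two-sample scheme above (one sample providing the Parzen centres, a second sample providing the evaluation points) can be sketched as follows; the Gaussian kernel, the names, and the 1-D samples are illustrative assumptions:

```python
import numpy as np

def entropy_estimate(sample_a, sample_b, sigma=1.0):
    """Two-sample entropy estimate.

    sample_a : data points, used as centres of the radial basis
               functions in the Parzen density estimate
    sample_b : evaluation points; the entropy is the negative mean
               of the log-density over these points:
               H ~= -(1/B) * sum_b log p_A(b)
    """
    b = np.atleast_1d(sample_b)[:, None]  # shape (B, 1)
    a = np.atleast_1d(sample_a)[None, :]  # shape (1, A)
    k = np.exp(-0.5 * ((b - a) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p = k.mean(axis=1)                    # Parzen density at each B point
    return -np.mean(np.log(p))
```

On a standard normal sample this should land near the true entropy 0.5 * log(2 * pi * e) ~ 1.419, up to the smoothing and sampling bias of the estimator.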