# Smooth histograms from MCMC

Iain Murray, Version 0.1, 2011-07-18.

These notes are in response to a post on Twitter on 2011-07-15:

@joezuntz Joe Zuntz #cmml Anyone know how to make nice looking 2D histogram of MCMC samples when scrappy round edge? Hard to fit model to it here as odd shape.

I will call the two variables being plotted *x* and *y*,
and assume that several other variables *z* are also being
sampled. (If only *x* and *y* were being sampled,
one wouldn't need MCMC.)

Initially we’ll assume that *x* and *y* can only
take on a discrete set of values, lying at the centres of the histogram
bins.
The conditional distribution over the grid, p(*x*,*y*|*z*), can
be computed and sampled from. This block Gibbs sampling move could be
incorporated within any Markov chain Monte Carlo (MCMC) scheme.
At equilibrium, the conditional distribution over the grid
is also an unbiased estimate of occupying each discrete (*x*,*y*)
value. Therefore, the real numbers p(*x*,*y*|*zᵢ*) can be
averaged over samples, *zᵢ*, to obtain histogram bin heights.

For a particular setting of the nuisance variables, *z*, the
conditional distribution p(*x*,*y*|*z*) often looks like a
bump around the (*x*,*y*) pair that best fits the
current *z*. This scheme looks like using a kernel density
estimator on the samples, but with the kernel tuned to the distribution of
interest in a way that doesn’t introduce error. This trick is a form of
so-called Rao–Blackwellization. For more information see, for example,
§2.2.1 p32 of my thesis.

As written, the above scheme introduces some discretization error. If
significant, this error is easily removed by resampling all of the
(*x*,*y*) values for the next proposal (except the current state)
uniformly from within their histogram cells. Alternatively the proposal
distribution can always be on a regular grid, but the current location,
which determines the centering for the grid, can be jittered within its
histogram cell with a Metropolis move.