Smooth histograms from MCMC
Iain Murray, Version 0.1, 2011-07-18.
These notes are in response to a post on Twitter on 2011-07-15:
@joezuntz Joe Zuntz #cmml Anyone know how to make nice looking 2D histogram of MCMC samples when scrappy round edge? Hard to fit model to it here as odd shape.
I will call the two variables being plotted x and y, and assume that several other variables z are also being sampled. (If only x and y were being sampled, one wouldn't need MCMC.)
Initially we’ll assume that x and y can only take on a discrete set of values, lying at the centres of the histogram bins. The conditional distribution over the grid, p(x,y|z), can be computed and sampled from. This block Gibbs sampling move could be incorporated within any Markov chain Monte Carlo (MCMC) scheme. At equilibrium, the conditional distribution over the grid is also an unbiased estimate of occupying each discrete (x,y) value. Therefore, the real numbers p(x,y|zᵢ) can be averaged over samples, zᵢ, to obtain histogram bin heights.
For a particular setting of the nuisance variables, z, the conditional distribution p(x,y|z) often looks like a bump around the (x,y) pair that best fits the current z. This scheme looks like using a kernel density estimator on the samples, but with the kernel tuned to the distribution of interest in a way that doesn’t introduce error. This trick is a form of so-called Rao–Blackwellization. For more information see, for example, §2.2.1 p32 of my thesis.
As written, the above scheme introduces some discretization error. If significant, this error is easily removed by resampling all of the (x,y) values for the next proposal (except the current state) uniformly from within their histogram cells. Alternatively the proposal distribution can always be on a regular grid, but the current location, which determines the centering for the grid, can be jittered within its histogram cell with a Metropolis move.