In this section, we review the definition and basic properties of Markov chains defined on countable state spaces. For an extension to general state spaces, the interested reader is referred to [9] and [5].
We consider a measurable state space $F$. A Markov chain with values in $F$ is a discrete-time stochastic process $(Z_k)_{k \geq 0}$ with the property that the distribution of $Z_{k+1}$ given all previous values of the process only depends upon $Z_k$. Typically, the Markov chain $Z$ takes values in $F = \mathbb{R}^d$. However, to illustrate the main ideas, we restrict our attention to discrete state spaces (i.e. $Z_k$ can take only a finite or countable number of different values).
The transition between two states $z$ and $z'$ is defined by the transition kernel $K$:
$$K(z, z') = P(Z_{k+1} = z' \mid Z_k = z).$$
For fixed $z$, $K(z, \cdot)$ is a probability distribution on $F$: $\sum_{z' \in F} K(z, z') = 1$.
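These definitions can be made concrete with a small numerical sketch. The transition matrix below is purely illustrative (it does not come from the text); it shows how a kernel on a finite state space is stored as a row-stochastic matrix and how a trajectory of the chain is simulated.

```python
import numpy as np

# Illustrative kernel on the finite state space F = {0, 1, 2}:
# K[z, y] = P(Z_{k+1} = y | Z_k = z); each row sums to 1.
K = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

def simulate(K, z0, n, rng):
    """Draw a trajectory Z_0, ..., Z_n of the chain started at z0."""
    path = [z0]
    for _ in range(n):
        # Sample Z_{k+1} from the distribution K(Z_k, .)
        path.append(rng.choice(len(K), p=K[path[-1]]))
    return path

rng = np.random.default_rng(0)
path = simulate(K, z0=0, n=10, rng=rng)
```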
A central point in the study of Markov chains is the behaviour of the chain as $k$ grows: for large values of $k$, can we consider $Z_k$ to be drawn from a probability distribution $\pi$, whatever the starting value $z_0$? If so, we say that the chain is ergodic with $\pi$ as stationary distribution. A necessary condition for the chain to have $\pi$ as stationary distribution is the invariance of $\pi$ under the transition kernel $K$:
$$\pi(z') = \sum_{z \in F} \pi(z) K(z, z'), \quad \text{for all } z' \in F. \quad (2)$$
A sufficient condition for the invariance of $\pi$ under the transition kernel $K$ is the following reversibility (or detailed balance) condition:
$$\pi(z) K(z, z') = \pi(z') K(z', z), \quad \text{for all } z, z' \in F.$$
Indeed, summing this relation over $z$ gives $\sum_{z} \pi(z) K(z, z') = \pi(z') \sum_{z} K(z', z) = \pi(z')$, which is exactly the invariance of $\pi$.
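To see the reversibility condition at work, one can build a small kernel that satisfies it by construction, for instance with a Metropolis-type acceptance rule (the target $\pi$ and the state space below are arbitrary illustrative choices), and check numerically that detailed balance indeed implies invariance:

```python
import numpy as np

# Illustrative target distribution on the state space {0, 1, 2}.
pi = np.array([0.2, 0.3, 0.5])
n = len(pi)

# Build a reversible kernel via a Metropolis rule with a uniform
# proposal q(y) = 1/n: off-diagonal K(z, y) = q * min(1, pi[y]/pi[z]),
# and the diagonal absorbs the remaining probability mass.
K = np.zeros((n, n))
for z in range(n):
    for y in range(n):
        if y != z:
            K[z, y] = (1.0 / n) * min(1.0, pi[y] / pi[z])
    K[z, z] = 1.0 - K[z].sum()

# Reversibility: pi(z) K(z, y) == pi(y) K(y, z) for all z, y.
flux = pi[:, None] * K
balance = np.allclose(flux, flux.T)
# ... which implies invariance: pi K == pi.
invariant = np.allclose(pi @ K, pi)
```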
Two notions make the behaviour of the chain precise: irreducibility and aperiodicity. It is beyond the scope of this tutorial to describe these properties in detail, but we will review the ideas behind them. A chain is said to be irreducible if each state can communicate with every other one, i.e. for every $z$ and $y$, there exists $k > 0$ such that $K^k(z, y) > 0$. An irreducible chain is said to be (strongly) aperiodic if for some state $z$ the probability of remaining in $z$ is strictly positive: $K(z, z) > 0$. This prevents the chain from having a cyclic behaviour.
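On a finite state space both properties are easy to check numerically: irreducibility amounts to some power $K^k$ having all entries strictly positive, and strong aperiodicity to some strictly positive diagonal entry. The kernel below is again an illustrative example, not taken from the text:

```python
import numpy as np

# Illustrative kernel: states 0 and 2 do not communicate in one step,
# but they do in two steps, so the chain is irreducible.
K = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Irreducibility: K^2 has all entries > 0, i.e. every state can
# reach every other state in k = 2 steps.
K2 = np.linalg.matrix_power(K, 2)
irreducible = bool((K2 > 0).all())

# Strong aperiodicity: K(z, z) > 0 for some state z.
aperiodic = bool((np.diag(K) > 0).any())
```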
We finally have the following convergence theorem [9]: if the chain is irreducible and strongly aperiodic with stationary distribution $\pi$, then for every starting value $z_0$ the distribution of $Z_k$ converges to $\pi$ as $k \to \infty$, i.e. $\lim_{k \to \infty} K^k(z_0, z') = \pi(z')$ for all $z'$.
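For a finite state space the theorem can be observed directly: raising an irreducible, strongly aperiodic kernel to a large power makes all its rows converge to the same vector $\pi$, whatever the starting state. A minimal sketch with an illustrative kernel:

```python
import numpy as np

# Illustrative irreducible, strongly aperiodic kernel.
K = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Row z of K^k is the distribution of Z_k given Z_0 = z.
Kk = np.linalg.matrix_power(K, 50)

# After many steps, every row is (numerically) the same vector pi:
# the distribution of Z_k no longer depends on the starting state.
pi = Kk[0]
rows_agree = np.allclose(Kk, pi)

# That limiting vector is invariant under K, as relation (2) requires.
invariant = np.allclose(pi @ K, pi)
```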
Therefore, the problem of sampling from a probability distribution $\pi$ amounts to building a transition kernel $K$ which verifies the invariance relation 2. In Section 3, we give two examples of sampling algorithms which use different types of transition kernels.