### Exact inference using Belief Propagation

Directed acyclic probabilistic graphs (belief networks) can be used to specify conditional independence
relationships between random variables. These conditional independence relationships can be
used to simplify the process of inference in belief networks.

Working with belief networks is a three stage process. Firstly the
network is set up, and the probabilistic relationships between nodes
is specified to represent the form of prior distribution which is
assumed. Secondly data is received. This data gives information about
some of the nodes of the network. It is used to `instantiate' those
nodes, by fixing them to their observed value. Lastly, conditioning is
performed: the probability distribution of the other nodes is
calculated conditioned on the fixed value of the instantiated nodes.

Computationally speaking, the calculation of this conditional
distribution is not straightforward. However in special situations the
conditioning process can be simplified. Pearl observed that in singly
connected (polytree) belief networks inference can be done with
computational complexity that is linear in the number of nodes.

### Approximation Methods

Very often though the underlying graph is not singly connected. What
can be done in these situations? For simple cases, a junction tree
algorithm can be used to produce a polytree of graph cliques. However
this scales badly in the size of the largest clique in the
(triangulated) graph. In many realistic situations this is very large.

For most realistic scenarios people resort to approximation
schemes. Some of the schemes considered here include Monte Carlo
approaches, maximum posterior approximations, parametric approximations, variational methods and loopy propagation. Each of these
has their advantages and disadvantages, and it is important to compare
them, and to learn more about the value of each.

In most cases the work done here focusses on the use of these
techniques for dynamic trees. Junction tree methods are impossible for
dynamic trees because the largest clique is of the order of the
network size.