Boltzmann machines are discrete networks, with input, output and hidden units, in which each unit updates its state according to a stochastic decision rule. For a given temperature $T$, the probability that a unit $k$ is in the 1 state is
\[
p_k = \frac{1}{1 + e^{-x_k/T}}
\]
where $x_k$ is, as usual, the net input to the unit.
Note that $T = 0$ gives a step function for this probability, and $T \to \infty$ gives a uniform probability of $\tfrac{1}{2}$.
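Purely as an illustration (the names \texttt{p\_on} and \texttt{update\_unit} are ours, and the net input is assumed to take the usual weighted-sum form), this decision rule might be coded as:
\begin{verbatim}
import numpy as np

def p_on(net_input, T):
    """Probability that a unit adopts the 1 state, given its net input
    and the temperature T."""
    if T == 0:
        return 1.0 if net_input > 0 else 0.0    # step function at T = 0
    return 1.0 / (1.0 + np.exp(-net_input / T))

def update_unit(state, weights, k, T, rng):
    """Resample the state of unit k according to the stochastic rule."""
    net = weights[k] @ state                    # the usual net input x_k
    state[k] = 1 if rng.random() < p_on(net, T) else 0
\end{verbatim}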
It can be shown that after a certain time at fixed $T$, such networks reach a thermal equilibrium in which the probability of a given configuration of units is constant. Thus, while the precise state of the machine cannot be known, it is possible to reason about where it `probably' is. For two such configurations $A$ and $B$, suppose the respective energies (measured according to the Hopfield formula) are $E_A$ and $E_B$, and the associated probabilities $P_A$ and $P_B$. It can further be shown that
\[
\frac{P_A}{P_B} = e^{-(E_A - E_B)/T} .
\]
That is, the probabilities follow a Boltzmann distribution. Note that
\[
P_A \, e^{E_A/T} = P_B \, e^{E_B/T}
\]
and so
\[
P_A = K e^{-E_A/T}
\]
where $K$ is a constant depending on the interconnections, weights and temperature. Thus the state with the lowest energy has the highest probability.
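As a small numerical illustration, suppose configuration $A$ lies two energy units below configuration $B$, so that $E_B - E_A = 2$. Then
\[
\frac{P_A}{P_B} = e^{-(E_A - E_B)/T} = e^{2/T} \approx 7.4 \ \text{at } T = 1,
\qquad
\frac{P_A}{P_B} \approx 55 \ \text{at } T = 0.5 ,
\]
so the same energy gap is felt far more strongly at the lower temperature; this is the effect exploited by the annealing procedure described next.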
If we gradually reduce $T$, we have a high probability of reaching a global minimum of the system energy: at high $T$ the machine locates coarse minima, which are refined as $T$ is reduced. This is the important technique of simulated annealing (see Kirkpatrick, Gelatt and Vecchi).
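As a sketch only, assuming the update rule above, random-order sweeps and a hand-picked cooling schedule (none of which are prescribed here), an annealing run over a small network of 0/1 units with symmetric weights might look like:
\begin{verbatim}
import numpy as np

def anneal(weights, state, schedule, sweeps_per_T, rng):
    """Cool the network through the given temperature schedule, applying the
    stochastic update rule to every unit several times at each temperature."""
    n = len(state)
    for T in schedule:                          # e.g. [10.0, 5.0, 2.0, 1.0, 0.5]
        for _ in range(sweeps_per_T):
            for k in rng.permutation(n):        # visit units in random order
                p = 1.0 / (1.0 + np.exp(-(weights[k] @ state) / T))
                state[k] = 1 if rng.random() < p else 0
    return state

# Example usage with a random symmetric weight matrix and random initial state.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
s = rng.integers(0, 2, size=8)
print(anneal(W, s, [10.0, 5.0, 2.0, 1.0, 0.5], sweeps_per_T=20, rng=rng))
\end{verbatim}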
We hope to train the weights such that when the input units are held at the appropriate 0,1 values (clamped), the output units display the appropriate output pattern after annealing. This is done by performing the following cycle the necessary number of times:

1. Clamp the visible (input and output) units to a training pattern and anneal the network to equilibrium.

2. Continue running the network at the final temperature for several further update cycles, recording for each connected pair of units $i$ and $j$ how often both are on simultaneously; averaged over the training patterns, this estimates $p_{ij}^{+}$, the clamped co-occurrence probability.

3. Repeat steps 1 and 2 with the units free running (unclamped) to estimate $p_{ij}^{-}$, the free-running co-occurrence probability.

4. Update each weight by $\Delta w_{ij} = \epsilon\,(p_{ij}^{+} - p_{ij}^{-})$, where $\epsilon$ is a small learning constant.
This result is derived from a measure of the performance of the Boltzmann Machine, much as back-propagation was derived from a measure of the error in a layered network. Since the machine behaves `probabilistically', we need a measure that compares probability distributions; we use the Kullback measure, also called asymmetric divergence.
We suppose that the input and output (or visible) units may adopt any one of $n$ states, $S_1, \ldots, S_n$. Let $\Pr(S_i)$ be the probability of state $S_i$ occurring, and let $\Pr'(S_i)$ be the probability of observing it in the free running (unclamped) Boltzmann machine. We wish $\Pr'$ to be the same as $\Pr$, and we measure their similarity with
\[
G = \sum_{i=1}^{n} \Pr(S_i)\,\ln\frac{\Pr(S_i)}{\Pr'(S_i)} .
\]
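As a concrete illustration (the name \texttt{kullback\_measure} is ours), the measure can be computed directly from two probability vectors over the $n$ visible states; identical distributions give $G = 0$, and any mismatch gives $G > 0$:
\begin{verbatim}
import numpy as np

def kullback_measure(pr_clamped, pr_free):
    """Asymmetric divergence G between the clamped distribution Pr and the
    free-running distribution Pr' over the n visible states."""
    pr_clamped = np.asarray(pr_clamped, dtype=float)
    pr_free = np.asarray(pr_free, dtype=float)
    mask = pr_clamped > 0          # terms with Pr(S_i) = 0 contribute nothing
    return float(np.sum(pr_clamped[mask] *
                        np.log(pr_clamped[mask] / pr_free[mask])))

print(kullback_measure([0.5, 0.25, 0.25], [0.5, 0.25, 0.25]))   # 0.0
print(kullback_measure([0.5, 0.25, 0.25], [0.25, 0.5, 0.25]))   # > 0
\end{verbatim}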
It can be shown that $G \ge 0$, with equality iff $\Pr(S_i) = \Pr'(S_i)$ for $i = 1, \ldots, n$. The Boltzmann learning procedure works because
\[
\frac{\partial G}{\partial w_{ij}} = -\frac{1}{T}\left(p_{ij}^{+} - p_{ij}^{-}\right),
\]
where $p_{ij}^{+}$ and $p_{ij}^{-}$ are the clamped and free-running co-occurrence probabilities introduced above, and so we are descending an `error surface' when we apply the weight change
\[
\Delta w_{ij} = \epsilon\left(p_{ij}^{+} - p_{ij}^{-}\right),
\]
a formula with clear similarities to the gradient descent ideas behind the back-propagation learning algorithm.
The learning algorithm above is frequently presented in a simpler, procedural form for implementation, along the lines of the sketch below.
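Purely as an indicative sketch, and not the listing referred to above, one such implementation might look like the following; it assumes co-occurrence statistics gathered over a fixed number of sweeps at the final temperature, a completely unclamped free-running phase, and illustrative names throughout (\texttt{run\_to\_equilibrium}, \texttt{train}, \texttt{collect} and so on):
\begin{verbatim}
import numpy as np

def run_to_equilibrium(w, state, clamped, schedule, collect, rng):
    """Anneal through the temperature schedule, then run `collect` further
    sweeps at the final temperature, accumulating co-occurrence statistics."""
    n = len(state)
    free = [k for k in range(n) if k not in clamped]
    coocc = np.zeros_like(w)
    for t, T in enumerate(list(schedule) + [schedule[-1]] * collect):
        for k in rng.permutation(free):
            p = 1.0 / (1.0 + np.exp(-(w[k] @ state) / T))    # stochastic rule
            state[k] = 1 if rng.random() < p else 0
        if t >= len(schedule):                               # collection phase
            coocc += np.outer(state, state)
    return coocc / collect

def train(w, patterns, n_visible, schedule, collect, epsilon, epochs, rng):
    """One Boltzmann learning cycle per epoch: estimate p_ij+ with the visible
    units clamped, p_ij- free running, then apply
    delta w_ij = epsilon * (p_ij+ - p_ij-)."""
    n = w.shape[0]
    visible = set(range(n_visible))
    for _ in range(epochs):
        p_plus = np.zeros_like(w)
        for pattern in patterns:                             # clamped phase
            state = rng.integers(0, 2, size=n)
            state[:n_visible] = pattern
            p_plus += run_to_equilibrium(w, state, visible,
                                         schedule, collect, rng)
        p_plus /= len(patterns)

        state = rng.integers(0, 2, size=n)                   # free-running phase
        p_minus = run_to_equilibrium(w, state, set(), schedule, collect, rng)

        w += epsilon * (p_plus - p_minus)                    # weight update
        np.fill_diagonal(w, 0.0)                             # no self-connections
    return w
\end{verbatim}
Here \texttt{w} would be a symmetric matrix of floats with zero diagonal, and \texttt{patterns} a list of 0,1 vectors over the visible units; the schedule, the number of collection sweeps and $\epsilon$ all interact with the annealing issues discussed next.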
The annealing schedule is of critical importance, as is the choice of the learning constant $\epsilon$. At high $T$, too many states are accessible, and at low $T$ too few. Which values of $T$ are chosen, and the rate of cooling, will control the success of the machine. Bounds has shown, for a specific application, what the necessary range of $T$ is.