Suppose a set of input patterns $\xi_k^\mu$ ($k = 1, \dots, m$)
is associated with a set of output patterns $\zeta_j^\mu$ ($j = 1, \dots, n$)
for $\mu = 1, \dots, p$, with the inputs loaded onto the $m$ inputs of an
array of $n$ generalised perceptrons, whose transfer function is the
sigmoid
$$
g(h) = \frac{1}{1 + e^{-h}}.
$$
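As a concrete sketch of this forward pass (in Python with NumPy; the dimensions, random patterns, and variable names here are illustrative assumptions, not taken from the text):

```python
import numpy as np

def sigmoid(h):
    """Sigmoid transfer function g(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + np.exp(-h))

# Hypothetical dimensions: m inputs, n perceptron units, p patterns.
m, n, p = 4, 3, 5
rng = np.random.default_rng(0)
xi = rng.random((p, m))           # input patterns xi[mu, k]
w = rng.standard_normal((n, m))   # weights w[j, k]

# State of unit j on pattern mu: O[mu, j] = g(sum_k w[j, k] * xi[mu, k])
O = sigmoid(xi @ w.T)
print(O.shape)  # (5, 3): one state per (pattern, unit), each in (0, 1)
```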
The performance of the network may be measured by the error
$$
E = \frac{1}{2} \sum_{\mu} \sum_{j} \left( \zeta_j^\mu - O_j^\mu \right)^2,
$$
where $O_j^\mu = g\!\left( \sum_k w_{jk}\, \xi_k^\mu \right)$ is the state of the
$j$th unit when the $\mu$th input pattern is presented.
$E$ is non-negative, being zero when the network performs `perfectly',
`small' for `good' performance, and `large' for `poor' performance.
We note that $E$ depends on the values of the weights, and may hope therefore
to reduce it by performing the gradient descent
$$
\Delta w_{jk} = -\eta \, \frac{\partial E}{\partial w_{jk}}
$$
iteratively (where, as usual, $w_{jk}$ is the weight on the $k$th
input to unit $j$).
The derivative $\partial E / \partial w_{jk}$ is easily determined by applying
the chain rule through the unit input $h_j^\mu = \sum_k w_{jk}\, \xi_k^\mu$:
$$
\frac{\partial E}{\partial w_{jk}}
= -\sum_{\mu} \left( \zeta_j^\mu - O_j^\mu \right) g'\!\left( h_j^\mu \right) \xi_k^\mu,
$$
where, for the sigmoid above, $g'(h) = g(h)\bigl(1 - g(h)\bigr)$.
For such simple networks (no intermediate units), we are guaranteed to
minimise $E$ if we take $\eta$ small enough and perform enough
iterations (although this minimum may well not be $0$).