A simple, not particularly useful, yet illustrative example, due to Hinton.
Input patterns are twelve bits long - six of the bits are zero, while the
remaining six form 1-x-x-x-x-1, where each x is either 0 or 1, chosen
arbitrarily; there are thus 2^4 = 16 such patterns.
We can position these 6 bits anywhere (cyclically) in the input stream,
providing 16 x 12 = 192 possible inputs.
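The input set is small enough to enumerate directly. The following sketch (Python; the function names are illustrative) builds the 16 base patterns and takes every cyclic shift of each:

```python
from itertools import product

def base_patterns():
    """The 16 six-bit patterns 1-x-x-x-x-1, with each x either 0 or 1."""
    return [(1, *xs, 1) for xs in product((0, 1), repeat=4)]

def cyclic_shifts(pattern, length=12):
    """Pad the pattern to 12 bits with zeroes, then take every rotation."""
    bits = list(pattern) + [0] * (length - len(pattern))
    return [tuple(bits[-k:] + bits[:-k]) for k in range(length)]

inputs = {shift for p in base_patterns() for shift in cyclic_shifts(p)}
print(len(inputs))  # 192 distinct 12-bit inputs
```

All 192 shifts are distinct because each input contains exactly one cyclic run of six zeroes (the runs inside the pattern are at most four bits long), which pins down both the pattern and its position.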
The aim is to determine which of the sixteen patterns is on view in any
input - that is, ``Translation Invariant Recognition''.
We employ 12 input units (one per bit) and 16 output units. An obvious numbering might be: the output pattern is, say, 1-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0 for each of the input patterns
1-0-0-0-0-1-0-0-0-0-0-0 0-1-0-0-0-0-1-0-0-0-0-0 0-0-1-0-0-0-0-1-0-0-0-0 0-0-0-1-0-0-0-0-1-0-0-0 etc., and 0-1-0-0-0-0-0-0-0-0-0-0-0-0-0-0 for each of the input patterns
1-0-0-0-1-1-0-0-0-0-0-0 0-1-0-0-0-1-1-0-0-0-0-0 0-0-1-0-0-0-1-1-0-0-0-0 0-0-0-1-0-0-0-1-1-0-0-0 etc., and so on.
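Under a numbering of this kind, the class of any input can be read off mechanically: locate the unique cyclic run of six zeroes, take the 6-bit window that follows it, and interpret the four free bits as a binary number. A sketch of that decoding rule (the rule itself is an illustrative assumption, though it agrees with the examples above):

```python
def classify(bits):
    """Return the class index 0..15 of a 12-bit input.

    Assumed numbering: find the unique cyclic run of six zeroes; the
    6-bit window after it is 1-x-x-x-x-1, and the four x bits, read
    as a binary number, give the class.
    """
    n = len(bits)
    for start in range(n):
        if all(bits[(start + i) % n] == 0 for i in range(6)):
            window = [bits[(start + 6 + i) % n] for i in range(6)]
            return int("".join(map(str, window[1:5])), 2)
    raise ValueError("not a valid pattern")

print(classify([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]))  # 0
print(classify([0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]))  # 1
```

The point of the exercise, of course, is that the network must discover some equivalent of this rule from examples rather than being given it.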
The solution network has two hidden layers: the first has 60 units, the second 6. Each unit in the 60-unit layer is connected to 6 of the input units and to all 6 units in the second hidden layer; each unit in the 6-unit layer is connected to all 16 output units.
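The text does not say which 6 inputs each first-layer unit sees; a natural guess is one of the 12 cyclic windows of six adjacent inputs, with five units per window (12 x 5 = 60). A forward-pass sketch under that assumption, with noisy initial weights:

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_H1, N_H2, N_OUT = 12, 60, 6, 16  # layer sizes from the text

# Connectivity assumption (not stated in the text): each of the 60
# first-layer units is wired to one of the 12 cyclic windows of six
# adjacent inputs, five units per window.
mask = np.zeros((N_IN, N_H1))
for h in range(N_H1):
    start = h % 12
    for i in range(6):
        mask[(start + i) % N_IN, h] = 1.0

W1 = rng.normal(0.0, 0.1, (N_IN, N_H1)) * mask  # sparse 12 -> 60
W2 = rng.normal(0.0, 0.1, (N_H1, N_H2))         # full 60 -> 6
W3 = rng.normal(0.0, 0.1, (N_H2, N_OUT))        # full 6 -> 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    return sigmoid(sigmoid(sigmoid(x @ W1) @ W2) @ W3)

y = forward(np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], float))
print(y.shape)  # (16,)
```

The 6-unit layer is a deliberate bottleneck: it is too narrow to pass the input through verbatim, which is what forces a compact, position-independent code to emerge there.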
The weights are initialised with noise, and the network is trained over many thousands of passes through a subset of the input patterns; the resulting network recognises hitherto unseen inputs accurately. The second hidden layer learns a canonical, position-independent representation of the input patterns.
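The training regime can be sketched with plain backpropagation on squared error. Everything below - the learning rate, the error measure, the particular numbering, and full (unmasked) connectivity - is an assumption for illustration, not Hinton's recipe:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Build all 192 inputs; labels are the four free bits read as a
# binary number (an illustrative numbering).
X, Y = [], []
for xs in product((0, 1), repeat=4):
    pattern = [1, *xs, 1, 0, 0, 0, 0, 0, 0]
    for k in range(12):
        X.append(pattern[-k:] + pattern[:-k])
        Y.append(int("".join(map(str, xs)), 2))
X = np.array(X, float)
T = np.eye(16)[Y]  # one-hot targets

# Fully connected 12-60-6-16 net (the sparse input wiring is
# omitted here for brevity), initialised with noise.
sizes = [12, 60, 6, 16]
Ws = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes, sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    acts = [x]
    for W in Ws:
        acts.append(sigmoid(acts[-1] @ W))
    return acts

def mse(x, t):
    return float(np.mean((forward(x)[-1] - t) ** 2))

def train_step(x, t, lr=1.0):
    acts = forward(x)
    # Backpropagate the squared-error signal through the sigmoids.
    delta = (acts[-1] - t) * acts[-1] * (1.0 - acts[-1])
    for i in reversed(range(len(Ws))):
        grad = acts[i].T @ delta / len(x)
        if i > 0:
            delta = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
        Ws[i] -= lr * grad

# Hold some inputs out; train on the rest for a few thousand passes.
perm = rng.permutation(len(X))
train, test = perm[:160], perm[160:]
before = mse(X[test], T[test])
for _ in range(3000):
    train_step(X[train], T[train])
after = mse(X[test], T[test])
print(f"held-out MSE: {before:.3f} -> {after:.3f}")
```

Even this crude version shows the shape of the experiment: error on the held-out inputs falls as the network learns, since the shared structure across positions lets training on some shifts inform the others.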