Computational Theory of Incremental and Active Learning for Optimal
Generalization
Sethu Vijayakumar
Abstract of Doctoral Thesis 95D38108, Tokyo Institute of Technology, January 1998.
One of the outstanding characteristics of biological systems is their ability
to learn, and in particular to learn incrementally, in real time, from a multitude
of sensory inputs. The computational theory developed here moves towards
the aim of equipping an artificial system of even moderate complexity
with a 'black box' learning system that performs as autonomously and
robustly as its biological counterpart.
In Chapter 2, we review the functional analytic approach to learning in
neural networks and lay out a framework for learning based on an analysis
at the level of function spaces. Various optimization criteria for learning
are considered and, in accordance with the work of Ogawa et al., we emphasize
the importance of using a criterion that reduces the error in the original
function space rather than in the sampled space.
In Chapter 3, we discuss the problems of model selection and the bias-variance
dilemma that arise when choosing the function search space, and show that
this framework is well suited for use with standard model selection strategies.
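The abstract leaves the specific strategies open; purely as an illustration of how such a strategy plugs in, the sketch below selects among candidate models by k-fold cross-validation, with polynomial degree standing in for the size of the function search space (the helper kfold_cv_error is a hypothetical name, not from the thesis):

    import numpy as np

    def kfold_cv_error(X, y, degree, k=5, seed=0):
        # Mean held-out squared error of a polynomial model of the given degree.
        # Illustrative stand-in for choosing the function search space.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), k)
        errors = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            coeffs = np.polyfit(X[train], y[train], degree)
            errors.append(np.mean((np.polyval(coeffs, X[test]) - y[test]) ** 2))
        return np.mean(errors)

    # Choose the model (degree) with the lowest cross-validated error.
    X = np.linspace(0.0, 1.0, 40)
    y = np.sin(2 * np.pi * X) + 0.1 * np.random.default_rng(1).standard_normal(40)
    print('selected degree:', min(range(1, 7), key=lambda d: kfold_cv_error(X, y, d)))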
Chapter 4 defines exact incremental learning and establishes the need for it.
Methods are given for incrementally computing the Wiener learning operator
and the learned function using only the newly available training data and
the results of the learning carried out so far. General results for other
optimization criteria, derived as an extension of this work, are also provided,
along with concrete pseudocode for implementing the incremental learning
algorithm and simulation results demonstrating its effectiveness.
The important point to note is that the incremental learning results
are exact, in the sense that they coincide exactly with the results of
batch learning on the entire data set.
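The thesis establishes this exactness for the Wiener learning operator in function space. As an analogue rather than the thesis's algorithm, the sketch below shows the same property for ridge-regularized least squares, where a Sherman-Morrison rank-one update makes the incrementally maintained weights coincide with batch learning on all data seen so far:

    import numpy as np

    REG = 1e-6  # shared ridge term so incremental and batch solve the same problem

    def batch_ls(Phi, y):
        # Batch (ridge-regularized) least-squares weights on all data so far.
        d = Phi.shape[1]
        return np.linalg.solve(Phi.T @ Phi + REG * np.eye(d), Phi.T @ y)

    class IncrementalLS:
        # Exact incremental least squares via a Sherman-Morrison rank-one update.
        # After every sample, the weights coincide with batch learning on the
        # entire data set, yet only the new sample and stored state are used.
        def __init__(self, dim):
            self.P = np.eye(dim) / REG   # running inverse of (Phi^T Phi + REG*I)
            self.w = np.zeros(dim)

        def update(self, phi, y):
            Pphi = self.P @ phi
            gain = Pphi / (1.0 + phi @ Pphi)
            self.w = self.w + gain * (y - phi @ self.w)
            self.P = self.P - np.outer(gain, Pphi)
            return self.w

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((50, 3))
    y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(50)
    inc = IncrementalLS(3)
    for phi_n, y_n in zip(Phi, y):
        w = inc.update(phi_n, y_n)
    print(np.allclose(w, batch_ls(Phi, y)))  # True: incremental == batch

Running the final line prints True: the weights after fifty one-sample updates match the batch solution to numerical precision.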
Chapter 5 shows an example of using this framework for learning in a
real-world problem, the sensorimotor map of a 2-DOF robot arm, by
effectively incorporating a priori knowledge.
Throughout the dissertation, we emphasize the importance of not merely
using the given training data but dynamically selecting the training data
with a view to improving generalization ability. Chapter 6 provides a
method for selecting the optimal training data, referred to as
active data selection, under the Wiener optimization criterion.
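As a loose stand-in for that method (the thesis works under the Wiener criterion; the sketch below instead uses a standard linear-in-features model, and all names are hypothetical), greedy active selection can pick the next training input whose rank-one covariance update most reduces the average predictive variance over a set of reference inputs:

    import numpy as np

    def next_query(candidates, P, ref):
        # Greedy A-optimality: choose the candidate feature vector whose
        # rank-one update of the weight covariance P most reduces the average
        # predictive variance over the reference inputs `ref`.
        best, best_score = None, np.inf
        for i, phi in enumerate(candidates):
            Pphi = P @ phi
            P_new = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)
            score = np.einsum('ij,jk,ik->i', ref, P_new, ref).mean()
            if score < best_score:
                best, best_score = i, score
        return best

    rng = np.random.default_rng(0)
    candidates = rng.standard_normal((100, 4))   # inputs we may ask to be labelled
    reference = rng.standard_normal((200, 4))    # where generalization is measured
    print('first query:', next_query(candidates, np.eye(4), reference))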
The techniques developed under exact incremental learning work remarkably
well in the presence of strong a priori knowledge and are particularly
useful in problems where the dimensionality is not too high. When we
consider learning in high-dimensional spaces without strong a priori
knowledge, however, these and most other parametric as well as non-parametric
methods break down due to the complexity involved. We found evidence, though,
that in problems of motor control, the data generated by biological and
artificial movement systems, while high dimensional and globally sparse,
are usually distributed densely on low-dimensional hyperplanes locally.
To exploit this, Chapter 7 compares different techniques for local
dimensionality reduction in a Monte Carlo evaluation and finds a suitable
candidate in Locally Weighted PCA (LWPCA).
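As a minimal sketch of the idea (not the thesis's implementation), locally weighted PCA can be realized by taking the leading eigenvectors of a data covariance weighted by a Gaussian kernel centred on the query point:

    import numpy as np

    def locally_weighted_pca(X, query, bandwidth, k):
        # Gaussian weights centred on the query define a weighted mean and
        # covariance; the leading eigenvectors of that covariance span the
        # local low-dimensional subspace.
        w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2.0 * bandwidth ** 2))
        w /= w.sum()
        mean = w @ X
        Xc = X - mean
        cov = (Xc * w[:, None]).T @ Xc
        vals, vecs = np.linalg.eigh(cov)                   # ascending eigenvalues
        return mean, vecs[:, np.argsort(vals)[::-1][:k]]   # top-k directions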
LWPCA dimensionality reduction is used on top of locally weighted
regression to implement Local Adaptive Subspace Regression (LASS).
LASS is an incremental learning algorithm that uses no a priori
knowledge, is completely incremental both in allocating resources and
in incorporating new data, and avoids competition between local modules
in order to prevent negative interference. LASS is tested in Chapter 8 on
artificial, robot, and human motion data, where it is shown to effectively
detect and exploit low-dimensional local manifolds in high-dimensional data
and to achieve excellent learning results.
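The actual LASS algorithm is fully incremental; the batch sketch below (hypothetical names, reusing the locally_weighted_pca sketch above) only illustrates its core idea of regressing in a locally reduced subspace:

    import numpy as np

    def local_subspace_predict(X, y, query, bandwidth=0.3, k=2):
        # Reduce dimensionality locally, then fit a locally weighted affine
        # model in the k reduced coordinates and evaluate it at the query.
        mean, U = locally_weighted_pca(X, query, bandwidth, k)
        Z = (X - mean) @ U                          # local low-dim coordinates
        w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2.0 * bandwidth ** 2))
        A = np.hstack([Z, np.ones((len(Z), 1))])    # affine design matrix
        sw = np.sqrt(w)[:, None]
        beta = np.linalg.lstsq(sw * A, np.sqrt(w) * y, rcond=None)[0]
        return np.append((query - mean) @ U, 1.0) @ beta

In LASS itself, each local module maintains such quantities incrementally, and new modules are allocated only where no existing module accounts for the data, which avoids the competition between modules mentioned above.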
To summarize: in the presence of strong a priori knowledge, we recommend
the exact incremental learning technique developed in Part I of the
dissertation for optimal generalization, with active data selection used
to obtain that generalization from minimal training data. In the absence
of a priori knowledge, and especially for learning in high-dimensional
spaces, LASS is a very effective alternative for obtaining good
approximations to the true solution.
A compressed (gzipped) version of the thesis (180 pages), formatted for double-sided book-style printing, is available for download.