Computational Theory of Incremental and Active Learning for Optimal Generalization

Sethu Vijayakumar

Abstract of doctoral thesis 95D38108, Tokyo Institute of Technology, January 1998.

One of the outstanding characteristics of biological systems is their ability to learn, and in particular, to learn incrementally in real time from a multitude of sensory inputs. The computational theory developed here moves towards the aim of equipping an artificial system of even moderate complexity with a 'black box' learning system that can perform as autonomously and robustly as its biological counterpart.

We have reviewed the functional analytic approach to learning in neural networks in Chapter 2 and laid out a framework for learning based on an analysis at the level of function spaces. Various optimization criteria for learning were considered and, in accordance with the work of Ogawa et al., the importance of using a criterion that reduces the error in the original function space, rather than in the sampled space, was emphasized.
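
In compact form (the notation here is ours, chosen for illustration rather than taken from the thesis), the distinction is between minimizing the error over the whole function space and over the sample points only:

    \[
      J_{\mathrm{func}}[\hat f] = E_n\,\|f - \hat f\|_{H}^{2}
      \qquad \text{versus} \qquad
      J_{\mathrm{samp}}[\hat f] = E_n \sum_{i=1}^{m} \bigl(\hat f(x_i) - y_i\bigr)^{2},
    \]

where f is the true function in a Hilbert space H, y_i = f(x_i) + n_i are the noisy training samples, and E_n denotes expectation over the noise. A small sampled-space error does not by itself guarantee a small error in H, which is why the first criterion is preferred.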

We have discussed the problems of model selection and the bias-variance dilemma that arise during the selection of the function search space in Chapter 3, and have shown that this framework is well suited for use with standard model selection strategies.
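
For reference, the dilemma in question is the standard decomposition of the expected squared error of an estimator \hat f_D trained on a random data set D (textbook form, in our notation):

    \[
      E_D\bigl[(\hat f_D(x) - f(x))^2\bigr]
      = \bigl(E_D[\hat f_D(x)] - f(x)\bigr)^2
      + E_D\bigl[(\hat f_D(x) - E_D[\hat f_D(x)])^2\bigr],
    \]

i.e. squared bias plus variance: enlarging the function search space reduces the bias term but inflates the variance term, so the search space must be chosen by a model selection strategy.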

The definition of, and need for, exact incremental learning are provided in Chapter 4. Methods are given for incrementally computing the Wiener learning operator and the learned function using only the newly available training data and the results of the learning carried out so far. General results for other optimization criteria, obtained as an extension of this work, are also provided, together with concrete pseudocode for implementing the incremental learning algorithm and simulation results demonstrating its effectiveness. The important point is that the incremental learning results are exact, in the sense that they coincide exactly with the results of batch learning on the entire data set (a schematic sketch of this property follows below). Chapter 5 shows an example of using this framework for learning in a real-world problem (here, the sensorimotor map of a 2-DOF robot arm) by effectively incorporating a priori knowledge.
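
As a schematic illustration of Chapter 4's exactness property, here is a minimal sketch assuming a linear-in-parameters model and the familiar recursive least squares update; it is not the thesis's operator-level computation under the Wiener criterion:

    import numpy as np

    class ExactIncrementalLS:
        """Recursive least squares: after every update, (w, P) coincide
        exactly with batch regularized least squares on all samples
        seen so far -- the same exactness notion as in Chapter 4."""

        def __init__(self, dim, reg=1e-3):
            self.P = np.eye(dim) / reg   # inverse of regularized Gram matrix
            self.w = np.zeros(dim)       # current parameter estimate

        def update(self, x, y):
            # Sherman-Morrison rank-one update: no old data is revisited
            Px = self.P @ x
            k = Px / (1.0 + x @ Px)               # gain vector
            self.w = self.w + k * (y - x @ self.w)
            self.P = self.P - np.outer(k, Px)

        def predict(self, x):
            return x @ self.w

With P initialized to (reg * I)^(-1), the parameters after n updates equal the batch ridge regression solution on the same n samples, mirroring the coincidence-with-batch-learning property stated above.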

Throughout the dissertation, we have emphasized the importance of not merely using the given training data but of dynamically selecting the training data with a view to improving generalization ability. Chapter 6 provides a method for selecting the optimal training data, referred to as active data selection, under the Wiener optimization criterion.
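
A generic sketch of the idea follows; the thesis derives the optimal choice under the Wiener criterion, whereas here we use the simpler, commonly used rule of querying where the current linear model is most uncertain, with P the inverse Gram matrix maintained by the incremental learner sketched above:

    import numpy as np

    def select_next_input(P, candidates):
        """Greedy active data selection sketch: score each candidate
        input x by its predictive uncertainty x^T P x and query the
        largest. An illustration only, not the thesis's criterion."""
        scores = np.einsum('ij,jk,ik->i', candidates, P, candidates)
        return int(np.argmax(scores))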

The techniques described under exact incremental learning work surprisingly well in the presence of strong a priori knowledge and are particularly useful in problems where the dimensionality is not too high. When we consider learning in high-dimensional spaces without strong a priori knowledge, however, this and most other parametric as well as non-parametric methods break down due to the complexity involved. We found evidence, though, that in motor control problems the data generated by biological and artificial movement systems, while globally high dimensional and sparse, is usually distributed densely on a low-dimensional hyperplane locally. To exploit this, Chapter 7 compared different techniques for local dimensionality reduction by Monte Carlo evaluation and found a suitable candidate in Locally Weighted PCA (LWPCA).
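
A minimal batch sketch of locally weighted PCA (the parameter names and the Gaussian weighting are our assumptions for illustration):

    import numpy as np

    def lwpca(X, query, width, n_components):
        """Weight the data by a Gaussian kernel centred at the query,
        then take the leading eigenvectors of the weighted covariance
        as the local low-dimensional directions."""
        d2 = np.sum((X - query) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / width ** 2)            # locality weights
        mean = (w[:, None] * X).sum(axis=0) / w.sum() # weighted mean
        Xc = X - mean
        cov = (w[:, None] * Xc).T @ Xc / w.sum()      # weighted covariance
        evals, evecs = np.linalg.eigh(cov)            # ascending eigenvalues
        return mean, evecs[:, ::-1][:, :n_components]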

LWPCA dimensionality reduction is used on top of locally weighted regression to implement Local Adaptive Subspace Regression (LASS). LASS is an incremental learning algorithm that uses no a priori knowledge, is completely incremental both in allocating resources and in incorporating new data, and avoids competition between local modules so as to prevent negative interference. LASS was tested in Chapter 8 on artificial, robot, and human motion data, and was shown to effectively detect and exploit low-dimensional local manifolds in high-dimensional data and achieve excellent learning results.
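
To make the projection-plus-regression idea concrete, here is a one-query batch sketch in the spirit of LASS; the real algorithm is incremental and allocates local models online, and the names and ridge term here are ours:

    import numpy as np

    def lass_predict(X, y, query, width, n_components, ridge=1e-8):
        """Predict y at `query`: find the local LWPCA subspace, then
        fit a locally weighted linear model in those coordinates."""
        d2 = np.sum((X - query) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / width ** 2)            # Gaussian locality weights
        mean = (w[:, None] * X).sum(axis=0) / w.sum()
        Xc = X - mean
        cov = (w[:, None] * Xc).T @ Xc / w.sum()
        U = np.linalg.eigh(cov)[1][:, ::-1][:, :n_components]
        # weighted linear regression in the local subspace (+ bias term)
        Z = np.hstack([Xc @ U, np.ones((len(X), 1))])
        A = Z.T @ (w[:, None] * Z) + ridge * np.eye(n_components + 1)
        beta = np.linalg.solve(A, Z.T @ (w * y))
        zq = np.append((query - mean) @ U, 1.0)
        return float(zq @ beta)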

To summarize, in the presence of strong a priori knowledge, we recommend the exact incremental learning technique developed in Part I of the dissertation for optimal generalization; active data selection can then be used to obtain optimal generalization with minimal training data. In the absence of a priori knowledge, however, and especially for learning in high-dimensional spaces, LASS is a very effective alternative for obtaining good approximations to the real solution.