Bayesian probabilistic learning in robots
Narayanan U. Edakunni
Traditionally robots have been controlled by using a parametric
model of the dynamics of the robot, taking into account various
characteristics of the robot like mass of the plant, moment of inertia and so
on. This way of controlling a robot is very different from the one that humans
employ. The dynamics of the human body is learnt instead of being hardwired in
the brain. Human babies try to experiment with the motion of their body parts
in order to establish the mapping between the torque and the displacement that
the torque produces. By learning the dynamics of the body on the fly, it is
possible to quickly adapt to changes in the characteristics of the plant.
In recent years, with the development of fast and accurate machine learning
methods, it has been possible to learn the dynamics of a robot efficiently. When
learning the dynamics, a robot must use a learning algorithm that is able to
process the data and learn in real time. The robot tries to learn the mapping
between the command applied and the resulting change in the state of the
robot. When a new state change is desired the learnt mapping is used to predict
the command that needs to be applied to reach the state. Hence we see that
the operation of the robot happens simultaneously with learning. A robot
operating in the real world needs to interact with its
environment in real time which results in a constantly changing state of the
robot.
This makes it important for the robot to learn from a single data
point at a time without having to store away the data. This
form of learning where data points are discarded after learning
is termed as online learning. The tight
constraints on memory and computational resources imposed by this form of
learning leads to difficulties in formulating such online learning algorithms.
One of the successful algorithms that meets these requirements is the
Locally Weighted Projection Regression (LWPR)[1] which tries to learn the
underlying mapping between the state of the robot and the command that produced
it using a weighted local linear approximation. This
algorithm has the nice characteristics of being able
to operate in real time with limited memory. Furthermore, unlike conventional
localised learning algorithms LWPR learns the local models independent of each
other, which makes it robust against negative interference[2,3].
The independent
formulation also lends it a distributed learning capability; being able to learn
different parts of a function simultaneously. LWPR's success has been
demonstrated by modelling such complex tasks as juggling in robots[4].
One of the
drawbacks of LWPR is that it has a lot of open parameters that need to be tuned
to obtain the optimal learning behaviour. This is different from behaviour
in humans who do not need to fix open parameters. The solution to this lies in
Bayesian learning. According to the Bayesian school of thought, parameters
of a model have an associated probability distribution which avoids the
need to tune the parameters.
Bayesian probabilistic learning has gained popularity in recent
years largely due to the ease of modelling and the model selection
capabilities that it offers. The Bayesian probabilistic framework also has the
property that it models the belief revision system of humans quite closely[5].
In a Bayesian framework we start with a prior belief of an event and after
observing the event
the belief is revised to reflect the experience. The belief is represented as
probabilities and the posterior probability is a product of the prior
probability (prior belief) and likelihood (experience). Schematic in the figure 1 illustrates this process.
Fig. 1 : Illustrates the process of probability update during Bayesian online learning
The Bayesian framework can hence be used effectively to model the dynamics of a robot.
We start with a certain prior probability of the dynamics model. We then apply
a random command to the system (usually called motor babbling), observe the
change of state this causes and accordingly come up with our posterior belief
about the dynamics of the system. This posterior belief then serves as the
prior for the next step of learning. This way the model of dynamics or equivalently the
parameters of the model gets updated at each time step to reflect the
experience so far[6].
The Randomly Varying Coefficient (RVC) model[7] is a probabilistic model based on this
paradigm of Bayesian probabilistic online learning that combines the virtues of
a Bayesian probabilistic framework and LWPR. In RVC the idea of independent
localised learning is reformulated as a Bayesian probabilistic model thus
resulting in an efficient yet robust learning algorithm suited for real
time learning tasks like the one shown in Fig. 2. The figure
illustrates an experiment where a composite
controller consisting of a combination of feedforward and low gain feedback
commands was used to control the joints of a simulated DLR arm. The feedforward
command in turn was produced from an inverse dynamics model of the
robot arm learnt online by an RVC model. The aim of the experiment
was to learn an accurate model for the inverse dynamics of the robot
arm online, while the robot performed a figure-8 pattern. The result
of the experiment is shown in Fig. 2(b) where the pattern
performed by the robot is shown at different stages of learning. The
speedy convergence of the tracking establishes the effectiveness of the
RVC method in real time learning.


Fig.2 : (a) The DLR arm © http://www.dlr.de (b) Learning of the dynamics of a simulated DLR arm
using RVC. The tracking performance after different iterations of learning is
shown. Traj_des represents the desired trajectory. Traj_pd represents a PD
controller without any model of the dynamics.
Traj_50 and Traj_200000 represents the tracking performance of model learnt by
RVC after training over 50 and 200000 data points respectively.
References :
- Incremental Online Learning in High Dimensions, Sethu
Vijayakumar, Aaron D'Souza and Stefan Schaal, Neural Computation, vol.
17, no. 12, pp. 2602-2634, 2005
- Catastrophic Interference in Human Motor Learning,
Tom Brashers-Krug and Reza Shadmehr and Emanuel Todorov, Advances in
Neural Information Processing Systems, Vol. 7, 1995.
- Constructive Incremental Learning from Only Local
Information, Stefan Schaal and Christopher G. Atkeson, Neural
Computation, Vol 10, 2047-2084, 1998
- http://www-clmc.usc.edu/Resources/ResourcesMoviesRobotLearningOfMotorControl
- Artificial Intelligence Dialects of the Bayesian
Belief Revision Language, Shimon Schocken and Paul R. Kleindorfer, IEEE
Transactions on Systems, Man, and Cybernetics, Vol.19, No. 5, 1989
- A Bayesian approach to on-line learning, On-line learning in neural networks,363 - 378, 1999.
- Kernel Carpentry for Online Regression using Randomly
Varying Coefficient Model, Proc. International Joint Conference on
Artificial Intelligence (IJCAI '07), Hyderabad, India, 2007