PhD thesis of Verena Rieser
Bootstrapping Reinforcement Learning-based Dialogue Strategies from Wizard-of-Oz data
PDF (4.1MB) Ph.D. thesis, Saarland University, Department of Computational Linguistics, 2008. This thesis was supervised by Dr. Oliver Lemon (University of Edinburgh) and Prof. Dr. Manfred Pinkal (Saarland University).
It is published in the Saarbruecken Dissertations in Computational Linguistics and Language Technology and can be ordered here.
Abstract
In my PhD thesis, I develop a framework to optimise multimodal dialogue strategies from small amounts of Wizard-of-Oz (WOZ) data.
Designing a spoken dialogue system can be a time-consuming and challenging process. To facilitate strategy development, recent research investigates the use of Reinforcement Learning (RL) methods applied to automatic dialogue strategy optimisation from real data. For new application domains where a system is designed from scratch, however, there is often no suitable in-domain data available, leaving the developer with a classic chicken-and-egg problem.
This thesis proposes to learn dialogue strategies by simulation-based RL, where the simulated environment is learned from small amounts of Wizard-of-Oz data. Using WOZ data rather than data from real Human-Computer Interaction allows us to learn optimal strategies for new application areas beyond the scope of existing dialogue systems. Optimised learned strategies are then available from the first moment of online operation, and tedious handcrafting of dialogue strategies is avoided entirely. We call this method 'bootstrapping'.
Our results show that a dialogue policy constructed using this framework significantly outperforms a non-optimised data-driven policy (constructed via Supervised Learning) in terms of subjective user ratings and objective dialogue performance measures. For example, RL leads to an almost 50% increase in perceived Task Ease and an almost 20% increase in Future Use.
The technical contributions of this thesis are new methods and techniques for learning a simulated learning environment from small amounts of WOZ data. For example, the thesis introduces a new method to learn and evaluate user simulations, as well as non-linear reward functions. The overall contribution is an end-to-end data-driven framework to design and evaluate RL-based dialogue strategies, from data collection to user testing.
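
To make the idea of simulation-based RL concrete, the following is a minimal illustrative sketch, not the implementation used in the thesis. It estimates a very simple user simulation from a handful of made-up WOZ-style turns and then runs tabular Q-learning against that simulation. Everything in it is an assumption made for illustration: the toy log `WOZ_LOG`, the two system actions, the two-slot task, the hand-set reward values, and all function names. In particular, the thesis learns its reward function from data, whereas this sketch hard-codes one.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy "WOZ" log of (wizard_action, user_response) pairs.
# Real WOZ data would be far richer; this is purely illustrative.
WOZ_LOG = [
    ("ask", "provide"), ("ask", "provide"),
    ("ask", "provide"), ("ask", "silent"),
]

N_SLOTS = 2                    # assumed task: fill two slots, then present
ACTIONS = ["ask", "present"]   # assumed system action set


def learn_user_simulation(log):
    """Estimate P(user_response | system_action) by relative frequency."""
    counts = defaultdict(Counter)
    for action, response in log:
        counts[action][response] += 1
    return {
        action: {r: n / sum(c.values()) for r, n in c.items()}
        for action, c in counts.items()
    }


def step(state, action, sim, rng):
    """Simulate one turn; returns (next_state, reward, done)."""
    if action == "present":
        # Presenting ends the dialogue; it only pays off once all slots are filled.
        return state, (10.0 if state == N_SLOTS else -10.0), True
    responses = list(sim["ask"])
    weights = [sim["ask"][r] for r in responses]
    response = rng.choices(responses, weights=weights)[0]
    next_state = min(N_SLOTS, state + 1) if response == "provide" else state
    return next_state, -1.0, False  # small per-turn cost (hand-set assumption)


def train(sim, episodes=2000, alpha=0.2, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration in the simulation."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 20:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action, sim, rng)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, turns = next_state, turns + 1
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}


if __name__ == "__main__":
    simulation = learn_user_simulation(WOZ_LOG)
    print(greedy_policy(train(simulation)))
```

In this toy setting the learned greedy policy asks until both slots are filled and only then presents, which is the behaviour the hand-set rewards are designed to favour. The point of the sketch is the division of labour: the environment is estimated from the logged data, and the strategy is optimised by interacting with that estimate rather than with real users.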

