Robust Semantic Interpretation (ROSIE)

I was a PI on ROSIE, a joint project between Edinburgh and Stanford funded by Scottish Enterprise from 2001 to 2005.

The aim was to improve the state of the art in the robust and accurate interpretation of dialogue. Part of this work involved building, in collaboration with Jason Baldridge, the first dialogue parser that automatically learns to assign discourse structure to dialogues in the scheduling domain. From the discourse structure assigned to a dialogue, one can deterministically construct a logical form for that dialogue in the style of SDRT (Segmented Discourse Representation Theory). Each sentence's logical form is taken either from the Redwoods corpus (gold standard) or from existing statistical parse selection models for the English Resource Grammar (ERG). The result is a statistical interpretation model of scheduling dialogues whose output has a truth-conditional interpretation that stems from (a) the ERG, since every ERG parse is assigned a compositional semantic interpretation, and (b) the rhetorical relations in the discourse structure. The latter allows one to compute aspects of meaning that go beyond the grammar, such as the underlying goals of the dialogues, and to resolve anaphoric terms such as "3pm" (e.g., to 3pm on 26/05/04).
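To make "deterministically construct a logical form" concrete, here is a toy sketch. It assumes that each sentence's logical form is already available as a plain string and that the discourse parser outputs its attachments as (relation, segment, segment) triples. The names `Relation` and `build_sdrs`, the segment labels, the logical-form strings and the choice of "QAP" (Question-Answer Pair) are all hypothetical illustrations. Real SDRSs are nested structures, and ERG semantics are MRS structures rather than strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str   # rhetorical relation label, e.g. "QAP" (illustrative only)
    left: str   # label of the first discourse segment
    right: str  # label of the second discourse segment


def build_sdrs(segments: dict[str, str], relations: list[Relation]) -> str:
    """Flat toy SDRS: label each segment's logical form, then add relation atoms."""
    # Each segment label is bound to the (hypothetical) logical form of its sentence.
    parts = [f"{label}: {lf}" for label, lf in segments.items()]
    # Each attachment in the discourse structure contributes one relation atom.
    parts += [f"{rel.name}({rel.left}, {rel.right})" for rel in relations]
    return " & ".join(parts)


segments = {"p1": "ask(meet)", "p2": "propose(3pm)"}
print(build_sdrs(segments, [Relation("QAP", "p1", "p2")]))
# p1: ask(meet) & p2: propose(3pm) & QAP(p1, p2)
```

Because the construction is a fixed function of the discourse structure and the sentence-level forms, all of the learning happens in choosing the relations and attachments, not in building the formula.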

We also explored how active learning can speed up the annotation process. For the parse selection task, we found that active learning achieves results similar to those obtained from randomly chosen training examples while using only half as much training data.
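Pool-based active learning with uncertainty sampling is one standard way to decide which sentences to annotate next. The paragraph above does not say which selection criterion was used, so the entropy criterion below is an assumption. The sketch scores each unannotated sentence by the entropy of the model's distribution over its candidate parses and selects the most uncertain ones. The functions `parse_entropy` and `select_batch` and the example scores are hypothetical.

```python
from __future__ import annotations

import math


def parse_entropy(scores: list[float]) -> float:
    """Entropy of a softmax distribution over one sentence's candidate parses."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)


def select_batch(pool: dict[str, list[float]], k: int) -> list[str]:
    """Return the k unannotated sentences whose parse choice is most uncertain."""
    return sorted(pool, key=lambda sid: parse_entropy(pool[sid]), reverse=True)[:k]


# Hypothetical model scores for the candidate parses of three unannotated sentences.
pool = {
    "s1": [5.0, 0.1, 0.2],  # one clear winner: low entropy
    "s2": [1.0, 1.0, 1.0],  # three-way tie: highest entropy
    "s3": [2.0, 1.9],       # near two-way tie
}
print(select_batch(pool, 2))  # ['s2', 's3']
```

In a full loop, the selected sentences would be annotated with their correct parse, the parse selection model retrained, the remaining pool rescored, and the process repeated. The baseline it is compared against simply draws the next batch at random.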

Part of this work also involved developing new models of semantic underspecification. In collaboration with Ann Copestake and Dan Flickinger, I devised a constraint-based approach to constructing underspecified logical forms at the syntax/semantics interface that is more constrained than the lambda calculus. This is implemented in the grammar development environments and the parsing and generation platforms provided by the DELPH-IN framework.

More recently, in collaboration with Ann Copestake and Alexander Koller, I have helped to design the syntax, semantics and proof theory of a formal language that is maximally flexible in the kinds of semantic information that can be left underspecified. Besides the standard underspecified information about semantic scope and the antecedents of anaphora, it can also leave unspecified the arity of predicates, the arguments they take, and the argument position that a variable fills in a predicate. This makes the language ideal for building semantic components for shallow language processors (from POS taggers to intermediate statistical parsers), where information about syntactic dependencies and/or lexical subcategorisation may be missing. My research on gesture draws on this work on underspecified semantics.
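The following toy sketch illustrates underspecifying arity and argument positions. It assumes that each predication and each of its arguments is represented as a separate atom, and that an argument whose position is unknown matches any position. This is a hypothetical encoding for illustration only. It is not the syntax or proof theory of the language described above, and it ignores scope and anaphora. The names `Pred`, `Arg`, `subsumes` and `is_specialisation`, and the "ARG1"/"ARG2" labels, are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pred:
    anchor: str  # identifies one predication, e.g. "a1"
    name: str    # predicate symbol, e.g. "chase"


@dataclass(frozen=True)
class Arg:
    anchor: str              # which predication this argument belongs to
    position: Optional[str]  # "ARG1", "ARG2", ... or None if the position is unknown
    var: str                 # the variable filling the argument


def subsumes(general: Arg, specific: Arg) -> bool:
    """An argument atom with an unknown position matches any position."""
    return (general.anchor == specific.anchor
            and general.var == specific.var
            and general.position in (None, specific.position))


def is_specialisation(general: set, specific: set) -> bool:
    """True if every atom of `general` is matched by some atom of `specific`."""
    for atom in general:
        if isinstance(atom, Arg):
            if not any(isinstance(s, Arg) and subsumes(atom, s) for s in specific):
                return False
        elif atom not in specific:
            return False
    return True


tagger = {Pred("a1", "chase")}                  # predicate only: arity unknown
chunker = tagger | {Arg("a1", None, "x")}       # x is an argument, position unknown
parser = {Pred("a1", "chase"), Arg("a1", "ARG1", "x"), Arg("a1", "ARG2", "y")}
print(is_specialisation(tagger, chunker))   # True
print(is_specialisation(chunker, parser))   # True
print(is_specialisation(parser, chunker))   # False
```

Keeping arguments as separate atoms means that a shallow processor can emit only the atoms it is confident about, and a deeper analysis of the same text simply adds atoms or fills in unknown positions.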