Speaker: Laura Rimell
Date: Nov 18, 2011
Time: 11:00AM-12:30PM
Location: Chrystal Macmillan Building, Seminar Room One
Title: Multi-way Tensor Factorization for the Unsupervised Induction of Subcategorization Frames
Abstract

Subcategorization information can benefit any application that requires information about predicate-argument structure, including parsing, semantic role labeling, verb clustering, information extraction, and machine translation. Acquiring verb subcategorization frame (SCF) information automatically and with minimal supervision is key to building useful resources quickly, especially for new languages and domains.

This talk introduces a novel method for fully unsupervised verb SCF induction. Treating SCFs as a multi-way co-occurrence problem, we use multi-way tensor factorization to cluster frequent verbs from a large corpus according to their syntactic behaviour. The SCF lexicon that emerges from the clusters is shown to have an F-score of 72 when evaluated against a gold standard, not far below a method that relies on hand-crafted rules. Moreover, the tensor factorization method is shown to reveal latent syntactic and semantic structure in the data, opening the possibility of extracting more fine-grained SCFs that take semantics into account. It also has the advantage of being able to learn from grammatical relations not explicitly represented in the SCFs, such as modifiers and subtypes of clausal complements. We investigate a variety of features for the task. Joint work with Tim Van de Cruys, Anna Korhonen, and Thierry Poibeau.

Bio: Laura Rimell is a Research Associate at the Department for Theoretical Linguistics and the Computer Laboratory, University of Cambridge. She is currently working on PANACEA, an EU FP7 project which aims to automate the acquisition, production, updating and maintenance of language resources required by MT and other language technologies. Her work focuses on subcategorization frame acquisition. She has previously worked on CCG parsing, domain adaptation, parser evaluation, and verbal event and argument structure.
