|Date|Jan 21, 2011|
|Title|Bayesian non-parametric models of parsing, translation and part-of-speech induction|
Context-free grammars have long been popular for modelling natural language syntax and translation between human languages. However, their underlying independence assumptions are much too stringent. In this talk, I will present a means for learning a richer grammar directly from data without resorting to any linguistic knowledge. Our approach infers a tree-substitution grammar (TSG), which can use large tree fragments to better describe the data. Bayesian non-parametrics provide an elegant and theoretically principled way to model TSGs by incorporating a structured prior over the grammar and its productions, while integrating over uncertain events. I will present four different strands of work building upon this foundation: supervised tree-bank parsing, unsupervised dependency parsing, synchronous parsing for translation, and fully unsupervised part-of-speech induction. In all instances the approach uncovers interesting latent linguistic structures and outperforms competitive baselines.
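As a rough illustration of the kind of non-parametric prior the abstract refers to (not the talk's actual model), the following Python sketch implements a plain Chinese Restaurant Process, the "rich get richer" clustering dynamic that lets such priors prefer reusing frequent productions or tree fragments while leaving room for new ones. The function name, parameters, and the choice of a plain CRP (rather than, say, a Pitman-Yor process) are assumptions made for this sketch.

```python
import random
from collections import Counter

def crp_partition(n_items, alpha, rng=None):
    """Sample a random partition of n_items via the Chinese Restaurant Process.

    Each new item joins an existing cluster with probability proportional
    to that cluster's current size, or opens a new cluster with probability
    proportional to the concentration parameter alpha. In a TSG-style model,
    clusters would correspond to reused grammar fragments; here they are
    just anonymous integer labels.
    """
    rng = rng or random.Random(0)
    assignments = []
    counts = Counter()  # cluster label -> number of items seated there
    for _ in range(n_items):
        clusters = list(counts.keys())
        # Existing clusters weighted by size; one extra slot for a new cluster.
        weights = [counts[c] for c in clusters] + [alpha]
        choice = rng.choices(clusters + [len(clusters)], weights=weights)[0]
        assignments.append(choice)
        counts[choice] += 1
    return assignments
```

Small alpha concentrates the items into a few large clusters, mirroring how a low concentration parameter biases the grammar toward reusing a compact set of productions; larger alpha yields many small clusters.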
Trevor Cohn is a lecturer in the Department of Computer Science at the University of Sheffield, where he has been since 2009. Before that he was a postdoctoral Research Fellow in the School of Informatics at the University of Edinburgh, following his PhD in Computer Science at the University of Melbourne. He is an active researcher in the fields of machine learning (ML) and natural language processing (NLP), focusing on data-driven models of human language, with applications to syntax, semantics, information extraction, summarisation and machine translation.