|Date|Mar 16, 2012|
|Title|Inducing Shallow Semantic Representations with Little or No Supervision|
Inducing meaning representations from text is one of the key objectives of NLP. Most existing statistical techniques for tackling this problem rely on large human-annotated datasets, which are expensive to create and exist only for a very limited number of languages. Even then, the resulting models are not very robust, cover only a small proportion of semantic constructions appearing in the labeled data, and are domain-dependent. In this work, we investigate Bayesian models which do not use any labeled data but instead induce shallow semantic representations from unannotated texts. Unlike semantically annotated data, unannotated texts are plentiful and available for many languages and many domains, which makes our approach particularly promising. We evaluate our approach in two setups. First, we experiment with the PropBank corpus, where it achieves the best reported results among unsupervised approaches; then we evaluate it on a question-answering task in the biomedical domain, where it also shows competitive performance. We also look into several extensions of the model, and specifically consider multilingual induction of semantics, where we show that multilingual parallel data provides a valuable source of indirect supervision for the induction of shallow semantic representations.
Joint work with Alexandre Klementiev.
Ivan Titov joined Saarland University as a junior faculty member and head of a research group in November 2009, following a postdoc at the University of Illinois at Urbana-Champaign. He received his Ph.D. in Computer Science from the University of Geneva in 2008 and his master's degree in Applied Mathematics and Informatics from the St. Petersburg State Polytechnic University (Russia) in 2003.
His current research interests are in statistical natural language processing (models of syntax, semantics and sentiment) and machine learning (structured prediction methods, latent variable models, Bayesian methods).