|Date||Mar 01, 2013|
|Title||Relation Extraction with Matrix Factorization and Universal Schemas|
The ambiguity and variability of language make it difficult for computers to analyse, mine, and base decisions on. This has motivated machine reading: automatically converting text into semantic representations. At the heart of machine reading is relation extraction: predicting relations between entities, such as employeeOf(Person,Company). Machine learning approaches to this task require either manual annotation or, for distant supervision, existing databases of the same schema (i.e., the same set of relations). Yet for many interesting questions (who criticised whom?), pre-existing databases and schemas are insufficient. For example, there is no criticized(Person,Person) relation in Freebase. Moreover, the incomplete nature of any schema severely limits any global reasoning we could use to improve our extractions.
In this talk I will first present some earlier work we have done in distantly supervised extraction. Then I will show that the need for pre-existing datasets can be avoided by using what we call a "universal schema": the union of all involved schemas (surface form predicates such as "X-was-criticized-by-Y", and relations in the schemas of pre-existing databases). This extended schema allows us to answer new questions not yet supported by any structured schema, and to answer old questions more accurately. For example, if we learn to accurately predict the surface form relation "X-is-scientist-at-Y", this can help us to better predict the Freebase employee(X,Y) relation.
To populate a database with such a schema, we present a family of matrix factorization models that predict the affinity between database tuples and relations. We show that this achieves substantially higher accuracy than the traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs, we are able to reason about unstructured and structured data in mutually supporting ways. By doing so, our approach outperforms state-of-the-art distant supervision.
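As a rough illustration of the idea, the sketch below factorizes a tiny matrix whose rows are entity pairs and whose columns are universal-schema relations. It is not the model family presented in the talk: the logistic loss, the negative sampling scheme, the toy entity pairs, and all names (`train`, `observed`, `held_out`, `rank`, `lr`, `reg`) are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy data: rows are entity pairs, columns are relations of a
# universal schema (two surface patterns plus one structured relation).
pairs = [("alice", "acme"), ("bob", "globex"), ("carol", "initech"),
         ("dave", "erin"), ("frank", "grace")]
relations = ["X-is-scientist-at-Y", "X-criticized-Y", "employee(X,Y)"]

# Observed (row, column) cells, i.e. facts seen in text or in a database.
observed = [(0, 0), (0, 2), (1, 0), (1, 2), (2, 0), (3, 1), (4, 1)]
# Cell we want to predict; it is never used as a positive or a negative.
held_out = (2, 2)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train(observed, excluded, n_rows, n_cols, rank=2, epochs=500,
          lr=0.1, reg=0.01, seed=0):
    """Fit row and column embeddings with a logistic loss and sampled negatives."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((n_rows, rank))  # entity-pair embeddings
    V = 0.1 * rng.standard_normal((n_cols, rank))  # relation embeddings
    known = set(observed) | set(excluded)

    def step(t, r, label):
        # One stochastic gradient ascent step on the log-likelihood of a cell.
        a, v = A[t].copy(), V[r].copy()
        err = label - sigmoid(a @ v)
        A[t] += lr * (err * v - reg * a)
        V[r] += lr * (err * a - reg * v)

    for _ in range(epochs):
        for idx in rng.permutation(len(observed)):
            t, r = observed[idx]
            step(t, r, 1.0)
            # Sample an unobserved cell in the same column as a negative.
            candidates = [u for u in range(n_rows) if (u, r) not in known]
            if candidates:
                step(int(rng.choice(candidates)), r, 0.0)
    return A, V


A, V = train(observed, [held_out], len(pairs), len(relations))
scores = sigmoid(A @ V.T)  # affinity for every (entity pair, relation) cell
```

In this toy setup, `scores[2, 2]` is the predicted affinity between ("carol", "initech") and employee(X,Y). Because that pair shares the "X-is-scientist-at-Y" pattern with pairs known to be in the employee relation, one would expect it to score higher than an unrelated pair such as `scores[3, 2]`.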
Sebastian Riedel has been a lecturer at University College London since August 2012. He received his Dipl. Ing in Computer Science and Engineering from the Technical University Hamburg-Harburg (2003), an MSc in Informatics from the University of Edinburgh (2004), and a PhD in Informatics from the same university (2009). Before coming to UCL he worked as a postdoc and research scientist at the University of Tokyo, and together with Andrew McCallum at UMass Amherst. Sebastian's research interests are natural language processing and machine learning. He is particularly interested in automatic knowledge base construction systems that function by maximising a global mathematical objective, and the methodological questions such an approach raises.