Data Intensive Semantics and Pragmatics (DISP)

I was a PI on this ESRC-funded project from 1998 to 2001. Following on from work I did in the early 1990s with Ann Copestake on the symbolic representation of lexical information, I worked on acquiring lexical semantic information from large corpora using unsupervised machine learning. This work was done in collaboration with Claire Grover and Mirella Lapata.

We focused on interpretation tasks in which the semantic information is implicit. For example, we modelled the acquisition of semantic relations in compound nouns with deverbal heads, both in the medical domain and over the British National Corpus (BNC). This involves predicting that in "patient arrival" the patient is the subject of the arriving event, whereas in "hospital arrival" the hospital is the destination of the arriving event. We achieved this through unsupervised learning that exploits meaning paraphrases in the corpus together with surface syntactic cues (for example, we estimate that "patient" is more likely to be the subject of "arrive" on the basis of sentences in the corpus that feature the verb "arrive"). We also demonstrated that these techniques are useful for interpreting logical metonymies: e.g., estimating that "enjoy the book" means "enjoy reading the book" and that "good soup" is a soup that tastes good.
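The corpus-based estimate described above can be illustrated with a small sketch. This is not the project's software. It assumes that (verb, relation, noun) tuples have already been extracted from parsed corpus sentences, and it uses invented toy counts and simple add-one smoothing. The names TOY_TUPLES, RELATIONS, relation_probabilities and interpret_compound are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical (verb, relation, noun) tuples, standing in for what a parser
# might extract from corpus sentences featuring the verb "arrive".
# The counts are invented for illustration only.
TOY_TUPLES = [
    ("arrive", "subject", "patient"),
    ("arrive", "subject", "patient"),
    ("arrive", "subject", "patient"),
    ("arrive", "destination", "hospital"),
    ("arrive", "destination", "hospital"),
    ("arrive", "subject", "doctor"),
]

# The candidate relations between the modifier noun and the head's verb.
RELATIONS = ("subject", "destination")


def relation_probabilities(tuples, verb, noun, alpha=1.0):
    """Estimate P(relation | verb, noun) from counts with add-alpha smoothing."""
    counts = Counter(rel for v, rel, n in tuples if v == verb and n == noun)
    total = sum(counts[rel] for rel in RELATIONS) + alpha * len(RELATIONS)
    return {rel: (counts[rel] + alpha) / total for rel in RELATIONS}


def interpret_compound(tuples, modifier, head_verb):
    """Pick the most probable relation for a compound such as 'patient arrival'."""
    probs = relation_probabilities(tuples, head_verb, modifier)
    return max(probs, key=probs.get)


print(interpret_compound(TOY_TUPLES, "patient", "arrive"))   # subject
print(interpret_compound(TOY_TUPLES, "hospital", "arrive"))  # destination
```

With these toy counts, "patient" is seen three times as the subject of "arrive" and never as its destination, so the smoothed estimate favours the subject reading for "patient arrival". A real system would need much larger counts and a way to back off when a noun and verb never co-occur.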

More information and access to the software tools that we used are available here.