University of Edinburgh
School of Informatics
10 Crichton Street
Edinburgh, EH8 9AB, UK
Center for Intelligent Information Retrieval
Computer Science Department
University of Massachusetts Amherst
140 Governors Drive
Amherst, MA 01003-9264
phone: +44-131-650-4418 (office)
Semantically informed information retrieval
I am working with Bruce Croft, Victor Lavrenko and Jon Oberlander on applying syntactic and semantic language processing to improve open domain, ad hoc information retrieval performance for natural language queries. I have demonstrated significantly improved performance over state-of-the-art search techniques using graph-based methods to rank and select syntactic/semantic word dependencies. Selected terms are applied in a linear feature model using a combined language modelling and inference network system (Indri). Retrieval effectiveness is as good as, or better than, the best published results that use complex optimisation methods for descriptive queries, yet the approach uses only one well-chosen phrase in addition to a unigram query representation. The 'killer phrases' are identified using semantic relations from a dependency parse, supplemented with distributional constraints from a local affinity graph. The improvements in retrieval performance appear to be linked to shallow capture of natural inference about long-distance semantic word relations. My PhD is scheduled for completion in 2013.
Since May 2011, I have worked out of the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts, Amherst, with Bruce Croft. I was also a visiting researcher at the CIIR in summer 2010, and researched linguistic event extraction at the Nara Institute of Science and Technology (NAIST) in summer 2009. My early PhD work was in legal retrieval, with Burkhard Schafer in the Edinburgh School of Law.
Notice on NTCIR-6 patent retrieval corpora: My research on legal retrieval using the NTCIR patent dataset revealed that, of the ~33,000 patents identified as relevant to 1,000 sample queries, 1,396 are missing from the source collection. This error has been confirmed by the NTCIR organisers, and the list of missing documents can be found here.
Language engineering and classification
My dissertation, supervised by Jon Oberlander, explores clustering song genres using lyrics. Text processing and sentiment analysis techniques were used to extract 140 language features for analysis using Kohonen self-organising maps (SOMs). These maps were evaluated against the clustering of eight hand-selected song pairs. Other projects included predicting web queries using query log language models and building a named entity recognition system using a maximum entropy classifier.