Mark McConville's homepage

31 August 2009 : Lifelong learning course

Last week Elaine Farrow and I taught a two-day course, "Robots, Brains and Artificial Intelligence", for the University of Edinburgh's Office for Lifelong Learning. It is intended to be an introduction to the science of Informatics. The course will be repeated in April 2010.

My presentation slides are here. Elaine's are here. There is also a course blog which I hope to update every couple of weeks or so.

02 April 2009 : INDIGO project

For the next four months I will be working with Colin Matheson and Amy Isard on the EU FP6 INDIGO project ("Interaction with Personality and Dialogue Enabled Robots"). We will be building a morphological component for the Greek grammar module and an improved user interface for the OpenCCG grammar development environment.

30 September 2008 : HealthAgents project

I am now working on the HealthAgents project with Mike Matthews and Bonnie Webber. We will be looking at using deep syntactic parsing to extract population information from abstracts of medical trials.

Update 7 January 2009: This semester I am helping Henry Thompson teach Foundations of Natural Language Processing, a third-year undergraduate course.

09 June 2008 : COLING workshop paper accepted

A paper I wrote with Myroslava Dzikovska has been accepted for the Cross-Framework and Cross-Domain Parser Evaluation workshop at the 22nd International Conference on Computational Linguistics (COLING'08) in Manchester in August. The title is "'Deep' Grammatical Relations for Semantic Interpretation", and here is the abstract:
We evaluate five distinct systems of labelled grammatical dependencies against the kind of input we require for semantic interpretation, in particular for the deep semantic interpreter underlying a tutorial dialogue system. We conclude that no one system provides all the features that we require, although each such feature is contained within at least one of the competing systems.
Update 30 September 2008: the paper has been archived here.

23 March 2008 : LAW paper accepted

Another paper that Myrosia and I wrote about our adventures harvesting a wide-coverage verb lexicon from the FrameNet corpus, "Using Inheritance and Coreness Sets to Improve a Wide-Coverage Verb Lexicon Harvested from FrameNet", has been accepted for the 2nd Linguistic Annotation Workshop (LAW'08) in Marrakech in May. Here is the abstract:
We investigate two aspects of the annotation scheme underlying the FrameNet semantically annotated corpus - the inheritance relation on semantic types with its corresponding links between semantic roles of increasing granularity, and the specification of coreness sets of related semantic roles - against the background of our ongoing effort to harvest a lexicon of verb entries for deep parsing. We conclude that these aspects of the FrameNet annotation scheme do prove useful for reducing the complexity and ambiguity of verb entries, but need to be applied more systematically to make the lexicon usable in a practical parsing system.
Update 3 June 2008: the paper is available here.

12 February 2008 : LREC paper accepted

A paper that I wrote with Myrosia Dzikovska, "Evaluating Complement-Modifier Distinctions in a Semantically Annotated Corpus", has been accepted for the 6th Language Resources and Evaluation Conference (LREC'08) in Marrakech at the end of May. Here is the abstract:
We evaluate the extent to which the distinction between semantically core and non-core dependents as used in the FrameNet corpus corresponds to the traditional distinction between syntactic complements and modifiers of a verb, for the purposes of harvesting a wide-coverage verb lexicon from FrameNet for use in deep linguistic processing applications. We use the VerbNet verb database as our gold standard for making judgements about complement-hood, in conjunction with our own intuitions in cases where VerbNet is incomplete. We conclude that there is enough agreement between the two notions (0.85) to make practical the simple expedient of equating core PP dependents in FrameNet with PP complements in our lexicon. Doing so means that we lose around 13% of PP complements, whilst around 9% of the PP dependents left in the lexicon are not complements.
Update 17 July 2008: the paper and slides can be found here.

23 September 2007 : TFLEX repository

From 1 October, Myroslava Dzikovska, Carolyn Rosé and I will be starting work on the second phase of the TFLEX project. I have created a repository for the bits of software we wrote during the first phase of the project, which you can find here. Specifically, there is a tool for converting the FrameNet semantically annotated corpus into a verb lexicon for deep linguistic processing applications, and various tools for viewing FrameNet and VerbNet source files.

08 May 2007 : DLP paper accepted

A paper I co-wrote with Myroslava Dzikovska, "Extracting a Verb Lexicon For Deep Parsing from FrameNet", has been accepted for the ACL Workshop on Deep Linguistic Processing in Prague at the end of June. Here is the abstract:
We examine the feasibility of harvesting a wide-coverage lexicon of English verbs from the FrameNet semantically annotated corpus, intended for use in a practical natural language understanding system. We identify a range of constructions for which current annotation practice leads to problems in deriving appropriate lexical entries, for example interrogatives, passives and control, and discuss potential solutions.

20 out of 46 submissions were accepted, giving an acceptance rate of 43%. Incidentally, the paper was also accepted for the ACL Linguistic Annotation Workshop, whose acceptance rate was 28/52 = 54%.

Update 1 July 2007: the full paper can be accessed here, and there is a poster too.

02 March 2007 : TFSG paper

The organisers of the 1st International Workshop on Typed Feature Structure Grammars (TFSG'06), held on 20 June 2006 in Aalborg, Denmark, invited me to contribute a paper on Inheritance-driven CCG to the published version of the proceedings, based on my notes for the ESSLLI course I co-taught in 2005.

The proceedings have now gone online. My paper is basically an expanded version of my EACL paper, and can also be found here. The hardcopy of the proceedings will be published by Museum Tusculanum later this year.

Here is the abstract:

Inheritance-driven CCG encapsulates a uniform approach to the elimination of redundancy in CCG lexicons, where grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based category description language. The resulting formalism is partially ‘model-theoretic’, in that the category notation is interpreted against an underlying set of tree-like typed feature structures. This extension of CCG subsumes a number of other proposed category notations devised to allow for the construction of more efficient lexicons.

Update 6 August 2007: The book is now going to be published in the autumn by Peter Lang as "Typed Feature Structure Grammars" in the series European University Studies 21: Linguistics.

Update 1 June 2009: The book has finally been published.

01 August 2006 : TFLEX project

Until August 2007 I'll be working on the TFLEX project. This is funded by the Office of Naval Research, and the Edinburgh team is managed by Myroslava Dzikovska and Johanna Moore. We are exploring ways to harvest wide-coverage verb lexicons from the FrameNet semantically annotated corpus, for use in practical dialogue systems like TRIPS.

Update 1 December 2007: funding for TFLEX has been extended until October 2008.

03 July 2006 : Viva

My viva date has been set for Thursday 7 September. The external examiner will be Dan Flickinger, and the internal examiner will be Henry Thompson.

Update 7 September 2006: I successfully defended my thesis this morning. My examiners recommended "minor editorial corrections". The slides from my pre-viva talk are here.

Update 23 January 2007: the final, approved version of my dissertation is now available here. Here is the abstract:

This thesis proposes an extended version of the Combinatory Categorial Grammar (CCG) formalism, with the following features: (a) grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based constraint language; (b) CCG lexicons are, or at least can be, functions from forms to these lexical types. This formalism, which I refer to as 'inheritance-driven' CCG (I-CCG), is conceptualised as a partially model-theoretic system, involving a distinction between category descriptions and their underlying category models, with these two notions being related by logical satisfaction. I argue that the I-CCG formalism retains all the advantages of both the core CCG framework and proposed generalisations involving such things as multiset categories, unary modalities or typed feature structures. In addition, I-CCG: (a) provides non-redundant lexicons for human languages; (b) captures a range of well-known implicational word order universals in terms of an acquisition-based preference for shorter grammars.

12 January 2006 : EACL paper accepted

My paper on "Inheritance and the CCG Lexicon" has been accepted for this year's EACL, to be held in Trento at the start of April. The abstract is as follows:
I propose a uniform approach to the elimination of redundancy in CCG lexicons, where grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based category description language. The resulting formalism is partially 'constraint-based', in that the category notation is interpreted against an underlying set of tree-like feature structures. I argue that this version of CCG subsumes a number of other proposed category notations devised to allow for the construction of more efficient lexicons. The formalism retains desirable properties such as tractability and strong competence, and provides a way of approaching the problem of how to generalise CCG lexicons which have been automatically induced from treebanks.

52 papers out of 264 were accepted, giving an acceptance rate of 20%.

Update 5 April 2006: You can find the full paper here, and there are slides too.

08 August 2005 : ESSLLI course

Cem Bozsahin and I are co-teaching a course this week at the European Summer School for Logic, Language and Information (ESSLLI'05) in Edinburgh. The course is titled "Combinatory Categorial Grammar and Linguistic Diversity". The course homepage is here.

01 April 2005 : AMI project

For the foreseeable future, I'll be working on the Augmented Multiparty Interaction (AMI) project, where we'll be collecting, transcribing and annotating a multimodal corpus of business meetings.

© 2007 Mark McConville