The UCCA Resource Webpage

Universal Conceptual Cognitive Annotation (UCCA) is a novel semantic approach to grammatical representation. It was developed in the Computational Linguistics Lab of the Computer Science Department of the Hebrew University by Omri Abend and Ari Rappoport.

The central idea of the project is to analyze and annotate natural languages using purely semantic categories and structure (a graph). Syntactic categories and structure are automatically deduced from the semantic ones using learning algorithms. The basic set of semantic categories (the foundational layer) is inspired by work in linguistic typology, cognitive grammar, and neuroscience.
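
To make the idea of a semantic graph concrete, here is a minimal Python sketch of an annotation represented as nodes joined by category-labeled edges. It is an illustration only and not the API of the UCCA source code bundle listed below: the `Unit` class, its methods, and the example sentence are hypothetical, and the labels "Participant" and "Process" are used as illustrative category names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Unit:
    """A node in a toy annotation graph (hypothetical, for illustration).

    Leaf units carry a span of text; internal units carry only outgoing
    edges, each labeled with a semantic category.
    """
    text_span: Optional[str] = None
    edges: List[Tuple[str, "Unit"]] = field(default_factory=list)

    def add(self, category: str, child: "Unit") -> "Unit":
        """Attach `child` under this unit with the given category label."""
        self.edges.append((category, child))
        return child

    def text(self) -> str:
        """Recover the covered text by reading the leaves left to right."""
        if self.text_span is not None:
            return self.text_span
        return " ".join(child.text() for _, child in self.edges)

    def categories(self) -> List[str]:
        """List the category labels on this unit's outgoing edges."""
        return [category for category, _ in self.edges]


# A single scene whose edges carry illustrative category labels.
scene = Unit()
scene.add("Participant", Unit("John"))
scene.add("Process", Unit("kicked"))
scene.add("Participant", Unit("the ball"))

print(scene.categories())
print(scene.text())
```

The sketch stores categories on edges rather than on nodes, so the same unit could in principle play different roles under different parents. This is a design assumption of the example.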

We have annotated 160K tokens from English Wikipedia with the UCCA scheme. The annotation has so far focused on argument-structure and linkage phenomena. Due to the complexity of the linguistic system, there are often many applicable annotations for a given text (cf. "A Dynamic Usage-based Model" by R.W. Langacker (2000)). For practical reasons, we select a small set of highly useful distinctions and apply them to provide one plausible annotation.

This page contains links to all of UCCA's downloadable resources. If you use these resources in your research, please cite this paper:

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013
[Paper: pdf]


Web Application

The graphical web application for annotating and viewing texts with UCCA annotation can be found here (to avoid registration, log in as "guest" with password "tseug"). Tomer Eshet partnered in the development of the web application.


Corpora

The following are the files constituting release 1.0 of the UCCA corpus. The distribution contains about 160K tokens annotated with UCCA's foundational layer. The corpus is taken from the English Wikipedia and is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license.

This is an English-French parallel corpus of about 25K tokens based on the first five chapters of "Twenty Thousand Leagues Under the Sea" by Jules Verne. It is also released under the Creative Commons Attribution-ShareAlike 3.0 Unported license.


Source Code

The following is Python source code for reading and manipulating the UCCA structures. The code was written by Amit Beka and is released under the GNU General Public License version 3.0 or later (license included in the bundle).

Publications

Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study
Elior Sulem, Omri Abend and Ari Rappoport,
ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT).
[Paper: pdf]

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport, ACL 2013 (full paper)
[Paper: pdf]

UCCA: A Semantics-based Grammatical Annotation Scheme
Omri Abend and Ari Rappoport, IWCS 2013 (full paper)
[Paper: pdf]


Theses

Measuring Semantic Preservation in Machine Translation with HCOMET: Human Cognitive Metric for Evaluating Translation
Pedro Marinotti, MSc Thesis,
The University of Edinburgh, 2014
[Paper: pdf]

Integration of a cognitive annotation into machine translation: Theoretical foundations and bilingual corpus analysis
Elior Sulem, MSc Thesis,
The Hebrew University of Jerusalem, 2014
[Paper: pdf]

Semi-supervised identification of scene-evoking nouns in UCCA
Amit Beka, MSc Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Grammatical Annotation Founded on Semantics: A Cognitive Linguistics Approach to Grammatical Corpus Annotation
Omri Abend, PhD Thesis,
The Hebrew University of Jerusalem, 2013


Contact

For any questions or feedback, please email Omri Abend at oabend@inf.ed.ac.uk.