I am a PhD student at the Institute for Language, Cognition and Computation (ILCC) under the supervision of Prof. Bonnie Webber and Dr. Kees van Deemter. I am conducting studies in the field of Attribution, funded by the Scottish Informatics and Computer Science Alliance (SICSA).
This resource comprises around 10,000 attribution relations from the Penn Discourse TreeBank (PDTB). Direct, indirect and mixed attributions of assertions, beliefs, facts and eventualities are annotated at both the inter- and intra-sentential levels. The annotation marks the source, cue, content and supplement elements of each attribution, along with some features. To the best of our knowledge, this corpus is the largest resource annotated for attribution relations available to date.
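As a rough illustration of the four annotated elements, an attribution relation can be pictured as a small record whose elements are lists of token spans, so that a discontinuous content is a single element. This is only a sketch under stated assumptions: the class name `AttributionRelation`, the `text_of` helper, the end-exclusive token offsets and the `"type"` feature key are hypothetical and do not reflect the actual file format of the resource.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed convention: a span is (start, end) in token offsets, end exclusive.
Span = Tuple[int, int]


@dataclass
class AttributionRelation:
    """Hypothetical in-memory view of one attribution relation."""
    source: List[Span]                                     # who the content is attributed to
    cue: List[Span]                                        # lexical anchor, e.g. a reporting verb
    content: List[Span]                                    # what is attributed (may be discontinuous)
    supplement: List[Span] = field(default_factory=list)   # optional additional material
    features: Dict[str, str] = field(default_factory=dict)  # illustrative feature map

    def text_of(self, element: str, tokens: List[str]) -> str:
        """Return the text of one element, joining discontinuous spans with ' ... '."""
        spans = sorted(getattr(self, element))
        return " ... ".join(" ".join(tokens[start:end]) for start, end in spans)


# Invented example sentence, not taken from the corpus.
tokens = "The minister , analysts said , will resign".split()
relation = AttributionRelation(
    source=[(3, 4)],            # "analysts"
    cue=[(4, 5)],               # "said"
    content=[(0, 2), (6, 8)],   # "The minister" + "will resign"
    features={"type": "assertion"},  # illustrative value only
)

print(relation.text_of("content", tokens))
```

Storing each element as a list of spans, rather than a single start and end, is what lets one element cover text that is interrupted by the source and cue.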
The project is ongoing. At present:
This is a project of the ILC Institute (Pisa, Italy) in collaboration with the School of Informatics of the University of Edinburgh and the Department of Computational Linguistics of Pavia University.
For a more detailed description of this corpus, please refer to:
The pilot corpus annotated for attribution relations comprises 50 articles drawn from Italian newspaper corpora (e.g. La Repubblica), selected to obtain a balanced subcorpus. It contains 37,000 tokens and 461 annotated attribution relations.
These features are a modification of the features included in the annotation of attribution in the PDTB.
The tool adopted and tailored for the annotation was MMAX2. It was chosen after an in-depth comparison of several available tools because it best supports the specific annotation requirements of attribution relations (e.g. the annotation of discontinuous and multiple text spans as a single markable; the possibility of establishing relations among two or more markables; the annotation of overlapping markables). The tool is available as open source at http://mmax2.net (Christoph Mueller, Michael Strube (2006): Multi-Level Annotation of Linguistic Data with MMAX2. In: Sabine Braun, Kurt Kohn, Joybrato Mukherjee (Eds.): Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods. Frankfurt: Peter Lang, pp. 197-214. (English Corpus Linguistics, Vol. 3)).
Pilot Corpus (.zip, 7.6 MB)
The download includes:
I am not aware of any copyright restrictions applying to the material. If you use this data in your research, please contact S.ParetiATsms.ed.ac.uk and cite:
Pareti, Silvia, and Prodanof, Irina: Annotating Attribution Relations: Towards an Italian Discourse Treebank, Proceedings of the Seventh conference on International Language Resources and Evaluation LREC10, European Language Resources Association (ELRA), Eds: Calzolari, Nicoletta, Choukri, Khalid, Maegaard, Bente, Mariani, Joseph, Odijk, Jan, Piperidis, Stelios, Rosner, Mike, and Tapias, Daniel, 2010.
Please let me know if you find problems with the data or with accessing the corpus.