Silvia Pareti

I am a PhD student at the Institute for Language, Cognition and Computation (ILCC) of the School of Informatics, University of Edinburgh, under the supervision of Prof. Bonnie Webber and Dr. Kees van Deemter. I am conducting studies in the field of Attribution, funded by the Scottish Informatics and Computer Science Alliance (SICSA).

Contacts

  • E-mail: S dot SURNAME @sms.ed.ac.uk
  • Address:
    3.38 Informatics Forum
    10, Crichton Street
    EH8 9AB Edinburgh, UK



Resources Developed


Penn Attribution Relations Corpus 2.0

This resource comprises around 10,000 attribution relations from the Penn Discourse Treebank (PDTB). Direct, indirect and mixed attributions of assertions, beliefs, facts and eventualities are annotated at both the inter- and intra-sentential levels. The annotation marks the source, cue, content and supplement elements of each attribution, together with a set of features. To the best of our knowledge, this corpus is the largest resource annotated for attribution relations available to date.

The project is ongoing. For this version:

  • an independent layer of annotation for attribution relations has been derived from the PDTB
  • the annotation has been further extended
  • an inter-annotator agreement study was conducted to test the annotation schema
The full description of the corpus, the annotation schema and the inter-annotator agreement study is given in (Pareti, 2012).

For information and requests, please contact me.

Italian Attribution Corpus (ItAC)

This is a project of the ILC Institute (Pisa, Italy) in collaboration with the School of Informatics of the University of Edinburgh and the Department of Computational Linguistics of Pavia University.

Contents

Description
Annotation Schema
Tool
Download
Terms of Use

Description

For a more detailed description of this corpus, please refer to:

  • (Pareti and Prodanof, 2010): overview
  • (Pareti, 2009), thesis: annotation schema and guidelines (chapter 6)
Both are listed under Publications.

The pilot corpus annotated for attribution relations comprises 50 articles drawn from Italian newspaper corpora (e.g. La Repubblica), selected to obtain a balanced subcorpus. It contains 37,000 tokens in total, in which 461 attribution relations are annotated.

Annotation Schema

The annotation features below are a modified version of those used to annotate attribution in the PDTB.

Feature            Values
attribution_role   content, cue, source, supplement
type               assertion, belief, fact, eventuality
source             writer, other, arbitrary, mixed
factuality         factual, non-factual
scopal_change      none, scopal-change
relation           set_n
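As a rough illustration of how this schema constrains the annotation, the sketch below encodes the feature table as Python sets and checks a single markable against it. It is a minimal sketch under stated assumptions: the names `SCHEMA`, `Markable` and `validate` are hypothetical and not part of the ItAC release or of MMAX2; because the table does not say which markable carries which feature, any combination of features is accepted; and relation ids are assumed to follow the `set_n` pattern shown in the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Allowed values per feature, copied from the schema table above.
# (Hypothetical encoding: the real corpus stores these as MMAX2 attributes.)
SCHEMA = {
    "attribution_role": {"content", "cue", "source", "supplement"},
    "type": {"assertion", "belief", "fact", "eventuality"},
    "source": {"writer", "other", "arbitrary", "mixed"},
    "factuality": {"factual", "non-factual"},
    "scopal_change": {"none", "scopal-change"},
}


@dataclass
class Markable:
    """One annotated (possibly discontinuous) text span and its features."""

    relation: str                       # assumed set id such as "set_3", linking the parts of one relation
    features: dict[str, str]            # feature name -> value
    tokens: list[int] = field(default_factory=list)  # word indices covered by the span

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the markable fits the schema."""
        problems = []
        for name, value in self.features.items():
            allowed = SCHEMA.get(name)
            if allowed is None:
                problems.append(f"unknown feature: {name}")
            elif value not in allowed:
                problems.append(f"{name}={value!r} not in {sorted(allowed)}")
        if not self.relation.startswith("set_"):
            problems.append(f"relation id {self.relation!r} does not look like set_n")
        return problems


if __name__ == "__main__":
    cue = Markable("set_1", {"attribution_role": "cue", "type": "assertion"}, [4])
    print(cue.validate())  # []
```

Keeping the allowed values in one table-like dictionary mirrors the layout of the schema above, so a change to the schema only has to be made in one place.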

Tool

The tool adopted and tailored for the annotation was MMAX2. It was chosen after an in-depth comparison of several available tools, as it best supports the specific annotation requirements of attribution relations (e.g. the annotation of discontinuous and multiple text spans as a single markable; the possibility of establishing relations among two or more markables; the annotation of overlapping markables). The tool is available open-source at http://mmax2.net (Christoph Mueller and Michael Strube (2006): Multi-Level Annotation of Linguistic Data with MMAX2. In: Sabine Braun, Kurt Kohn, Joybrato Mukherjee (Eds.): Corpus Technology and Language Pedagogy. New Resources, New Tools, New Methods. Frankfurt: Peter Lang, pp. 197-214 (English Corpus Linguistics, Vol. 3)).
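Discontinuous and overlapping markables are the requirements that made MMAX2 preferable here. The sketch below shows one way such spans can be handled in code, assuming MMAX2's usual span notation over word ids (`word_3..word_5` for a contiguous run, with commas separating the fragments of a discontinuous markable). `parse_span` and `overlaps` are hypothetical helpers written for illustration; they are not functions shipped with MMAX2 or with the corpus.

```python
import re

# Assumed MMAX2-style word id, e.g. "word_12".
_WORD_ID = re.compile(r"word_(\d+)$")


def _index(word_id: str) -> int:
    """Turn a word id such as 'word_12' into the integer 12."""
    m = _WORD_ID.match(word_id.strip())
    if m is None:
        raise ValueError(f"unexpected word id: {word_id!r}")
    return int(m.group(1))


def parse_span(span: str) -> list[list[int]]:
    """Split a span string into fragments, each a list of word indices.

    A discontinuous markable yields several fragments.
    """
    fragments = []
    for part in span.split(","):
        start, sep, end = part.strip().partition("..")
        first = _index(start)
        last = _index(end) if sep else first  # a single word has no ".." range
        if last < first:
            raise ValueError(f"reversed range: {part!r}")
        fragments.append(list(range(first, last + 1)))
    return fragments


def overlaps(span_a: str, span_b: str) -> bool:
    """True if two markables share at least one word."""
    words_a = {i for frag in parse_span(span_a) for i in frag}
    words_b = {i for frag in parse_span(span_b) for i in frag}
    return bool(words_a & words_b)
```

For instance, `parse_span("word_3..word_5,word_9")` gives `[[3, 4, 5], [9]]`: a single markable made of two fragments, as when a cue interrupts the content of a quotation. Comparing word sets rather than start and end offsets is what lets overlapping markables be detected even when one of them is discontinuous.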

Download

Pilot Corpus (.zip, 7.6 MB)
The download includes:
  • customized MMAX2 (2_1.1)
  • Pilot Corpus (.txt and .xml)
  • Documentation
I recommend using the version of MMAX2 provided with the Download.

Terms of Use

I am not aware of any copyright restrictions applying to the material. If you use this data in your research, please contact S.ParetiATsms.ed.ac.uk and cite:

Pareti, Silvia, and Irina Prodanof. 2010. Annotating Attribution Relations: Towards an Italian Discourse Treebank. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (Eds.), Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA).

Please let me know if you find problems with the data or with accessing the corpus.