Silvia Pareti

I am a PhD student at the Institute for Language, Cognition and Computation (ILCC) of the School of Informatics, University of Edinburgh, under the supervision of Prof. Bonnie Webber and Dr. Kees van Deemter. I am conducting studies in the field of Attribution, funded by the Scottish Informatics and Computer Science Alliance (SICSA).

Publications and Activities

Resources Developed

PARC project

Attribution Bibliography

Contacts

  • E-mail: S dot SURNAME @sms.ed.ac.uk
  • Address:
    3.38 Informatics Forum
    10, Crichton Street
    EH8 9AB Edinburgh, UK



Resources for Attribution


Corpora

This section contains a list of existing resources annotating attribution. Attribution is generally not annotated independently but as part of another task (e.g. annotating discourse or opinions). Other efforts annotate only a pre-defined range of attributions (e.g. the attribution of direct reported speech). With your help, this list can be kept updated and complete. If you know of an existing resource that is not on the list, or you want to see a resource you developed listed, please contact me.

News domain

  • PDTB Attribution Corpus (Pareti, 2012).

    This resource comprises around 10,000 attribution relations from the PDTB. Direct, indirect and mixed attributions of assertions, beliefs, facts and eventualities are annotated at both the inter- and intra-sentential levels. The annotation marks the source, cue, content and supplement elements of the attribution, as well as some features. To the best of our knowledge, this corpus represents the largest resource annotated for attribution relations available to date.

  • RST Discourse Treebank corpus (Carlson and Marcu, 2001).

    The RST Discourse Treebank consists of 385 documents drawn from the Penn Treebank, i.e. news articles from the Wall Street Journal (WSJ). A relation of attribution is established between a nucleus, i.e. the content, and its satellite, i.e. the source.

  • GraphBank (Wolf and Gibson, 2005).

    This resource adds attribution relations to its discourse annotation, since it deals specifically with news language: 135 texts from the WSJ and Associated Press Newswire.

  • Penn Discourse TreeBank (PDTB) (Prasad et al., 2008).

    The collection of over 2,000 news articles from the WSJ contains annotation of attribution for each discourse connective and its arguments.

  • MPQA Opinion Corpus (Wiebe, 2002).

    The corpus consists of 692 documents from different U.S. and non-U.S. news sources (Wall Street Journal, American National Corpus, ...). The annotation is limited to the intra-sentential level and distinguishes three elements: the text anchor, the source and the target. It also marks some properties of the private state, e.g. intensity and polarity, as well as the general attitude and its target.

  • NTCIR-7 corpus (Evans et al., 2007; Seki et al., 2008, 2010).

    A corpus including the annotation of opinion holders was developed for the NTCIR-6, 7 and 8 Multilingual Opinion Analysis Task (MOAT). It comprises news documents in English, Japanese and Simplified Chinese.

  • Sydney Morning Herald Corpus (O'Keefe et al., 2012).

    A corpus of 965 online news documents annotated with over 3,500 direct quotations and their speakers. (A second version of the corpus also includes indirect and mixed quotations.)

  • TimeBank (Pustejovsky et al., 2003).

    183 articles from the WSJ (132), NYT, AP and transcribed news reports annotated for events. Attribution overlaps with the events labeled as REPORTING, PERCEPTION, I_STATE and I_ACTION. Subordinating links (SLINK) provide the connection between the attribution-bearing event (e.g. said, reports) and the event(s) in its complement span.

Narrative domain

  • Columbia Quoted Speech Attribution Corpus (Elson and McKeown, 2010).

    The corpus comprises excerpts from 6 narrative works from the 19th and 20th centuries. Over 3,000 direct quotes and their speakers are annotated.

Languages other than English

  • Italian Attribution Corpus (ItAC), Italian (Pareti and Prodanof, 2010).

    See Sec. Resources for a detailed description.

  • German Political News Opinion Corpus, German (Li et al., 2012).

    Ongoing project, currently consisting of 108 documents annotated with the Source, Target, Text anchor and Auxiliary of an opinion, considered both in context-dependent and context-independent frames.

  • CorpusTCC and RHETALHO, Brazilian Portuguese (Pardo et al., 2004).

    Part of a project for building a discourse parser. The former consists of 100 scientific texts (about 53,000 words) from the computer science domain, while the latter comprises 20 scientific texts and 20 texts from online newspapers.

  • ANNODIS, French (Afantenos et al., 2012).

    A resource consisting of 156 texts (news, Wikipedia, research and reports) and 687,000 tokens. The resource annotates discourse structures (rhetorical relations and multi-level structures). Attribution is among the rhetorical relations annotated; however, figures for the expert annotation of the texts suggest there are only 75 instances of attribution in the corpus.

  • GloboQuotes, Portuguese (Fernandes et al., 2011).

    The corpus is a collection of 685 news items from 10 news genres (published in 2007-2008) from the globo.com portal.


Attribution bibliography

This section is maintained in an ongoing effort to collect all the relevant literature concerning attribution annotation and extraction. Attribution is included in projects from different areas of study, in particular Opinion analysis, Discourse and Reported speech, often with little or no contact between communities. Please feel free to contribute. Suggestions are very welcome!

Corpora Collection and Annotation

  • Afantenos S. D., Asher N., Benamara F., Bras M., Fabre C., Ho-Dac L.-M., Le Draoulec A., Muller P., Pery-Woodley M.-P., Prevot L., Rebeyrolle J., Tanguy L., Vergez-Couret M., Vieu L. (2012). An empirical resource for discovering cognitive principles of discourse organization: the ANNODIS corpus. In Proceedings of LREC 2012, Istanbul, Turkey, July 2012.
  • Bergler, S., Doandes, M., Gerard, C., and Witte, R. (2004). Attributions. In Qu, Y., Shanahan, J., and Wiebe, J., editors, Exploring Attitude and Affect in Text: Theories and Applications, Technical Report SS-04-07, pages 16-19, Stanford, California, USA. AAAI Press. Papers from the 2004 AAAI Spring Symposium.
  • Carlson, L. and Marcu, D. (2001). Discourse tagging reference manual. Technical report ISI-TR-545, ISI, University of Southern California.
  • Elson, D. K. and McKeown, K. R. (2010). Automatic attribution of quoted speech in literary narrative. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10).
  • Evans, D. K., Ku, L.-W., Seki, Y., Chen, H.-H., and Kando, N. (2007). Opinion analysis across languages: An overview of and observations from the NTCIR6 opinion analysis pilot task. In Masulli, F., Mitra, S., and Pasi, G., editors, Applications of Fuzzy Sets Theory, volume 4578 of Lecture Notes in Computer Science, pages 456-463. Springer Berlin / Heidelberg.
  • Li, H., Cheng, X., Adson, K., Kirshboim, T., Xu, F. (2012). Annotating opinions in German political news. In Proceedings of the Eighth conference on International Language Resources and Evaluation LREC12, Istanbul, 2012.
  • O'Keefe, T., Pareti, S., Curran, J., Koprinska, I. and Honnibal, M. (2012). A sequence labelling approach to quote attribution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Jeju, Korea, 2012.
  • Pardo, T., das Gracas Volpe Nunes, M., and Rino, L. (2004). Dizer: An automatic discourse analyzer for Brazilian Portuguese. In Bazzan, A. and Labidi, S., editors, Advances in Artificial Intelligence - SBIA 2004, volume 3171 of Lecture Notes in Computer Science, pages 224-234. Springer Berlin / Heidelberg.
  • Pareti, S. (2012). A database of attribution relations. In Proceedings of the Eighth conference on International Language Resources and Evaluation LREC12, Istanbul, 2012.
  • Pareti, S. (2012). The independent encoding of Attribution Relations. In Proceedings of the Eighth Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-8), Pisa, October 3-5.
  • Pareti, S. (2009). Towards a discourse resource for Italian: Developing an annotation schema for attribution. Master's thesis, Università degli Studi di Pavia.
  • Pareti, S. and Prodanof, I. (2010). Annotating attribution relations: Towards an Italian discourse treebank. In Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., and Tapias, D., editors, Proceedings of the Seventh conference on International Language Resources and Evaluation LREC10. European Language Resources Association (ELRA).
  • Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A., Robaldo, L., and Webber, B. (2008). The Penn Discourse Treebank 2.0 annotation manual. Technical report, University of Pennsylvania: Institute for Research in Cognitive Science.
  • Pustejovsky, J., Hanks, P., Sauri, R., See, A., Gaizauskas, R., Setzer, A., Radev, D., Sundheim, B., Day, D., Ferro, L., Lazo, M. (2003). The Timebank corpus. In Corpus linguistics (Vol. 2003, p. 40).
  • Seki, Y., Evans, D. K., Ku, L.-W., Sun, L., Chen, H.-H., and Kando, N. (2008). Overview of multilingual opinion analysis task at NTCIR-7. In Proceedings of NTCIR-7 Workshop Meeting on Evaluation of Information Access Technologies, pages 185-203.
  • Seki, Y., Ku, L.-W., and Sun, L. (2010). Overview of multilingual opinion analysis task at NTCIR-8. In Proceedings of NTCIR-8 Workshop Meeting on Evaluation of Information Access Technologies, pages 209-220.
  • Wiebe, J. (2002). Instructions for annotating opinions in newspaper articles. Technical report, University of Pittsburgh.
  • Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39:165-210.
  • Wilson, T. and Wiebe, J. (2005). Annotating attributions and private states. In Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky, CorpusAnno05, pages 53-60, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Wolf, F. and Gibson, E. (2005). Representing discourse coherence: A corpus-based study. Comput. Linguist., 31:249-288.

Reported Speech and Speaker Extraction and Analysis

  • Almeida, M., Almeida, M. B., Martins, A. F. T. (2014). A Joint Model for Quotation Attribution and Coreference Resolution. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, Gothenburg, Sweden.
  • Alrahabi, M., Descles, J.-P. (2008). Automatic annotation of direct reported speech in Arabic and French, according to a semantic map of enunciative modalities. In Proceedings of the 6th international conference on Advances in Natural Language Processing, GoTAL'08.
  • Alrahabi, M., Descles, J.-P., and Suh, J. (2010). Direct reported speech in multilingual texts: Automatic annotation and semantic categorization. In Proceedings of the Twenty-Third International Florida Artificial Intelligence Research Society Conference (FLAIRS10).
  • Bergler, S. (1992). The evidential analysis of reported speech. PhD dissertation, Brandeis University, Massachusetts. Available from UMI.
  • de La Clergerie, E., Sagot, B., Stern, R., Denis, P., Recource, G., and Mignot, V. (2009). Extracting and visualizing quotations from news wires. In Proceedings of L&TC 2009, Poznan, Poland.
  • Doandes, M. (2003). Profiling for belief acquisition from reported speech. Master's thesis, Concordia University, Montreal, Quebec, Canada.
  • Elson, D. K. and McKeown, K. R. (2010). Automatic attribution of quoted speech in literary narrative. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10).
  • Fernandes, W.P.D., Motta, E., Milidiu, R.L. (2011). Quotation extraction for Portuguese. In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), Cuiaba, pp. 204-208.
  • Glass, K. and Bangay, S. (2007). A naive, salience-based method for speaker identification in fiction books. In Proceedings of the 18th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 07), pages 1-6.
  • O'Keefe, T., Pareti, S., Curran, J., Koprinska, I. and Honnibal, M. (2012). A sequence labelling approach to quote attribution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Jeju, Korea.
  • He, H., Barbosa, D., Kondrak, G. (2013). Identification of speakers in novels. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria.
  • Krestel, R., Bergler, S. and Witte, R. (2012). Modeling human newspaper readers: The fuzzy believer approach. Natural Language Engineering 18(5): 1-18, Cambridge University Press.
  • Krestel, R., Bergler, S., and Witte, R. (2008). Minding the source: Automatic tagging of reported speech in newspaper articles. In Proceedings of the Sixth International Language Resources and Evaluation (LREC 2008). European Language Resources Association (ELRA).
  • Krestel, R., Witte, R., and Bergler, S. (2007a). Creating a fuzzy believer to model human newspaper readers. In Kobti, Z. and Wu, D., editors, Advances in Artificial Intelligence, volume 4509 of Lecture Notes in Computer Science, pages 489-501. Springer Berlin / Heidelberg.
  • Krestel, R., Witte, R., and Bergler, S. (2007b). Processing of beliefs extracted from reported speech in newspaper articles. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2007).
  • Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D. (2011). Stanford multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In Proceedings of the 5th Conference on Computational Natural Language Learning: Shared Task (CONLL Shared Task '11), Association for Computational Linguistics, Stroudsburg, PA, USA: 28-34.
  • Liang, J., Dhillon, N., and Koperski, K. (2010). A large-scale system for annotating and querying quotations in news feeds. In Proceedings of the 3rd International Semantic Search Workshop, SEMSEARCH '10, pages 7:1-7:5, New York, NY, USA. ACM.
  • Mamede, N., Chaleira, P. (2004). Character identification in children stories. Advances in Natural Language Processing: 82-90.
  • Pareti, S., O'Keefe, T., Konstas, I., Curran, J. R. and Koprinska, I. (Forthcoming, 2013). Automatically Detecting and Attributing Indirect Quotations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, U.S.
  • Pouliquen, B., Steinberger, R., and Best, C. (2007). Automatic detection of quotations in multilingual news. In Proceedings of the International Conference Recent Advances In Natural Language Processing (RANLP 2007), pages 487-492.
  • Ruppenhofer, J., Sporleder, C., and Shirokov, F. (2010). Speaker attribution in cabinet protocols. In Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., and Tapias, D., editors, Proceedings of the Seventh conference on International Language Resources and Evaluation LREC10. European Language Resources Association (ELRA).
  • Sarmento, L. and Nunes, S. (2009). Automatic extraction of quotes and topics from news feeds. In Proceedings of DSIE'09 - 4th Doctoral Symposium on Informatics Engineering.
  • Schneider, N., Hwa, R., Gianfortoni, P., Das, D., Heilman, M., Black, A.W., Crabbe, F. L., and Smith, N. A. (2010). Visualizing topical quotations over time to understand news discourse. Technical Report T.R. CMU-LTI-10-013, Carnegie Mellon University, Pittsburgh, PA.
  • Weiser, S. and Watrin, P. (2012). Extraction of unmarked quotations in Newspapers. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC12).
  • Zhang, J., Black, A., and Sproat, R. (2003). Identifying speakers in children's stories for speech synthesis. In Proceedings of EUROSPEECH 2003.

Opinions and Opinion Holder Extraction and Analysis

  • Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., and Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, pages 22-24.
  • Bloom, K., Stein, S., and Argamon, S. (2007). Appraisal extraction for news opinion analysis at NTCIR-6. In Proceedings of NTCIR-6 Workshop Meeting.
  • Choi, Y., Cardie, C., Riloff, E., and Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In HLT05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355-362, Morristown, NJ, USA. Association for Computational Linguistics.
  • Das, D. and Bandyopadhyay, S. (2013). Emotion co-referencing-Emotional expression, holder and topic. In Computational Linguistics and Chinese Language Processing, Vol.18, No.1, pp. 78-79.
  • Das, D. and Bandyopadhyay, S. (2010). Emotion holder for emotional verbs: The role of subject and syntax. In Gelbukh, A., editor, Computational Linguistics and Intelligent Text Processing, volume 6008 of Lecture Notes in Computer Science, pages 385-393. Springer Berlin / Heidelberg.
  • Gui, L., Xu, R., Xu, J., Liu, C. (2013). A cross-lingual approach for opinion holder extraction. In Journal of Computational Information Systems 9:6, pp. 2193-2200.
  • Jung, H.-Y., Kim, J., and Lee, J.-H. (2010). Opinion analysis for NTCIR8 at Postech. In Proceedings of NTCIR-8 Workshop Meeting on Evaluation of Information Access Technologies.
  • Kim, S.-M. and Hovy, E. (2005). Identifying opinion holders for question answering in opinion texts. In Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains, Pennsylvania, US.
  • Kim, S.-M. and Hovy, E. (2006a). Extracting opinions, opinion holders, and topics expressed in online news media text. In SST06: Proceedings of the Workshop on Sentiment and Subjectivity in Text, pages 1-8, Morristown, NJ, USA. Association for Computational Linguistics.
  • Kim, S.-M. and Hovy, E. (2006b). Identifying and analyzing judgment opinions. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 200-207, Morristown, NJ, USA. Association for Computational Linguistics.
  • Kim, Y., Jung, Y., and Myaeng, S.-H. (2007). Identifying opinion holders in opinion text from online newspapers. In Proceedings of the 2007 IEEE International Conference on Granular Computing, GRC07, pages 699-702, Washington, DC, USA. IEEE Computer Society.
  • Lu, B. (2010). Identifying opinion holders and targets with dependency parser in Chinese news texts. In HLT10: Proceedings of the NAACL HLT 2010 Student Research Workshop, pages 46-51, Morristown, NJ, USA. Association for Computational Linguistics.
  • Wiegand, M. and Klakow, D. (2010). Convolution kernels for opinion holder extraction. In HLT10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 795-803, Morristown, NJ, USA. Association for Computational Linguistics.

Other attribution extraction studies

  • Brunner, A. (2013). Automatic recognition of speech, thought, and writing representation in German narrative texts. In Literary and Linguistic Computing, Oxford University Press, 18 May.
  • Lin, Z., Ng, H.T., Kan, M.Y. (2010). A PDTB-styled end-to-end discourse parser. Technical report TRB88/10, School of Computing, National University of Singapore, August.
  • Skadhauge, P.R., Hardt, D. (2005). Syntactic identification of attribution in the RST Treebank. In Proceedings of the 2nd International Joint Conference on NLP, Jeju island, Korea, 11-13 October.
  • Wiegand, M. (2013). Predicate acquisition for opinion holder extraction: A data-intensive approach. In Proceedings of the 8th HiER Workshop, Hildesheim, 25-26 April, pp. 41-50.
[Last updated: 02/05/2014]