Trevor Cohn, Chris Callison-Burch and Mirella Lapata. 2008. Constructing Corpora for Development and Evaluation of Paraphrase Systems. Computational Linguistics, 34(4), 597-614.

Automatic paraphrasing is an important component in many natural language processing tasks. In this paper we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word-alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.


@Article{Cohn:ea:08,
  author = 	 {Trevor Cohn and Chris Callison-Burch and Mirella Lapata},
  title = 	 {Constructing Corpora for Development and Evaluation of Paraphrase Systems},
  journal = 	 {Computational Linguistics},
  year = 	 2008, 
  volume =       {34},
  number =       {4}, 
  pages =        {597--614}
}