### Human coherence judgements

The file humans.avg contains average human ratings of the coherence of DUC summaries. The collection of the data is detailed in Section 5 of the CL manuscript and also in Section 4.2 of the following paper:

Regina Barzilay and Mirella Lapata. 2005. Modeling Local Coherence: An Entity-Based Approach. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 141-148. Ann Arbor.

The file reads as follows:

- the first column is the DUC document cluster from which the summary was generated
- the second column is the system or human that generated the summary
- the third column is the average coherence rating of the summary

So, for example, the following line:

    D30003 16 5.07142857142857

means that system 16 generated a summary for cluster D30003 whose average rating was 5.07142857142857.

In our study we used 5 summarization systems: 13, 16, 18, 26, and 6. We also used human-written summaries (by DUC assessors). The latter are denoted with the letter H.

The summaries were taken from systems that participated in DUC 2003. You should register with DUC to obtain access to the original summaries.

### LSA scores for coherence experiments

- data1-test-lsa.tar.gz: LSA scores for test data for ordering experiments (Earthquakes domain, Experiment 1)
- data2-test-lsa.tar.gz: LSA scores for test data for ordering experiments (Accidents domain, Experiment 1)
- lsa_summaries_test: LSA scores for test data for summarisation experiments

### Readability experiments

The data for the readability experiments is in the readability/ directory. We used 5-fold cross validation.
The file names mean the following:

- svm_perplex[1-5]_grid_ana.data: training files for folds 1-5, for model with coreference
- svm_perplex[1-5]_grid_ana.test: test files for folds 1-5, for model with coreference
- svm_perplex[1-5]_grid-no_ana.data: training files for folds 1-5, for model without coreference
- svm_perplex[1-5]_grid-no_ana.test: test files for folds 1-5, for model without coreference
- lsa[1-5].test: LSA scores for folds 1-5 (test data)
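
As an illustration of the three-column layout of humans.avg described under "Human coherence judgements", here is a minimal Python sketch for reading the ratings. It assumes the columns are separated by whitespace, as in the example line given there. The function names `parse_ratings`, `load_ratings`, and `mean_by_system` are hypothetical helpers and are not part of the distributed data.

```python
from collections import defaultdict
from statistics import mean


def parse_ratings(lines):
    """Parse 'cluster system rating' lines into (cluster, system, rating) tuples.

    Assumption: columns are whitespace-separated. The system identifier is
    kept as a string because human-written summaries are marked with 'H'.
    """
    ratings = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        cluster, system, rating = parts
        ratings.append((cluster, system, float(rating)))
    return ratings


def load_ratings(path):
    """Read a ratings file such as humans.avg from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_ratings(handle)


def mean_by_system(ratings):
    """Average the per-summary ratings for each system (or 'H' for humans)."""
    by_system = defaultdict(list)
    for _cluster, system, rating in ratings:
        by_system[system].append(rating)
    return {system: mean(values) for system, values in by_system.items()}
```

Calling `mean_by_system(load_ratings("humans.avg"))` would then give one average coherence rating per system, with the human-written summaries under the key `"H"`.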