Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure

Sporleder, C. and A. Lascarides [2004] Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure, Proceedings of Coling 2004, pp.43--49.

We propose a novel method to predict the inter-paragraph discourse structure of text, i.e., to infer which paragraphs are related to each other and form larger segments on a higher level. Our method combines a clustering algorithm with a model of segment "relatedness" acquired in a machine learning step. The model integrates information from a variety of sources, such as word co-occurrence, lexical chains, cue phrases, punctuation and tense. Our method outperforms an approach that relies on word co-occurrence alone.
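The general idea of pairing a clustering step with a relatedness score can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that only adjacent segments may merge, and it substitutes a plain word-overlap cosine score for the learned relatedness model. The names `cosine_relatedness` and `build_tree` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_relatedness(a, b):
    """Stand-in score (assumption): cosine similarity of word counts.

    The abstract describes a learned model over several feature sources;
    this function only mimics the word co-occurrence baseline.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def build_tree(paragraphs, relatedness=cosine_relatedness):
    """Greedily merge the most related pair of adjacent segments.

    Returns a nested tuple of paragraph indices, e.g. ((0, 1), (2, 3)).
    """
    if not paragraphs:
        raise ValueError("need at least one paragraph")
    # Each segment is (subtree, concatenated text of that subtree).
    segments = [(i, text) for i, text in enumerate(paragraphs)]
    while len(segments) > 1:
        # Score every pair of neighbouring segments.
        scores = [
            relatedness(segments[i][1], segments[i + 1][1])
            for i in range(len(segments) - 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        (left_tree, left_text), (right_tree, right_text) = segments[best], segments[best + 1]
        # Replace the two neighbours with one merged segment.
        segments[best:best + 2] = [((left_tree, right_tree), left_text + " " + right_text)]
    return segments[0][0]
```

Any function that takes two text segments and returns a score can be passed as `relatedness`, so a trained classifier's confidence could replace the stand-in without changing the merging loop.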


@inproceedings{sporleder:lascarides:2004,
author = {Caroline Sporleder and Alex Lascarides},
year = {2004},
title = {Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure},
booktitle = {Proceedings of the International Conference on Computational Linguistics (COLING)},
pages = {43--49}
}