Grover, C. and A. Lascarides [2001] XML-based Data Preparation for Robust Deep Parsing, to appear in Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL/EACL 2001), pp. 252--259, Toulouse, France.
We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the `messiness' in real language data and improve parse performance.
@inproceedings{grover:lascarides:2001,
author = {Claire Grover and Alex Lascarides},
title = {{\sc xml}-based Data Preparation for Robust Deep Parsing},
year = {2001},
booktitle = {Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL/EACL 2001)},
pages = {252--259},
address = {Toulouse, France}
}