EasyCCG

EasyCCG is a state-of-the-art open-source CCG parser, designed to help advance research in semantics and parsing.

EasyCCG has the following advantages:
  • High performance - excellent results on Wikipedia and biomedical text with no domain adaptation
  • Very fast - around 150 sentences per second
  • Robust to new words and noisy sentences
  • Clean CCG grammar, with a small set of semantically interpretable combinators and unary rules
  • n-best parsing from an unpruned search space
  • Minimal training data requirements - the model can be trained from annotated words alone, without full sentences or parse trees
  • User-friendly - extremely simple model with documented source code

If you use EasyCCG in your research, please cite the following paper:
    A* CCG Parsing with a Supertag-factored Model. Mike Lewis and Mark Steedman. EMNLP 2014.

Basic usage:
    java -jar easyccg.jar --model model
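
The parser reads its input from standard input, as the piped commands in this document do. A minimal sketch of parsing several sentences from a file, assuming one whitespace-tokenized sentence per line and parses written to standard output; the file names sentences.txt and parses.txt are hypothetical:

```shell
# sentences.txt and parses.txt are hypothetical file names.
# Assumption: the parser expects one sentence per line.
printf '%s\n' "parse me" "parse me too" > sentences.txt

# Run the parser only if the jar and the model are present here.
if [ -f easyccg.jar ] && [ -e model ]; then
    java -jar easyccg.jar --model model < sentences.txt > parses.txt
fi
```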

For n-best parsing:
    java -jar easyccg.jar --model model --nbest 10

To parse questions, use:
    java -jar easyccg.jar --model model_questions -s -r S[q] S[qem] S[wq]

If you want POS/NER tags in the output, you'll need to supply them in the input, using the format word|POS|NER. To get this format from the C&C tools, use the following:
    echo "parse me" | candc/bin/pos -model candc_models/pos | candc/bin/ner -model candc_models/ner -ofmt "%w|%p|%n \n" | java -jar easyccg.jar --model model -i POSandNERtagged -o extended
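
The word|POS|NER input format can also be assembled without the C&C tools, for example from tags produced by another tagger. A minimal sketch, assuming space-separated tokens; tag_line is a hypothetical helper, and the tag values shown are illustrative placeholders, not real tagger output:

```shell
# tag_line is a hypothetical helper, not part of EasyCCG or C&C.
# Arguments: $1 = words, $2 = POS tags, $3 = NER tags (each space-separated,
# all three of the same length). Prints one line of word|POS|NER tokens.
tag_line() {
    printf '%s\n%s\n%s\n' "$1" "$2" "$3" | awk '
        NR == 1 { n = split($0, w, " ") }
        NR == 2 { split($0, p, " ") }
        NR == 3 {
            split($0, e, " ")
            for (i = 1; i <= n; i++)
                printf "%s%s|%s|%s", (i > 1 ? " " : ""), w[i], p[i], e[i]
            printf "\n"
        }'
}

# Illustrative tags only; a real tagger may assign different ones.
tagged=$(tag_line "parse me" "VB PRP" "O O")
echo "$tagged"

# The line can then be piped to the parser exactly as in the command above:
# echo "$tagged" | java -jar easyccg.jar --model model -i POSandNERtagged -o extended
```

Keeping the formatting step separate from the parser call makes it easy to inspect the tagged line before parsing.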

To get Boxer-compatible Prolog output, use:
    echo "parse me" | candc/bin/pos -model candc_models/pos | candc/bin/ner -model candc_models/ner -ofmt "%w|%p|%n \n" | java -jar easyccg.jar --model model -i POSandNERtagged -o prolog -r S[dcl]