
SONGFANG HUANG


IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York, 10598, U.S.A.
Tel: +1 914 945 1436
Email: s.f.huang @ ed.ac.uk, shuang @ us.ibm.com



    

[ Home | Education | Experience | Publications | Honors/Awards | Professional Experiences | Softwares ]

SOFTWARE




1. hpylm - hierarchical Pitman-Yor process language model

Download

Sample Data

Note: the following sample data (amicorpus.train.data.gz, amicorpus.test.data.gz) are extracted from the AMI Meeting Corpus. Copyright in the data belongs to the AMI project. Please check the project web page for licence terms. The vocabulary file (amicorpus.vocab) was extracted by the AMIASR team.

Examples

Citation

If you use this program, please consider citing one of the following papers:


NEW: 2. pldlm - power law discounting language model (PLDLM) using hpylm:

Examples

  1. with strength parameters theta:

    [cwd@localhost]: hpylm -order 3 -vocab amicorpus.vocab -text amicorpus.train.data.gz -read-with-mincounts -ebdiscount -gt1min 1 -gt2min 1 -gt3min 1 -lm amicorpus.pldlm.arpa.gz -debug 1 -ppl amicorpus.test.data.gz -eb-use-theta
    writing 50002 1-grams
    writing 154116 2-grams
    writing 416426 3-grams
    
    [ebdiscount] PPL on file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs
    0 zeroprobs, logprob= -331250 ppl= 109.555 ppl1= 143.271

  2. without strength parameters theta:

    [cwd@localhost]: hpylm -order 3 -vocab amicorpus.vocab -text amicorpus.train.data.gz -read-with-mincounts -ebdiscount -gt1min 1 -gt2min 1 -gt3min 1 -lm amicorpus.pldlm.arpa.gz -debug 1 -ppl amicorpus.test.data.gz
    writing 50002 1-grams
    writing 154116 2-grams
    writing 416426 3-grams
    
    [ebdiscount] PPL on file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs
    0 zeroprobs, logprob= -331529 ppl= 109.988 ppl1= 143.869
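
The ppl and ppl1 figures in the output above can be related to the reported logprob with a small calculation. The sketch below assumes that hpylm follows the SRILM convention for perplexity (base-10 log probability, OOVs and zeroprobs excluded from the word count, one end-of-sentence token added per sentence for ppl but not for ppl1). The function name and argument names are illustrative only and are not part of hpylm.

```python
def perplexities(logprob, sentences, words, oovs, zeroprobs=0):
    """Return (ppl, ppl1) from a base-10 log probability.

    Assumes the SRILM convention: OOVs and zero-probability words are
    excluded, and ppl additionally counts one end-of-sentence token
    per sentence.
    """
    scored_words = words - oovs - zeroprobs
    ppl = 10.0 ** (-logprob / (scored_words + sentences))
    ppl1 = 10.0 ** (-logprob / scored_words)
    return ppl, ppl1


if __name__ == "__main__":
    # Figures copied from the two runs shown above.
    runs = {
        "with theta": -331250,
        "without theta": -331529,
    }
    for name, logprob in runs.items():
        ppl, ppl1 = perplexities(logprob, sentences=8777,
                                 words=154831, oovs=1201)
        print(f"{name}: ppl={ppl:.3f} ppl1={ppl1:.3f}")
```

Because the printed logprob is rounded, the recomputed values should agree with the reported ppl and ppl1 only up to a small rounding difference.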
    

Note

Citation

If you use this program, please consider citing the following paper:


3. hdp - hierarchical Dirichlet process

Download

Sample Data

Note: the following sample data (amicorpus.train.data, amicorpus.test.data.gz) are extracted from the AMI Meeting Corpus. Copyright in the data belongs to the AMI project. Please check the project web page for licence terms.

Examples

Citation

If you use this program, please consider citing the following paper:


Licence

This software was implemented on top of the SRILM toolkit from SRI. All code taken from SRILM is credited to SRI.

Acknowledgements

