SOFTWARE
Note: the following sample data (amicorpus.train.data.gz, amicorpus.test.data.gz) are extracted from the AMI Meeting Corpus. All copyright in the data belongs to the AMI project. Please check the AMI project web page for licence terms. The vocabulary file (amicorpus.vocab) was extracted by the AMIASR team.
[cwd@localhost]: hpylm -help
[cwd@localhost]: runHPYLM -h
[cwd@localhost]: hpylm -debug 0 -order 3 -vocab amicorpus.vocab -pydiscount -numiter 100 -numsamp 10 -numgap 5 -text amicorpus.train.data.gz -lm amicorpus.hpylm.arpa.gz -write amicorpus.hpymodel -ppl amicorpus.test.data.gz
writing 50002 1-grams
writing 154116 2-grams
writing 416426 3-grams
PPL on file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -328749 ppl= 105.737 ppl1= 137.998

[cwd@localhost]: ngram -lm amicorpus.hpylm.arpa.gz -ppl amicorpus.test.data.gz
file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -328749 ppl= 105.737 ppl1= 137.998

[cwd@localhost]: ngram-count -order 3 -vocab amicorpus.vocab -ukndiscount -interpolate -text amicorpus.train.data.gz -lm amicorpus.iknlm.arpa.gz -gt3min 1
writing 50002 1-grams
writing 154116 2-grams
writing 416426 3-grams

[cwd@localhost]: ngram -lm amicorpus.iknlm.arpa.gz -ppl amicorpus.test.data.gz
file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -333752 ppl= 113.511 ppl1= 148.745

[cwd@localhost]: ngram-count -order 3 -vocab amicorpus.vocab -kndiscount -interpolate -text amicorpus.train.data.gz -lm amicorpus.mknlm.arpa.gz -gt3min 1
writing 50002 1-grams
writing 154116 2-grams
writing 416426 3-grams

[cwd@localhost]: ngram -lm amicorpus.mknlm.arpa.gz -ppl amicorpus.test.data.gz
file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -331990 ppl= 110.709 ppl1= 144.867
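The ppl and ppl1 figures in the summaries above can be reproduced from the reported counts. The sketch below assumes the convention documented for SRILM's `ngram -ppl`: logprob is a base-10 log probability, ppl normalizes by the scored words plus one end-of-sentence token per sentence, and ppl1 normalizes by the scored words only. The function name is illustrative and not part of SRILM or hpylm.

```python
def srilm_perplexities(logprob, sentences, words, oovs, zeroprobs=0):
    """Return (ppl, ppl1) from the counts in an SRILM-style -ppl summary.

    Assumes logprob is base 10 and that OOVs and zero-probability words
    are excluded from the normalizer (SRILM's documented convention).
    """
    scored_words = words - oovs - zeroprobs
    # ppl counts one end-of-sentence token per sentence; ppl1 does not.
    ppl = 10.0 ** (-logprob / (scored_words + sentences))
    ppl1 = 10.0 ** (-logprob / scored_words)
    return ppl, ppl1


# Counts taken from the hpylm run on amicorpus.test.data.gz above.
ppl, ppl1 = srilm_perplexities(-328749, 8777, 154831, 1201)
print(f"ppl={ppl:.3f} ppl1={ppl1:.3f}")
```

Because the printed logprob is rounded, the recomputed values agree with the reported 105.737 and 137.998 only up to small rounding differences.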
[cwd@localhost]: runHPYLM --workdir . --taskid single --prog hpylm --mode 0 --vocab amicorpus.vocab --text amicorpus.train.data.gz --ppl amicorpus.test.data.gz --niters 100 --nsamps 10 --lm amicorpus.hpylm.single.arpa.gz --hpy amicorpus.hpymodel.single --debug 0
writing 50002 1-grams
writing 154116 2-grams
writing 416426 3-grams
PPLpit on file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -328752 ppl= 105.742 ppl1= 138.004

[cwd@localhost]: ngram -lm ./single/models/amicorpus.hpylm.single.arpa.gz -ppl amicorpus.test.data.gz
file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -328752 ppl= 105.742 ppl1= 138.004
[cwd@localhost]: ngram-count -text amicorpus.train.data.gz -gt3min 1 -write amicorpus.train.count.n123

[cwd@localhost]: runHPYLM --workdir . --taskid sequential --prog hpylm --mode 1 --vocab amicorpus.vocab --count amicorpus.train.count.n123 --ppl amicorpus.test.data.gz --nparts 4 --nprocs 4 --niters 100 --lm amicorpus.hpylm.sequential.arpa.gz --hpy amicorpus.hpymodel.sequential --debug 0
[cwd@localhost]: runHPYLM --workdir . --taskid parallel --prog hpylm --mode 2 --vocab amicorpus.vocab --count amicorpus.train.count.n123 --nparts 8 --ppl amicorpus.test.data.gz --niters 100 --lm amicorpus.hpylm.parallel.arpa.gz --hpy amicorpus.hpymodel.parallel --debug 0
If you use this program, please consider citing one of the following papers:
[cwd@localhost]: hpylm -order 3 -vocab amicorpus.vocab -text amicorpus.train.data.gz -read-with-mincounts -ebdiscount -gt1min 1 -gt2min 1 -gt3min 1 -lm amicorpus.pldlm.arpa.gz -debug 1 -ppl amicorpus.test.data.gz -eb-use-theta
writing 50002 1-grams
writing 154116 2-grams
writing 416426 3-grams
[ebdiscount] PPL on file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -331250 ppl= 109.555 ppl1= 143.271

[cwd@localhost]: hpylm -order 3 -vocab amicorpus.vocab -text amicorpus.train.data.gz -read-with-mincounts -ebdiscount -gt1min 1 -gt2min 1 -gt3min 1 -lm amicorpus.pldlm.arpa.gz -debug 1 -ppl amicorpus.test.data.gz
writing 50002 1-grams
writing 154116 2-grams
writing 416426 3-grams
[ebdiscount] PPL on file amicorpus.test.data.gz: 8777 sentences, 154831 words, 1201 OOVs 0 zeroprobs, logprob= -331529 ppl= 109.988 ppl1= 143.869
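To compare the runs on this page side by side, the perplexity summaries can be parsed and ranked. The following is a minimal sketch, not a tool shipped with this software: the function names are hypothetical, the regular expression only assumes the summary layout shown in the transcripts, and the model labels are simply taken from the output file names used above.

```python
import re

# Matches the "... sentences, ... words, ... OOVs ... ppl1= ..." summary,
# whether it is printed on one line or wrapped across two.
PPL_SUMMARY = re.compile(
    r"(\d+)\s+sentences,\s+(\d+)\s+words,\s+(\d+)\s+OOVs\s+"
    r"(\d+)\s+zeroprobs,\s+logprob=\s*(\S+)\s+ppl=\s*(\S+)\s+ppl1=\s*(\S+)"
)


def parse_ppl(text):
    """Extract the counts and perplexities from a -ppl summary."""
    match = PPL_SUMMARY.search(text)
    if match is None:
        raise ValueError("no perplexity summary found")
    sentences, words, oovs, zeroprobs, logprob, ppl, ppl1 = match.groups()
    return {
        "sentences": int(sentences), "words": int(words),
        "oovs": int(oovs), "zeroprobs": int(zeroprobs),
        "logprob": float(logprob), "ppl": float(ppl), "ppl1": float(ppl1),
    }


def relative_reduction(baseline_ppl, ppl):
    """Perplexity reduction relative to a baseline, in percent."""
    return 100.0 * (baseline_ppl - ppl) / baseline_ppl


# ppl values copied from the transcripts on this page.
reported = {
    "hpylm": 105.737,
    "iknlm": 113.511,
    "mknlm": 110.709,
    "pldlm (-eb-use-theta)": 109.555,
    "pldlm": 109.988,
}
baseline = reported["iknlm"]
for name, ppl in sorted(reported.items(), key=lambda item: item[1]):
    print(f"{name:24s} ppl={ppl:8.3f}  "
          f"vs iknlm: {relative_reduction(baseline, ppl):5.2f}%")
```

The interpolated Kneser-Ney model is used as the baseline here only because it has the highest perplexity of the five runs; any other reference point works the same way.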
If you use this program, please consider citing the following paper:
Note: the following sample data (amicorpus.train.data, amicorpus.test.data.gz) are extracted from the AMI Meeting Corpus. All copyright in the data belongs to the AMI project. Please check the AMI project web page for licence terms.
[cwd@localhost]: hdp -help
[Command]: hdp -help
Usage of command "options"
-help-hdp: print help information for HDP options
-rand: the random seed
Default value: 0
-lda: recover lda in HDP
-numtopic: initial number of topics in HDP
Default value: 1
-numiter: number of iterations to burn in HDP
Default value: 500
-numsamp: number of samples to collect in HDP
Default value: 10
-numspace: number of space samples in HDP
Default value: 5
-train: train file name for HDP
-test: test file name for HDP
-config: test file name for HDP
-aa: Dirichlet prior for HH
Default value: 0.05
-prior-count: use the prior count instead of aa
-beta: beta for adaptive LM
Default value: 0.5
-read-hdp: write the HDP model to file
-write-hdp: write the HDP model to file
-test-hdp: evaluate HDP in ppl
-infer-role: infer roles using the HDP
-use-lm-vocab: use the same vocab with baseline ngram
-hdp-vocab: vocab file for the HDP model
-norm: use a normalized LM for adaptation
-prior-count-file: text file for prior counts for H in HDP
-ngram-count-file: text file for prior counts for ungram to adapt from
-top: number of top classes to adapt LM
Default value: 0
-write-likelihood: write the likelihoods for training and testing
-write-hdp-vocab: write hdp vocab to file
-use-adapt: adapting LM in HDP
-use-mix: mixing LM in HDP
-role-ngram: estimate a role ngram for each of the four roles
the default action is to do nothing
-help: Print this message
[cwd@localhost]: zcat amicorpus.train.doc.gz | head
PM KICKOFF MEETING PROJECT TWENTY MINUTES KIND SURE LAURA PROJECT MANAGER INTRODUCE PM DESIGNING REMOTE CONTROL RECORD ACTUALLY DAVID ANDREW CRAIG ARRIVED DESIGN REMOTE CONTROL SUPPOSED ORIGINAL TRENDY USER FRIENDLY KIND STAGES DESIGN SURE GUYS RECEIVED EMAILS ID PROJECT ANNOUNCEMENT PROJECT DESIGNING REMOTE CONTROL PM INDIVIDUAL MEETING REPEAT PROCESS TIMES POINT WHITEBOARD DRAW FAVOURITE ANIMAL SUM FAVOURITE CHARACTERISTICS
If no role information is available, just use SENTENCE.
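The sample suggests that each document in the training file starts with a role tag (PM, ID, ...) followed by upper-case content words, with SENTENCE standing in when the role is unknown. The exact file format is not specified on this page, so the sketch below rests on that assumption; both function names are hypothetical, and the upper-casing merely mirrors the sample.

```python
DEFAULT_TAG = "SENTENCE"  # used when no role information is available


def format_document(words, role=None):
    """Build one document line: a leading tag followed by the words.

    Assumption: the tag is the first whitespace-separated token and the
    words are upper-cased, as in the sample shown above.
    """
    tag = role if role is not None else DEFAULT_TAG
    return " ".join([tag] + [word.upper() for word in words])


def split_document(line):
    """Split a document line back into (tag, list of words)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty document line")
    return tokens[0], tokens[1:]


print(format_document(["designing", "remote", "control"], role="PM"))
print(format_document(["favourite", "animal"]))
```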
[cwd@localhost]: zcat config.role
ROOT BASE
PM ROOT
ID ROOT
UI ROOT
ME ROOT
For the default case, run hdp without the -config argument.
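One plausible reading of config.role is a list of "node parent" pairs, so that ROOT hangs under BASE and each of the four roles hangs under ROOT. That interpretation is an assumption, not something this page states, and the helper names below are hypothetical. Under it, the hierarchy can be inspected as follows:

```python
CONFIG_ROLE = "ROOT BASE PM ROOT ID ROOT UI ROOT ME ROOT"


def parse_role_config(text):
    """Read whitespace-separated 'node parent' pairs into a dict."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (node/parent pairs)")
    return dict(zip(tokens[0::2], tokens[1::2]))


def path_to_top(node, parent_of):
    """Follow parent links from a node up to the topmost ancestor."""
    path = [node]
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in path:
            raise ValueError("cycle in role configuration")
        path.append(parent)
    return path


parent_of = parse_role_config(CONFIG_ROLE)
children_of_root = [n for n, p in parent_of.items() if p == "ROOT"]
print(children_of_root)
print(" -> ".join(path_to_top("PM", parent_of)))
```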
[cwd@localhost]: hdp -numtopic 20 -numiter 200 -train amicorpus.train.doc.gz -write-hdp amicorpus.2level.hdpmodel.gz -test amicorpus.test.doc.gz -aa 0.5 -debug 1 2>&1 | tee log
[cwd@localhost]: hdp -numtopic 20 -numiter 200 -train amicorpus.train.doc.gz -write-hdp amicorpus.3level.hdpmodel.gz -test amicorpus.test.doc.gz -aa 0.5 -config config.role -debug 1 2>&1 | tee log
If you use this program, please consider citing the following paper:
This software was implemented on top of the SRILM toolkit from SRI. All code taken from SRILM is credited to SRI.