This class is a graduate seminar for students who are interested in expanding their knowledge about the use of Bayesian statistics in natural language processing.
The class consists of two parts. The first part consists of lectures given by the instructor on various topics in Bayesian NLP. The second part consists of discussions of papers in the area, led each week by different participants in the class.
Grades will be based on the discussions led in class, participation, and possibly a white paper: a short paper in which students summarize one or two related papers and suggest some future directions for exploration.
To get the most out of this class, participants are expected to come with some basic knowledge of probability and statistics, and basic familiarity with NLP research.
If you have any questions about the seminar, email me at scohen [strudel] cs.columbia.edu.
To get a feel for some of the core material in this area, check out Sharon Goldwater's Bayesian language modeling reading list. Much progress has been made in this area since the list was last updated, but it covers some of the basic papers and reading material that participants in the class could choose to discuss (students may, of course, choose newer papers to present). In the first week of the class, we will compile a more thorough list of papers to choose from. A current seed version of the list is available here, as a PDF file.
Date | Lecturer | Topics | Notes | Reading material |
---|---|---|---|---|
1/28 | Shay | Basic refresher on Probability and Statistics (statistical independence, conditional independence, Bayes' theorem), the Bayesian approach, hypothesis testing, priors in general, Bayesian updating | Slides (most material was presented on the blackboard) | none |
2/4 | Shay | Priors, PCFGs, multinomials, conjugacy, Dirichlet distributions, Bayesian point estimate; QA session about the material read | Slides (most material was presented on the blackboard) | Chapter 2; Optional: Chapter 1 |
2/11 | Bob Carpenter (guest lecture) | Bayesian domain adaptation | Stan Modeling Language Reference Manual; Bob's blog post about Bayesian inference | Jenny Rose Finkel and Christopher D. Manning (2009). Hierarchical Bayesian Domain Adaptation. Proceedings of NAACL. [pdf]; Optional reading (for data motivation): John Blitzer, Mark Dredze and Fernando Pereira (2007). Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Proceedings of ACL. [pdf] |
2/18 | Rachel, Armineh | Bayesian estimation, basic comparison of inference methods | Slides; Armineh's notes for Gao and Johnson (2008) | Chapter 3; Jianfeng Gao and Mark Johnson (2008). A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. Proceedings of EMNLP. [pdf] |
2/25 | Shay, Joe | Variational inference, unsupervised POS tagging | Slides (most material was presented on the blackboard) | Sharon Goldwater and Thomas L. Griffiths (2007). A fully Bayesian approach to unsupervised part-of-speech tagging. Proceedings of ACL. [pdf]; Sujith Ravi and Kevin Knight (2011). Deciphering Foreign Language. Proceedings of ACL. [pdf] |
3/4 | Yu, Kyle, Kevin | Decipherment, unsupervised POS tagging (cont'd), inference with PCFGs | Yu's slides about decipherment; Kevin's slides about Bayesian PCFG inference | Kristina Toutanova and Mark Johnson (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. Proceedings of NIPS. [pdf]; Mark Johnson, Thomas Griffiths and Sharon Goldwater (2007). Bayesian inference for PCFGs via Markov chain Monte Carlo. Proceedings of NAACL. [pdf] |
3/11 | Shay, Daniel, Karl | Variational inference (cont'd), Bayesian logic programs, semantic parsing | the material was presented on the blackboard | Sindhu Raghavan, Raymond J. Mooney and Hyeonseo Ku (2012). Learning to "read between the lines" using Bayesian logic programs. Proceedings of ACL. [pdf] (Note that this paper is not strictly a "Bayesian" paper in the traditional sense, but it is nevertheless an interesting paper to know about, and there was demand for it. Food for thought: how would we make Bayesian logic programs Bayesian in the full sense of the word?); Bevan Jones, Mark Johnson and Sharon Goldwater (2012). Semantic parsing with Bayesian tree transducers. Proceedings of ACL. [pdf] |
3/25 | Shay, Anahita, Anup | Variational inference (cont'd), GMMs, grammar induction | all material was presented on the blackboard | Stephen J. Roberts, Dirk Husmeier, William Penny and Iead Rezek (1998). Bayesian Approaches to Gaussian Mixture Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence. [pdf]; Kurihara and Sato (2006). Variational Bayesian grammar induction for natural language. International Colloquium on Grammatical Inference. [pdf] |
4/1 | Shay, Chris, Swabha | Basics of sampling, semantic role induction, finite-state transducers | Chris's slides about semantic role induction | Titov and Klementiev (2012). A Bayesian Approach to Unsupervised Semantic Role Induction. Proceedings of EACL. [pdf]; Chiang et al. (2010). Bayesian Inference for Finite-State Transducers. Proceedings of NAACL. [pdf] |
4/8 | Michael, Jessica, Krutika | Language modeling, sentiment mining | Michael's slides on Goldwater and Johnson; Jessica's slides on Teh's paper | Goldwater and Johnson (2004). Priors in Bayesian Learning of Phonological Rules. Proceedings of ACL SIG in Computational Phonology. [pdf]; Teh (2006). A hierarchical Bayesian language model based on Pitman–Yor processes. Proceedings of ACL. [pdf]; Davies and Ghahramani (2011). Language-independent Bayesian sentiment mining of Twitter. The Fifth Workshop on Social Network Mining and Analysis. [pdf] |
4/15 | Shay | MCMC sampling | all material was presented on the blackboard | no reading material for this week |
4/22 | Arvind, Shay | Machine translation, topic modeling, MCMC sampling | material was presented on the blackboard | Paul et al. (2011). Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT. Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties. [pdf]; Blunsom et al. (2009). A Gibbs Sampler for Phrasal Synchronous Grammar Induction. Proceedings of ACL. [pdf]; Wallach, Mimno and McCallum (2009). Rethinking LDA: Why Priors Matter. Proceedings of NIPS. [pdf] |
4/29 | Kaili, Yi-Chen, Mohammad | Translation, summarization, adaptor grammars | Yi-Chen's slides on Daumé and Marcu | John DeNero, Alexandre Bouchard-Côté and Dan Klein (2008). Sampling alignment structure under a Bayesian translation model. Proceedings of EMNLP. [pdf]; Hal Daumé III and Daniel Marcu (2006). Bayesian query-focused summarization. Proceedings of ACL. [pdf]; Mark Johnson, Thomas L. Griffiths and Sharon Goldwater (2007). Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models. Proceedings of NIPS. [pdf] |
5/6 | Wael, Shay | Bayesian nonparametrics, summary, history of Bayes' rule | Wael's slides; The Theory That Would Not Die, by Sharon Bertsch McGrayne | no reading material |