Reading list on Bayesian modeling for language

People often ask me what they can read to learn more about recent Bayesian modeling techniques and their applications to language learning. Here is a list of the papers I have found to be most useful and relevant to my own research. I try to emphasize the papers aimed at a slightly less technical/more cognitively inclined audience. This is not intended to be a complete list, only a starting point.

Note: This list has not been updated since 2008, in part because the area has now expanded considerably, and keeping it up-to-date would be difficult. But I've decided to keep this list up in case it's still useful to people.

General introductory material

Thomas L. Griffiths and Alan Yuille (2006). A primer on probabilistic inference. Trends in Cognitive Sciences. Supplement to special issue on Probabilistic Models of Cognition (volume 10, issue 7).

Sharon Goldwater (2006). Nonparametric Bayesian Models of Lexical Acquisition. Unpublished doctoral dissertation, Brown University, 2006.

Daniel J. Navarro, Thomas L. Griffiths, Mark Steyvers, and Michael D. Lee (2006). Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology, 50, 101-122.

Bayesian language models for learning

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2007). Distributional Cues to Word Segmentation: Context is Important. Proceedings of the 31st Boston University Conference on Language Development.

Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson (2006). Contextual Dependencies in Unsupervised Word Segmentation. Proceedings of Coling/ACL.

Sharon Goldwater and Thomas L. Griffiths (2007). A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. Proceedings of the Association for Computational Linguistics.

Mark Johnson (2007). Why Doesn't EM Find Good HMM POS-Taggers? Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

Percy Liang, Slav Petrov, Michael I. Jordan, and Dan Klein (2007). The infinite PCFG using hierarchical Dirichlet processes. Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

Jenny Rose Finkel, Trond Grenager and Christopher D. Manning (2007). The Infinite Tree. Proceedings of the Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater (2007). Adaptor Grammars: a Framework for Specifying Compositional Nonparametric Bayesian Models. Advances in Neural Information Processing Systems 19.

Thomas L. Griffiths, Mark Steyvers, and Joshua B. Tenenbaum (2007). Topics in semantic representation. Psychological Review, 114, 211-244.

Thomas L. Griffiths, Mark Steyvers, David M. Blei, and Joshua B. Tenenbaum (2005). Integrating topics and syntax. Advances in Neural Information Processing Systems 17.

David Blei, Andrew Ng, and Michael Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022. (A shorter version appeared in NIPS 2002).

Fei Xu and Joshua B. Tenenbaum (2007). Word learning as Bayesian inference. Psychological Review, 114, 245-272.

Bayesian models of language processing

This isn't really my area, but here are a couple of interesting papers I know of:

Dennis Norris (2006). The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113(2), 327-357.

Naomi Feldman and Thomas L. Griffiths (2007). A rational account of the perceptual magnet effect. Proceedings of the Twenty-Ninth Annual Conference of the Cognitive Science Society.


A bunch of the papers mentioned above have descriptions of sampling algorithms and/or variational inference procedures for specific models. For more general information on these topics, consider reading some of the following:

Sharon Goldwater (2006). Nonparametric Bayesian Models of Lexical Acquisition. Unpublished doctoral dissertation, Brown University, 2006.

Julian Besag (2000). Markov chain Monte Carlo for statistical inference. Working paper no. 9. University of Washington Center for Statistics and the Social Sciences.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater (2007). Bayesian Inference for PCFGs via Markov Chain Monte Carlo. Proceedings of the North American Association for Computational Linguistics.

Matthew Beal (2003). Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London.

Further Reading

Yee Whye Teh, Michael Jordan, Matthew Beal, and David Blei (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581.

Radford Neal (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical report CRG-TR-93-1. University of Toronto Department of Computer Science.