Sharon Goldwater

Lecturer in Informatics, University of Edinburgh.

Contact:
3.27 Informatics Forum
10 Crichton Street
Edinburgh, EH8 9AB
United Kingdom
+44 131 651 5609
email me (sgwater) here: inf.ed.ac.uk



Research Interests

One of the great motivating factors in the development of modern linguistic theory is the astonishing ability of children to attain linguistic proficiency in only a few years, with apparently impoverished input. My interests lie in exploring the extent to which this ability can be explained by appealing to probabilistic notions of language and learning. I consider questions such as: What kinds of structures are considered by the learning mechanism? How much and what sort of evidence is necessary to produce generalizations? Are there innate constraints that are specific to language acquisition, or can language be learned successfully using only general learning biases? I investigate these questions by implementing explicit computational models of language acquisition within a Bayesian statistical framework. To date, my research has focused on developing models of morphological and phonological acquisition. I am also interested in the problem of unsupervised learning in general (i.e. learning without access to the "correct" answers), and in adapting and applying unsupervised machine learning techniques in a cognitively plausible way.

Resources

Publications

In Submission (Available on request)

Producing power-law distributions and damping word frequencies with two-stage language models. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. In submission. ( Abstract )

In Press

Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Sharon Goldwater, Dan Jurafsky, and Christopher D. Manning. In press, Speech Communication. ( Preprint PDF )

2009

A Bayesian framework for word segmentation: Exploring the effects of context. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Cognition, 112:1, pp. 21-54. doi:10.1016/j.cognition.2009.03.008 ( Preprint PDF )
[NOTE: results in this paper are based on a newer version of the code used in the ACL06 and BUCLD07 word segmentation papers and chapter 5 of my thesis. The new version corrects a small bug in the implementation of the bigram (HDP) model. Please cite results from this paper in future publications.]

A note on the implementation of Hierarchical Dirichlet Processes. Phil Blunsom, Trevor Cohn, Sharon Goldwater, and Mark Johnson. Proceedings of ACL, 2009. ( PDF )

Evaluating models of syntactic category acquisition without using a gold standard. Stella Frank, Sharon Goldwater, and Frank Keller. Proceedings of CogSci, 2009. ( PDF )

Improving morphology induction by learning spelling rules. Jason Naradowsky and Sharon Goldwater. Proceedings of IJCAI, 2009. ( PDF )

Improving nonparametric Bayesian inference: Experiments on unsupervised word segmentation with adaptor grammars. Mark Johnson and Sharon Goldwater. Proceedings of NAACL, 2009. ( PDF )

Inducing compact but accurate tree-substitution grammars. Trevor Cohn, Sharon Goldwater, and Phil Blunsom. Proceedings of NAACL, 2009. ( PDF )

2008

Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase ASR error rates. Sharon Goldwater, Dan Jurafsky, and Christopher D. Manning. Proceedings of ACL, 2008. ( PDF )

2007

Modeling human performance on statistical word segmentation tasks. Michael C. Frank, Sharon Goldwater, Vikash Mansinghka, Tom Griffiths, and Joshua Tenenbaum. Proceedings of the 29th Annual Meeting of the Cognitive Science Society, 2007. ( PDF )

A fully Bayesian approach to unsupervised part-of-speech tagging. Sharon Goldwater and Thomas L. Griffiths. Proceedings of ACL, 2007. ( PDF )

Bayesian inference for PCFGs via Markov Chain Monte Carlo. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Proceedings of NAACL, 2007. ( PDF )

Adaptor Grammars: a framework for specifying compositional nonparametric Bayesian models. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Advances in Neural Information Processing Systems 19, 2007. ( PDF )

Distributional cues to word segmentation: Context is important. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of the 31st Boston University Conference on Language Development, 2007. ( PostScript , PDF ) If you plan to cite results from this paper, see this note.

2006

Nonparametric Bayesian models of lexical acquisition. Sharon Goldwater. Ph.D. thesis, Brown University, 2006. Tree-saving version (single spaced with minimal front matter, 115 pages), Official version (double spaced with all front matter, 176 pages). If you plan to cite results on word segmentation, see this note.

Contextual dependencies in unsupervised word segmentation. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of Coling/ACL, Sydney, 2006. ( PostScript , PDF ) Code is available on request. If you plan to cite results from this paper, see this note.

A non-parametric Bayesian approach to spike sorting. Frank Wood, Sharon Goldwater, and Michael J. Black. Proceedings of the 28th IEEE Conference on Engineering in Medicine and Biological Systems, pages 1165-1169, 2006. ( PDF )

Interpolating between types and tokens by estimating power-law generators. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Advances in Neural Information Processing Systems 18, 2006. ( PostScript , PDF ) [NOTE: this is a corrected version.]

2005

Improving statistical MT through morphological analysis. Sharon Goldwater and David McClosky. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Vancouver, 2005. ( PostScript , PDF )

Representational bias in unsupervised learning of syllable structure. Sharon Goldwater and Mark Johnson. Proceedings of the 9th Conference on Computational Natural Language Learning (CONLL), Ann Arbor, 2005. ( PostScript , PDF )

2004 and earlier

Priors in Bayesian learning of phonological rules. Sharon Goldwater and Mark Johnson. Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON), Barcelona, 2004. ( PostScript , PDF )

A type system for statically detecting spreadsheet errors. Yanif Ahmad, Tudor Antoniu, Sharon Goldwater, and Shriram Krishnamurthi. Proceedings of the 18th IEEE International Symposium on Automated Software Engineering, 2003. ( PostScript , PDF )

Learning OT constraint rankings using a Maximum Entropy model. Sharon Goldwater and Mark Johnson. Proceedings of the Workshop on Variation within Optimality Theory, Stockholm University, 2003. ( PostScript , PDF )

Building a robust dialog system with limited data. Sharon Goldwater, Elizabeth Owen Bratt, Jean-Mark Gawron, and John Dowding. Proceedings of the Workshop on Conversational Systems at NAACL, 2000. (PostScript)

Interpreting language in context in CommandTalk. John Dowding, Elizabeth Owen Bratt, and Sharon Goldwater. Communicative Agents Workshop, Seattle, WA, 1999. (PostScript)

Edge-based best-first chart parsing. Eugene Charniak, Sharon Goldwater, and Mark Johnson. Proceedings of the Sixth Workshop on Very Large Corpora at COLING-ACL, 1998. (PostScript)

Personal Information

My admittedly impoverished personal web page is available for those who are inclined to be nosy.
Last modified: Tue Oct 6 15:33:57 BST 2009