Note (May 30, 2008):

The word segmentation results for the bigram (HDP) model in the following publications were obtained using an implementation that was later discovered to contain a small bug:

Distributional Cues to Word Segmentation: Context is Important. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of BUCLD, 2007.

Nonparametric Bayesian Models of Lexical Acquisition. Sharon Goldwater. Ph.D. thesis, Brown University, 2006.

Contextual Dependencies in Unsupervised Word Segmentation. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of Coling/ACL, 2006.

Updated segmentation results (which are qualitatively similar to the original ones) can be found in "A Bayesian framework for word segmentation: Exploring the effects of context" (Goldwater et al., 2009, Cognition). Please cite results from that paper in future publications.