Samuel Brody and Mirella Lapata. 2009. Bayesian Word Sense Induction. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 103–111. Athens, Greece.

Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word's contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses, which are in turn characterized as distributions over words. The Bayesian framework provides a principled way to incorporate a wide range of features beyond lexical co-occurrences and to systematically assess their utility on the sense induction task. The proposed approach yields improvements over state-of-the-art systems on a benchmark dataset.
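
The generative story described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not specify priors or inference, so the symmetric Dirichlet priors (`alpha`, `beta`), the function name `generate_contexts`, and all sizes below are hypothetical choices made for illustration. Each context of the ambiguous word gets its own distribution over senses, and each sense is a distribution over vocabulary words.

```python
import numpy as np

def generate_contexts(n_contexts, context_len, n_senses, vocab_size,
                      alpha=1.0, beta=0.5, seed=0):
    """Sample synthetic contexts from an LDA-style sense model.

    alpha and beta are assumed symmetric Dirichlet hyperparameters;
    they are illustrative and not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Each sense is characterized as a distribution over words.
    sense_word = rng.dirichlet(np.full(vocab_size, beta), size=n_senses)
    contexts = []
    for _ in range(n_contexts):
        # Each context has a multinomial distribution over senses.
        theta = rng.dirichlet(np.full(n_senses, alpha))
        # Draw a sense for every word position, then a word from that sense.
        senses = rng.choice(n_senses, size=context_len, p=theta)
        words = np.array([rng.choice(vocab_size, p=sense_word[s])
                          for s in senses])
        contexts.append((theta, senses, words))
    return sense_word, contexts

sense_word, contexts = generate_contexts(
    n_contexts=5, context_len=8, n_senses=3, vocab_size=20)
```

Sense induction runs this process in reverse: given only the observed words in each context, it infers the hidden sense distributions, and a context can then be labeled with its most probable sense.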

@InProceedings{brody-lapata:2009:EACL,
  author    = {Brody, Samuel  and  Lapata, Mirella},
  title     = {Bayesian Word Sense Induction},
  booktitle = {Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)},
  year      = {2009},
  address   = {Athens, Greece},
  pages     = {103--111}
}