Keller, Frank and Maria Lapata. 2003. Using the Web to Obtain Frequencies for Unseen Bigrams. In Computational Linguistics 29:3, 459-484.

This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between web frequencies and corpus frequencies; (b) a reliable correlation between web frequencies and plausibility judgments; (c) a reliable correlation between web frequencies and frequencies recreated using class-based smoothing; (d) good performance in a pseudo-disambiguation task using web frequencies.


@Article{Keller:Lapata:03,
  author = 	 {Frank Keller and Maria Lapata},
  title = 	 {Using the Web to Obtain Frequencies for Unseen Bigrams},
  journal =      {Computational Linguistics},
  volume =       29,
  number =       3,
  year = 	 2003,
  pages =        {459--484}
}