Keller, Frank and Maria Lapata. 2003. Using the Web to Obtain Frequencies for Unseen Bigrams. In Computational Linguistics 29:3, 459-484. This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between web frequencies and corpus frequencies; (b) a reliable correlation between web frequencies and plausibility judgments; (c) a reliable correlation between web frequencies and frequencies recreated using class-based smoothing; (d) a good performance in a pseudo-disambiguation task using web frequencies.
@Article{Keller:Lapata:03, author = {Frank Keller and Maria Lapata}, title = {Using the Web to Obtain Frequencies for Unseen Bigrams}, journal = {Computational Linguistics}, volume = 29, issue = 3, year = 2003, pages = {459--484} }