|
Keller, Frank and Maria Lapata. 2003. Using the Web to Obtain Frequencies for Unseen Bigrams. Computational Linguistics 29(3), 459-484.
This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between web frequencies and corpus frequencies; (b) a reliable correlation between web frequencies and plausibility judgments; (c) a reliable correlation between web frequencies and frequencies recreated using class-based smoothing; (d) good performance in a pseudo-disambiguation task using web frequencies.
@Article{Keller:Lapata:03,
author = {Frank Keller and Maria Lapata},
title = {Using the Web to Obtain Frequencies for Unseen Bigrams},
journal = {Computational Linguistics},
volume = 29,
number = 3,
year = 2003,
pages = {459--484}
}
|