Date: Oct 05, 2012
Title: Probabilistic prediction in human language processing
Abstract: Human language users are skilled at using rich context to predict upcoming linguistic material in real time. One observable effect of this is that more predictable words are read faster than less predictable words. However, the quantitative shape of the relationship between predictability and reading time has never before been measured. We found that reading time varies in direct proportion with the logarithm of word probability conditioned on context, and this holds consistently over six orders of magnitude. This result contradicts common intuitions about how people make predictions in language processing, but it is predicted both by an optimal Bayesian perceptual discrimination model (Norris, 2006) and by a model in which human comprehension is highly incremental at the sub-word level. Furthermore, it provides a potential unified explanation for a variety of behavioural effects driven by lexical, syntactic, semantic, and pragmatic structure, under the rubric of "surprisal theory". Next, we'll consider how humans estimate these probabilities in the first place. We elicited participants' subjective probabilities, compared them to objective probabilities from corpora, and found large and systematic differences, some of which seem to mirror the smoothing biases used in similar situations in computational linguistics.
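The logarithmic relationship described in the abstract can be sketched in a few lines. This is an illustrative reading of surprisal theory, not the authors' actual model: the function names and the intercept/slope constants below are hypothetical values chosen for demonstration only.

```python
import math

def surprisal(prob):
    """Surprisal of a word: negative log (base 2) of its probability
    given the preceding context, measured in bits."""
    return -math.log2(prob)

def predicted_reading_time(prob, base_ms=200.0, ms_per_bit=10.0):
    """Illustrative linear relation between surprisal and reading time.
    base_ms and ms_per_bit are made-up constants, not fitted values."""
    return base_ms + ms_per_bit * surprisal(prob)

# Under a log-linear relation, each tenfold drop in probability adds the
# same amount of predicted reading time, whether the word was fairly
# likely (0.1 -> 0.01) or already very unlikely (0.001 -> 0.0001).
delta_likely = predicted_reading_time(0.01) - predicted_reading_time(0.1)
delta_unlikely = predicted_reading_time(0.0001) - predicted_reading_time(0.001)
```

This constant-increment property is what distinguishes the logarithmic shape from, say, a linear effect of raw probability, where the same tenfold drop would matter far more for likely words than for rare ones.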
Nathaniel J. Smith is a postdoc in Informatics at the University of Edinburgh, working with Mark Steedman. Nathaniel obtained his PhD from the University of California San Diego, working with Roger Levy and Marta Kutas. His primary research interest is the interaction between linguistic and non-linguistic cognition, at the levels of both representation and processing.