Speaker: Detmar Meurers
Date: Sep 04, 2015
Time: 02:00 PM - 03:30 PM
Location: IF 4.31/4.33
Title: Readability analysis as an exploration of linguistic complexity

The analysis of readability has traditionally relied on surface properties of language, such as average sentence and word lengths and specific word lists. At the same time, there is a long tradition of analyzing the Complexity, Accuracy, and Fluency (CAF) of language produced by language learners in second language acquisition (SLA) research. Reusing SLA measures of learner language complexity to analyze readability, Sowmya Vajjala and I explored which aspects of linguistic modeling can successfully be employed to predict the readability of a native language text. Using various machine learning setups and corpora, we show that a broad range of linguistic properties are highly indicative of the readability of documents, from graded readers to web pages and TV programs targeting different age groups. The readability model using the full linguistic feature set is currently the best non-commercial readability model available for English, as measured on the standard Common Core State Standard data.
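To make the traditional surface properties mentioned above concrete, here is a minimal Python sketch that computes average sentence length and average word length for a text. It is an illustration only and not the feature extraction used in the work with Sowmya Vajjala; the function name `surface_features` and the naive regular-expression tokenization are assumptions made for this example.

```python
import re


def surface_features(text):
    """Return (average sentence length in words, average word length in characters)."""
    # Assumption: sentences end in '.', '!' or '?' followed by whitespace.
    # This is a naive split, not a real sentence tokenizer.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of ASCII letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0, 0.0
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len, avg_word_len
```

Such features capture only the surface of a text, which is why the abstract contrasts them with the broader set of linguistic properties drawn from SLA research.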

The fact that readability is reflected in a wide range of linguistic aspects is also relevant for research on text simplification, where the model can in principle be used to identify which sentences are worth simplifying in which way and to evaluate one dimension of the success of automatic simplification. As a prerequisite for such applications, we show that our text readability models can successfully be applied to individual sentences.

The talk will try to trace the ideas sketched above based on the joint work with Sowmya Vajjala listed below, which is downloadable from: http://purl.org/dm/papers

