|Date||Sep 04, 2015|
|Title||Readability analysis as an exploration of linguistic complexity|
The analysis of readability has traditionally relied on surface properties of language, such as average sentence and word lengths and specific word lists. At the same time, there is a long tradition of analyzing the Complexity, Accuracy, and Fluency (CAF) of language produced by language learners in second language acquisition (SLA) research. Reusing SLA measures of learner language complexity to analyze readability, Sowmya Vajjala and I explored which aspects of linguistic modeling can successfully be employed to predict the readability of a native language text. Using various machine learning setups and corpora, we show that a broad range of linguistic properties are highly indicative of the readability of documents, from graded readers to web pages and TV programs targeting different age groups. The readability model using the full linguistic feature set is currently the best non-commercial readability model available for English, as measured on the standard Common Core State Standards data.
The fact that readability is reflected in a wide range of linguistic aspects is also relevant for research on text simplification, where the model can in principle be used to identify which sentences are worth simplifying in which way and to evaluate one dimension of the success of automatic simplification. As a prerequisite for such applications, we show that our text readability models can successfully be applied to individual sentences.
The talk will try to trace the ideas sketched above based on the joint work with Sowmya Vajjala listed below, which is downloadable from: http://purl.org/dm/papers
Sowmya Vajjala (2015) "Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications". PhD thesis, Eberhard-Karls Universität Tübingen. http://hdl.handle.net/10900/64359
Sowmya Vajjala and Detmar Meurers (2015) "Readability Assessment for Text Simplification: From Analyzing Documents to Identifying Sentential Simplifications". International Journal of Applied Linguistics, Special Issue on Current Research in Readability and Text Simplification, edited by Thomas François & Delphine Bernhard.
Sowmya Vajjala and Detmar Meurers (2014) "Assessing the relative reading level of sentence pairs for text simplification". Proceedings of EACL. Gothenburg, Sweden.
Sowmya Vajjala and Detmar Meurers (2014) "Exploring Measures of 'Readability' for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs". Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), EACL. Gothenburg, Sweden.
Sowmya Vajjala and Detmar Meurers (2013) "On the Applicability of Readability Models to Web Texts". Proceedings of the Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), ACL. Sofia, Bulgaria.
Julia Hancke, Sowmya Vajjala and Detmar Meurers (2012) "Readability Classification for German using lexical, syntactic, and morphological features". Proceedings of COLING. Mumbai, India.
Sowmya Vajjala and Detmar Meurers (2012) "On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition". Proceedings of BEA7, ACL. Montreal, Canada.