Feature Sets

For the experiment described in the article, we compared several feature sets. Most of them are well described in their original publications, but Kessler et al. (1997) do not provide the full list of features they used. After contacting the authors, we used the features listed below for our experiments. Note that the features used in the original publication could not be reconstructed completely; our features therefore almost certainly differ from the ones used by Kessler et al.


Count of sentences starting with "And"
Count of sentences starting with "But"
Count of sentences starting with "So"
Count of contractions
Count of "today", "yesterday", "tomorrow"
Count of ("last" / "this" / "next") "week"
Count of "*, where"
Count of "of course"
Count of "it"
Count of "Wh*?"
Count of "?"
Count of "not"
":" per word
":" per sentence
";" per sentence
"(" and ")" per sentence
"," per word
"," per sentence
Quotation marks per sentence
Average sentence length
Standard deviation of sentence length
Average word length
Standard deviation of word length
Type-token ratio
Count of numerals
Count of dates
Count of "shall"
Count of "will"
Count of "a bit"
Count of "hardly"
Count of numbers in parentheses
Count of "-"
Count of terms of address, e.g. "Mr."
Count of ", but,"
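
To make the surface features above concrete, the following is a minimal sketch of how a handful of them could be computed from raw text. It is not the code used in the experiments. The function names, the naive sentence splitter and tokenizer, the case-insensitive matching of "it" and "not", and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
import statistics


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace. Abbreviations such as "Mr." will break it.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Naive tokenizer (assumption): runs of letters, digits and apostrophes.
    return re.findall(r"[A-Za-z0-9']+", sentence)


def surface_features(text):
    sentences = split_sentences(text)
    tokens_per_sentence = [tokenize(s) for s in sentences]
    words = [w for toks in tokens_per_sentence for w in toks]
    lowered = [w.lower() for w in words]
    sent_lengths = [len(toks) for toks in tokens_per_sentence]

    # Avoid division by zero on empty input.
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)

    def starts_with(word):
        # Case-sensitive match on the first token of each sentence.
        return sum(1 for toks in tokens_per_sentence if toks and toks[0] == word)

    return {
        "starts_and": starts_with("And"),
        "starts_but": starts_with("But"),
        "starts_so": starts_with("So"),
        "count_it": lowered.count("it"),
        "count_not": lowered.count("not"),
        "count_question_mark": text.count("?"),
        "colon_per_word": text.count(":") / n_words,
        "comma_per_sentence": text.count(",") / n_sent,
        "avg_sentence_length": statistics.mean(sent_lengths) if sent_lengths else 0.0,
        # Population standard deviation (assumption; the sample version is equally plausible).
        "std_sentence_length": statistics.pstdev(sent_lengths) if sent_lengths else 0.0,
        "type_token_ratio": len(set(lowered)) / n_words,
    }
```

Calling surface_features on a document string returns a dictionary with one entry per feature. The remaining counts in the list (for example contractions, dates, or terms of address) would need their own patterns, which are not specified here.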

All KNS (Kessler et al.) Features Plus:
Count of present participles
Count of adverbs
Count of nouns
Count of proper nouns
Count of adjectives
Count of existential there
Count of sentences starting with VBG or VBN
Count of attributive adjectives
Count of past participles
Count of personal pronouns
Count of fragments
Count of sentences ending with IN
Count of prepositions + "wh*"
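
The part-of-speech features above can be sketched in the same way. The example below assumes the input has already been tagged with Penn Treebank tags (the list itself uses VBG, VBN and IN) and is given as a list of sentences, each a list of (token, tag) pairs. The function name, the feature keys, and the exact tag groupings are assumptions for illustration. Attributive adjectives and fragments are omitted because they need syntactic information beyond flat tags.

```python
from collections import Counter

# Tags treated as punctuation when looking at sentence boundaries (assumption).
PUNCT_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-"}


def pos_features(tagged_sentences):
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]

        for tag in tags:
            if tag == "VBG":
                counts["present_participles"] += 1
            elif tag == "VBN":
                counts["past_participles"] += 1
            elif tag.startswith("RB"):        # RB, RBR, RBS
                counts["adverbs"] += 1
            elif tag in ("NN", "NNS"):        # common nouns only (assumption)
                counts["nouns"] += 1
            elif tag in ("NNP", "NNPS"):
                counts["proper_nouns"] += 1
            elif tag.startswith("JJ"):        # JJ, JJR, JJS
                counts["adjectives"] += 1
            elif tag == "EX":
                counts["existential_there"] += 1
            elif tag == "PRP":                # excludes possessive PRP$ (assumption)
                counts["personal_pronouns"] += 1
            elif tag == "IN":
                counts["prepositions"] += 1

        # Sentence-boundary features ignore punctuation tokens.
        content = [t for t in tags if t not in PUNCT_TAGS]
        if content:
            if content[0] in ("VBG", "VBN"):
                counts["sentences_starting_vbg_vbn"] += 1
            if content[-1] == "IN":
                counts["sentences_ending_in"] += 1

        # Preposition immediately followed by a wh-word (WDT, WP, WP$, WRB).
        for first, second in zip(tags, tags[1:]):
            if first == "IN" and second.startswith("W"):
                counts["preposition_plus_wh"] += 1
    return counts
```

Because the result is a Counter, features that never occur simply read as 0.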