Results

In our PoS experiment, we assessed the impact of adding individual PoS tag frequencies to a baseline feature set of 13 surface features. Due to space restrictions, the article presented only those features that changed accuracy by more than 1% on at least one of the two test sets. The full results are listed in the table below. Each value is the difference (positive or negative) between the percentage of correctly classified documents achieved after adding the respective PoS tag frequency as a feature and the baseline result.

PoS tag   Test Set 1   Test Set 2
CC        +1.3%        +1.2%
CD        +1.2%        -0.6%
DT        +0.1%         0.0%
EX        +0.1%        -0.3%
FW        +0.1%         0.0%
IN        +0.1%        +0.2%
JJ        +4.5%        +1.7%
JJR       +0.2%        -0.6%
JJS        0.0%        -0.1%
LS        +0.1%        +0.1%
MD        +1.9%        +0.5%
NN        +2.8%        -4.7%
NNS       +1.6%        -5.2%
NNP       +2.4%        +3.0%
NNPS      +1.7%        -7.0%
PDT       +0.1%        +0.1%
POS        0.0%        -0.2%
PRP       +0.1%        +0.3%
PRP$      +0.5%        +0.1%
RB        +3.8%        +3.5%
RBR       +0.5%        +0.5%
RBS       +0.3%        +0.3%
RP        +0.3%        +0.1%
SYM       +0.1%         0.0%
TO        +1.5%        +0.6%
UH         0.0%         0.0%
VB        +2.7%        +1.8%
VBD       +4.8%        +7.3%
VBG       +0.2%        +0.1%
VBN       +0.1%        +0.2%
VBP       +0.8%        -0.9%
VBZ       +0.7%        +2.2%
WDT       +0.1%        -0.6%
WP        +0.7%        -0.8%
WP$       +0.3%         0.0%
WRB       +0.3%        +0.1%
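
The selection criterion described above (a deviation of more than 1% on either test set) can be reproduced from the table with a few lines of Python. This is only a minimal sketch: the names `DELTAS`, `THRESHOLD`, and `notable_tags` are hypothetical, and reading the criterion as a strict comparison on the absolute value is an assumption, not a confirmed detail of the original experiment.

```python
# Accuracy differences from the baseline, copied from the table above.
# Mapping: PoS tag -> (Test Set 1, Test Set 2)
DELTAS = {
    "CC": (1.3, 1.2), "CD": (1.2, -0.6), "DT": (0.1, 0.0),
    "EX": (0.1, -0.3), "FW": (0.1, 0.0), "IN": (0.1, 0.2),
    "JJ": (4.5, 1.7), "JJR": (0.2, -0.6), "JJS": (0.0, -0.1),
    "LS": (0.1, 0.1), "MD": (1.9, 0.5), "NN": (2.8, -4.7),
    "NNS": (1.6, -5.2), "NNP": (2.4, 3.0), "NNPS": (1.7, -7.0),
    "PDT": (0.1, 0.1), "POS": (0.0, -0.2), "PRP": (0.1, 0.3),
    "PRP$": (0.5, 0.1), "RB": (3.8, 3.5), "RBR": (0.5, 0.5),
    "RBS": (0.3, 0.3), "RP": (0.3, 0.1), "SYM": (0.1, 0.0),
    "TO": (1.5, 0.6), "UH": (0.0, 0.0), "VB": (2.7, 1.8),
    "VBD": (4.8, 7.3), "VBG": (0.2, 0.1), "VBN": (0.1, 0.2),
    "VBP": (0.8, -0.9), "VBZ": (0.7, 2.2), "WDT": (0.1, -0.6),
    "WP": (0.7, -0.8), "WP$": (0.3, 0.0), "WRB": (0.3, 0.1),
}

# Assumption: "more than 1%" is read as a strict bound on the absolute value.
THRESHOLD = 1.0


def notable_tags(deltas, threshold=THRESHOLD):
    """Return the tags whose deviation exceeds the threshold on either test set."""
    return [
        tag
        for tag, (set1, set2) in deltas.items()
        if abs(set1) > threshold or abs(set2) > threshold
    ]


if __name__ == "__main__":
    # Print the qualifying tags in table order, with signed values.
    for tag in notable_tags(DELTAS):
        set1, set2 = DELTAS[tag]
        print(f"{tag:<5} {set1:+.1f}% {set2:+.1f}%")
```

Under this reading, 13 of the 36 tags qualify. Several of them (NN, NNS, NNPS) qualify mainly because of a large drop on Test Set 2 rather than a gain.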