Re: Corpora: Summary of POS tagger evaluation

Thorsten Brants (thorsten@CoLi.Uni-SB.DE)
Tue, 9 Feb 1999 12:34:56 +0100 (MET)

> Seeing my summary from over a year ago re-posted, I thought I had better
> update it with some of our more recent findings. We tested more taggers,
> and found that the best performers were the CLAWS tagger from Lancaster
> University and the ENGCG tagger from Lingsoft, although none of the tested
> taggers scored in the supposed standard 95% + range (at least not to our
> scoring criteria).

It would be very interesting to see your scoring criteria. Could you please
send a description or a pointer to a description?

What I use is:
1) indication of the sources for training and test sets, so that
results are repeatable by others
2) indication of tagsets
3) strict separation of training and test sets, test data are
neither seen during training nor used for manual encoding of rules
4) repetition of the tests with different training and test sets

Ensuring (3) and (4) by dividing a corpus into a 90% training set and a
10% test set and performing 10 test runs with different partitions (this
procedure may be put into question), I achieve

- for the NEGRA corpus (Frankfurter Rundschau 300,000 tokens,
Stuttgart-Tuebingen-Tagset, 54 tags):
96.3% (standard deviation = 0.27)
- for the Penn Treebank (Wall Street Journal, 1,200,000 tokens,
Penn Tagset, 46 tags):
96.7% (standard deviation = 0.13)

using TnT. As you can see, both results are well above the 95%+ range.
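
For the curious, the partitioning procedure above can be sketched roughly as
follows. This is only an illustration, not the actual TnT evaluation code:
a most-frequent-tag lookup stands in for a real tagger, and sentences (lists
of word/tag pairs) stand in for a real annotated corpus.

```python
import random
import statistics
from collections import Counter, defaultdict

def most_frequent_tag_tagger(train):
    """Toy stand-in for a real tagger: tag each known word with its most
    frequent training tag, unknown words with the overall most frequent tag."""
    counts = defaultdict(Counter)
    for sent in train:
        for word, tag in sent:
            counts[word][tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default = Counter(t for s in train for _, t in s).most_common(1)[0][0]
    return lambda word: lexicon.get(word, default)

def accuracy(tagger, test):
    """Fraction of tokens in the test sentences tagged correctly."""
    pairs = [(w, t) for sent in test for w, t in sent]
    return sum(1 for w, t in pairs if tagger(w) == t) / len(pairs)

def cross_validate(sentences, folds=10, seed=0):
    """10 runs, each holding out a different 10% of the sentences as the
    test set and training on the remaining 90%; returns mean accuracy
    and its standard deviation across the runs."""
    sents = list(sentences)
    random.Random(seed).shuffle(sents)
    scores = []
    for i in range(folds):
        test = sents[i::folds]                              # this fold's 10%
        train = [s for j, s in enumerate(sents) if j % folds != i]
        scores.append(accuracy(most_frequent_tag_tagger(train), test))
    return statistics.mean(scores), statistics.stdev(scores)
```

Note that shuffling sentences before partitioning is one arbitrary choice
among several (one could also partition by contiguous text blocks), which is
part of why the procedure "may be put into question".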