Re: Corpora: PoS tagging -- upper case text

Thorsten Brants (thorsten@CoLi.Uni-SB.DE)
Tue, 16 Feb 1999 12:02:21 +0100 (MET)

Mark Stevenson wrote:
> I was wondering whether anyone had any experience of using Part of Speech
> taggers on all upper case text. I am especially interested in whether
> there are any publicly available taggers which are designed/have been
> adapted for all upper case text. If possible I'd also like to know the rough
> levels of result which could be expected for this task (presumably it'd be
> lower than PoS tagging on mixed case text).
>

Some time ago I did experiments on all-lowercase text with my tagger,
i.e. training on all-lowercase text and then testing on all-lowercase
text. The results should carry over to all-uppercase text, since in
both cases the information that is lost is capitalization. Compared to
training and testing on mixed-case text, the results were around 0.6%
worse for German and around 1.2% worse for English.
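The setup above (fold the case of both training and test data, then
compare against the mixed-case baseline) can be sketched in a few lines
of Python. This is a minimal sketch under assumptions: the helper names
`fold_case` and `accuracy` are hypothetical, the corpus is assumed to be
a list of sentences of (word, tag) pairs, and the tagger itself is left
out because no particular tagger interface is given here.

```python
from typing import List, Tuple

# Assumed corpus format: a list of sentences, each a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def fold_case(corpus: List[TaggedSentence], upper: bool = False) -> List[TaggedSentence]:
    """Return a copy of the corpus with every word form lowercased
    (or uppercased if upper=True); the tags are left untouched."""
    fold = str.upper if upper else str.lower
    return [[(fold(word), tag) for word, tag in sent] for sent in corpus]


def accuracy(gold: List[TaggedSentence], predicted: List[TaggedSentence]) -> float:
    """Per-token tagging accuracy, comparing tags position by position."""
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    sample = [[("The", "DT"), ("Dog", "NN"), ("barks", "VBZ")]]
    print(fold_case(sample))              # lowercased training/test condition
    print(fold_case(sample, upper=True))  # all-uppercase condition
```

To reproduce the comparison, one would train the (hypothetical) tagger
once on the original training data and once on `fold_case(train)`, tag
the correspondingly folded test words, and take the difference between
the two `accuracy` scores.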

-Thorsten