Re: Corpora: PoS tagging -- upper case text

John Aberdeen (aberdeen@mitre.org)
Tue, 16 Feb 1999 15:03:50 -0500

Mark Stevenson wrote:
>
> Hi,
>
> I was wondering whether anyone had any experience of using Part of Speech
> taggers on all upper case text. I am especially interested in whether
> there are any publicly available taggers which are designed/have been
> adapted for all upper case text. If possible I'd also like to know the rough
> levels of result which could be expected for this task (presumably it'd be
> lower than PoS tagging on mixed case text).
>
> Best,
> Mark Stevenson

Our part of speech tagger (a fast implementation of the Brill algorithm)
has been trained on both mixed case text and all upper case text. On
mixed case Wall Street Journal data we get tagging accuracy of around
96.5%, and on artificially upcased WSJ data we get around 94.5% accuracy.
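
To make the "artificially upcased" comparison concrete, here is a minimal
sketch of that kind of experiment. It is not the Alembic tagger or the Brill
algorithm: it assumes a toy most-frequent-tag baseline and a tiny hand-written
corpus (not WSJ data), and every function name in it is hypothetical. The last
helper only restates the two accuracy figures quoted above as error rates.

```python
from collections import Counter, defaultdict


def upcase(tagged_sents):
    """Artificially upcase every token; the gold tags are left untouched."""
    return [[(word.upper(), tag) for word, tag in sent] for sent in tagged_sents]


def train_unigram(tagged_sents):
    """Toy baseline (an assumption, not Brill): most frequent tag per word form."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(model, tagged_sents, default_tag="NN"):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in tagged_sents:
        for word, tag in sent:
            correct += model.get(word, default_tag) == tag
            total += 1
    return correct / total


def relative_error_increase(acc_before, acc_after):
    """Relative growth in error rate when accuracy (in percent) drops."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_after - err_before) / err_before


# Hand-written illustrative data with Penn Treebank style tags.
train = [
    [("Bill", "NNP"), ("paid", "VBD"), ("the", "DT"), ("bill", "NN")],
    [("the", "DT"), ("bill", "NN"), ("arrived", "VBD")],
    [("Bill", "NNP"), ("arrived", "VBD")],
    [("a", "DT"), ("bill", "NN"), ("arrived", "VBD")],
]
test = [[("Bill", "NNP"), ("paid", "VBD"), ("the", "DT"), ("bill", "NN")]]

# One model per casing condition, each evaluated on matching test data.
mixed_model = train_unigram(train)
upper_model = train_unigram(upcase(train))

mixed_acc = accuracy(mixed_model, test)
upper_acc = accuracy(upper_model, upcase(test))

print(f"mixed case: {mixed_acc:.2f}  upper case: {upper_acc:.2f}")
print(f"relative error increase for 96.5 -> 94.5: "
      f"{relative_error_increase(96.5, 94.5):.1%}")
```

In the toy data, upcasing merges "Bill" (proper noun) and "bill" (common
noun) into one form, so the baseline can no longer use capitalization to
tell them apart. That is the kind of cue a tagger loses on all upper case
text.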

Our tagger is available as part of the Alembic Workbench distribution,
and ships with rules and lexica for both mixed case and upcase English.

http://www.mitre.org/technology/alembic-workbench/

Regards,
John

-------------------------------------------------------
John Aberdeen               aberdeen@mitre.org
Senior Scientist            Natural Language Processing
The MITRE Corporation       voice +1.781.271.2840
Bedford, Massachusetts USA  fax   +1.781.271.2352
-------------------------------------------------------