Re: Corpora: Summary of POS tagger evaluation

Oliver Mason (oliver@clg.bham.ac.uk)
Tue, 9 Feb 1999 13:56:26 +0000

Next message: amalia@liia.u-strasbg.fr: "Corpora: ESSLLI'99 Student Session 2nd Call for Papers"
Previous message: pkuehnle: "Corpora: Last call for papers"
Maybe in reply to: Yen Ketty: "Corpora: Summary of POS tagger evaluation"
Next in thread: Philip Resnik: "Re: Corpora: Summary of POS tagger evaluation"

Andrew Harley writes:

> (1) Punctuation tags are not considered in the scoring! It is amazing how
> many taggers treat a punctuation mark as a token in the scoring.

This raises an issue which is slightly more complex: if you exclude
punctuation (presumably on the grounds that a comma is always tagged
as `comma' and there is no ambiguity), why include other unambiguous
tokens in the scoring? If `the' always gets assigned `DET', and no
other tags for it are possible, then why count it and not the comma?

The scoring problem becomes more obvious when the tagging process is
divided into its two main components: tag assignment and tag disambiguation.
In the first stage, all possible tags are assigned to each token; this
list may be more or less precise (especially for tokens which are not
in the lexicon, where a guesser has to be used). Then, during the second
stage, the most likely tag (assuming a probabilistic approach) is
selected from the list of candidates.
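
A minimal sketch of the two stages, purely for illustration. The lexicon,
its probabilities and the guesser rule are all invented here, and the
second stage just picks the most probable tag per token without looking
at context, which is much cruder than what a real probabilistic tagger does.

```python
# Hypothetical toy lexicon: token -> {tag: probability}. All values are invented.
LEXICON = {
    "the": {"DET": 1.0},
    "can": {"MD": 0.7, "NN": 0.2, "VB": 0.1},
    "fish": {"NN": 0.6, "VB": 0.4},
    ",": {"COMMA": 1.0},
}


def guess_tags(token):
    """Crude stand-in for a guesser, used for tokens missing from the lexicon."""
    if token.endswith("ly"):
        return {"ADV": 0.8, "ADJ": 0.2}
    return {"NN": 0.5, "VB": 0.3, "ADJ": 0.2}


def assign_tags(tokens):
    """Stage 1 (tag assignment): attach every candidate tag to each token."""
    return [(tok, LEXICON.get(tok.lower()) or guess_tags(tok.lower()))
            for tok in tokens]


def disambiguate(candidates):
    """Stage 2 (disambiguation): keep the most likely tag; context is ignored."""
    return [(tok, max(tags, key=tags.get)) for tok, tags in candidates]


tagged = disambiguate(assign_tags(["The", "fish", "can", "swim", "quickly", ","]))
```

Note that `the' and the comma pass through the second stage with nothing
to decide, while `swim' and `quickly' depend entirely on the guesser.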

The `real' work is done during the disambiguation stage, and to measure it
one should disregard all tokens which have only one tag assigned. This is
of course tagset dependent: two extreme cases of 100% correctness are
1) a tagset with a different tag for each type (`the' is tagged `THE' etc.)
2) a tagset with one tag for all tokens (each token is tagged `TOKEN')

Obviously these two tagsets don't make any sense, but hey, you've got 100%
performance on any text, with or without punctuation.
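
To make the difference concrete, here is a small sketch that scores the
same output twice: once over all tokens and once over ambiguous tokens
only. The function name, the `ambiguous_only' switch and the example data
are assumptions made up for this illustration.

```python
def accuracy(gold, predicted, candidates, ambiguous_only=False):
    """Fraction of correct tags; optionally skip tokens with one candidate tag."""
    scored = [
        (g, p)
        for g, p, cands in zip(gold, predicted, candidates)
        if not ambiguous_only or len(cands) > 1
    ]
    if not scored:
        return None  # nothing to disambiguate, so no meaningful score
    return sum(g == p for g, p in scored) / len(scored)


# Invented example: 5 tokens, 2 of them ambiguous, one ambiguous token wrong.
candidates = [{"DET"}, {"NN", "VB"}, {"COMMA"}, {"MD", "NN", "VB"}, {"PUNCT"}]
gold = ["DET", "NN", "COMMA", "MD", "PUNCT"]
predicted = ["DET", "NN", "COMMA", "NN", "PUNCT"]

overall = accuracy(gold, predicted, candidates)
hard_only = accuracy(gold, predicted, candidates, ambiguous_only=True)

# The one-tag-for-everything tagset: perfect overall, nothing left to score.
trivial_candidates = [{"TOKEN"}] * 3
trivial_overall = accuracy(["TOKEN"] * 3, ["TOKEN"] * 3, trivial_candidates)
trivial_hard = accuracy(["TOKEN"] * 3, ["TOKEN"] * 3, trivial_candidates,
                        ambiguous_only=True)
```

Here the overall score is 4 out of 5, but only 1 out of 2 on the tokens
that actually needed disambiguating; the trivial tagset scores 100%
overall and has no ambiguous tokens to score at all.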

A tagger's performance can only be measured sensibly if some indicator
of the complexity of the tagging task is given. [shameless plug follows:]
Dan Tufis and I proposed an evaluation metric based on the average
number of tags per token in a paper at the LREC conference last year.
Under it, each percentage reporting the tagging accuracy would be
augmented by a factor indicating the difficulty of the task. 90% on a
highly ambiguous text might then reflect better performance than 96% on
a simple text with few ambiguous tokens.
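
As a rough illustration of the idea, the sketch below computes the average
number of candidate tags per token and reports it next to an accuracy
figure. It does not reproduce the exact factor defined in the paper; the
function names and both example texts are invented.

```python
def average_ambiguity(candidates):
    """Mean number of candidate tags per token (1.0 = nothing to disambiguate)."""
    return sum(len(c) for c in candidates) / len(candidates)


def report(name, acc, candidates):
    """Pair an accuracy figure with the ambiguity of the text it was measured on."""
    return f"{name}: {acc:.0%} accuracy at {average_ambiguity(candidates):.2f} tags/token"


# Invented candidate sets for two short texts.
easy_text = [{"DET"}, {"NN"}, {"COMMA"}, {"NN", "VB"}]
hard_text = [{"NN", "VB"}, {"MD", "NN", "VB"}, {"ADJ", "NN"}, {"DET"}]

easy_line = report("easy", 0.96, easy_text)
hard_line = report("hard", 0.90, hard_text)
```

Reading the two lines side by side, the 90% was earned at 2.00 tags per
token and the 96% at 1.25, so the raw percentages alone are not comparable.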

Oliver

Tufis, Dan; Mason, Oliver (1998)
"Tagging Romanian Texts: a Case Study for QTAG, a Language Independent
Probabilistic Tagger"
Proceedings of the First International Conference on Language Resources
& Evaluation (LREC), Granada (Spain), 28-30 May 1998, p.589-596

-- 
//\\ computer officer | corpus research | department of english | school of  -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt  -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\  -
\\// mobile 07050 104504 | http://www-clg.bham.ac.uk | o.mason@bham.ac.uk\/  -
