Re: Corpora: Statistical significance of tagging differences

J. Zavrel (Jakub.Zavrel@kub.nl)
Wed, 17 Mar 1999 14:57:49 +0100 (MET)

Mark Stevenson wrote:

> I was wondering if anyone knows of the appropriate statistical tests which
> could be applied to determine whether the differences in tagging performance
> are statistically significant?

In circumstances like the ones you describe, I usually use McNemar's
chi-squared test to check significance. To compare two taggers on the
same test data, you cross-tabulate, token by token, whether tagger A
was correct against whether tagger B was correct:

                          tagger A
                      Wrong    Correct
            Wrong     n_00  |  n_01
tagger B             ------------------
            Correct   n_10  |  n_11

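Then you just feed that table to some statistics package that knows how
to do McNemar's test (S-Plus does) ..... ;-)  The test looks only at the
off-diagonal cells n_01 and n_10, i.e. the tokens on which exactly one of
the two taggers is right.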
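
If no such package is at hand, the test is small enough to compute
directly. What follows is only a minimal sketch in Python, assuming the
usual continuity-corrected form of the statistic,
(|n_01 - n_10| - 1)^2 / (n_01 + n_10), compared against chi-squared with
one degree of freedom. The function names count_discordant and
mcnemar_from_counts are made up for this illustration and do not come
from any package.

```python
import math


def count_discordant(gold, tags_a, tags_b):
    """Return (n_01, n_10) for two taggers scored against the same gold tags.

    n_01: tokens where tagger A is correct and tagger B is wrong.
    n_10: tokens where tagger B is correct and tagger A is wrong.
    """
    n_01 = n_10 = 0
    for g, a, b in zip(gold, tags_a, tags_b):
        a_ok, b_ok = (a == g), (b == g)
        if a_ok and not b_ok:
            n_01 += 1
        elif b_ok and not a_ok:
            n_10 += 1
    return n_01, n_10


def mcnemar_from_counts(n_01, n_10):
    """Continuity-corrected McNemar statistic and its p-value (1 d.f.)."""
    discordant = n_01 + n_10
    if discordant == 0:
        # The taggers never disagree in correctness: no evidence either way.
        return 0.0, 1.0
    chi2 = (abs(n_01 - n_10) - 1) ** 2 / discordant
    # Upper tail of chi-squared with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only, not real tagging results.
    chi2, p = mcnemar_from_counts(10, 2)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

The cells n_00 and n_11 never enter the computation, so only the tokens
on which the taggers differ in correctness need to be counted. With very
few such tokens the chi-squared approximation is rough, and an exact
binomial test on n_01 versus n_10 is probably the safer choice.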

If you want to know more, read:

Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing
Supervised Classification Learning Algorithms. Neural Computation,
10(7), 1895-1924. Postscript preprint (revised December 30, 1997)
available from: http://www.cs.orst.edu/~tgd/cv/pubs.html

I'd be interested to know whether people think this is a good test for
comparing taggers. I haven't found any other reasonable one.

Regards,

--Jakub

------------------------------------------------------------------------------
Jakub Zavrel, B 330, Tilburg University, POBox 90153, 5000 LE Tilburg, NL
http://ilk.kub.nl/~zavrel/ tel/fax: +31-13-4663163/3110
------------------------------------------------------------------------------