Re: Corpora: Statistical significance of tagging differences

James L. Fidelholtz (jfidel@siu.buap.mx)
Thu, 18 Mar 1999 10:33:56 -0600 (CST)

On Wed, 17 Mar 1999, Chris Brew wrote:

[snip]

>cf van Halteren, Zavrel and Daelemans, proceedings Coling-98, vol1 pp 491ff,
>footnote 7, using McNemar's chi-square. Since in POS tagging we are
>typically dealing with large corpora, even numerically small
>differences in error rate are likely to be statistically
>significant. Statistical significance is of course not the only
>relevant criterion.

Since several people have referred to this article's use of a chi-square
test, it seems pertinent to note that chi-square is designed for checking
arrays with SMALL numbers (say, under about 100 per cell, if memory
serves). Furthermore, as Chris indicates rather too gently, with much
larger numbers (say, over 1000 per cell) you are virtually GUARANTEED
statistical significance even for numerically small differences, which
makes the test virtually useless for deciding whether a difference
matters. Using statistics always requires checking carefully that the
test is appropriate to the data.
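
To make the sample-size point concrete, here is a minimal Python sketch.
It is not taken from the cited paper. It assumes the uncorrected form of
McNemar's statistic, (b - c)^2 / (b + c), where b and c are the numbers of
tokens that only one of the two taggers gets right, and it uses invented
counts. The function names are hypothetical.

```python
import math


def mcnemar_chi2(b, c):
    """Uncorrected McNemar statistic from the two discordant counts.

    b: tokens tagger A gets right and tagger B gets wrong (assumed meaning)
    c: tokens tagger B gets right and tagger A gets wrong (assumed meaning)
    """
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Invented counts: the 12:10 ratio of discordant tokens is held fixed
# while the corpus (and so the raw counts) grows by factors of ten.
for scale in (1, 10, 100, 1000):
    b, c = 12 * scale, 10 * scale
    chi2 = mcnemar_chi2(b, c)
    print(f"b={b:6d} c={c:6d} chi2={chi2:8.2f} p={p_value_1df(chi2):.3g}")
```

The relative difference between the taggers is identical in every row;
only the corpus size changes. With these made-up counts the smallest case
is nowhere near significance at the usual 0.05 level, while the largest
is far beyond it.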

Jim

James L. Fidelholtz e-mail: jfidel@siu.buap.mx
Maestría en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO