Re: [Corpora-List] Chi-Square

From: Jin-Dong Kim (jdkim@is.s.u-tokyo.ac.jp)
Date: Sun Sep 17 2006 - 17:16:27 MET DST

Next message: ted pedersen: "Re: [Corpora-List] Chi-Square"

Previous message: Marco Baroni: "Re: [Corpora-List] Chi-Square"
In reply to: Marco Baroni: "Re: [Corpora-List] Chi-Square"
Next in thread: ted pedersen: "Re: [Corpora-List] Chi-Square"
Reply: ted pedersen: "Re: [Corpora-List] Chi-Square"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

One reason not to use chi-square for text processing is its
requirement (a common rule of thumb) that each event be observed at
least five times to give reliable statistics, which is not always the
case in text processing.
Dunning's log-likelihood ratio has the same asymptotic distribution as
chi-square and is known to perform reasonably well for infrequently
observed events.
It is also known to approach chi-square when each event is observed
frequently enough.
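
To make the difference concrete, here is a minimal sketch that computes
both statistics for a contingency table of co-occurrence counts. The
function names and the example counts are my own illustration, not
taken from Dunning's paper.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson's chi-square: sum over cells of (O - E)^2 / E."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def log_likelihood_g2(table):
    """Log-likelihood ratio: 2 * sum over cells of O * ln(O / E).

    Cells with O == 0 contribute nothing (the limit of x * ln(x) is 0).
    """
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


# Hypothetical counts: two words that each occur once, and occur
# together, in a corpus of 10000 bigrams.
rare = [[1, 0], [0, 9999]]
print(pearson_chi2(rare), log_likelihood_g2(rare))
```

For this single co-occurrence, chi-square comes out at roughly 10000,
while the log-likelihood ratio is only about 20, which illustrates how
much more strongly chi-square reacts to rare events.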

Regards,

Jin-Dong

On 9/17/06, Marco Baroni <baroni@sslmit.unibo.it> wrote:
> You can see the comparison of chi-square and log-likelihood ratio in this
> famous paper, that I think was very influential in giving the Chi-square
> test a bad name:
>
> T. Dunning, "Accurate Methods for the Statistics of Surprise and
> Coincidence," Computational Linguistics 19(1), 1993.
> http://citeseer.ist.psu.edu/dunning93accurate.html
>
> The paper is quite mathematical, but the basic idea and the empirical
> comparison part should be quite clear... (although the alternative to
> chi-square should be something like the log-likelihood ratio test, not MI,
> that has the same problem of overestimation of the significance of the
> co-occurrence of rare words that the chi-square test has...)
>
>
> Regards,
>
> Marco
>
>

This archive was generated by hypermail 2b29 : Sun Sep 17 2006 - 17:14:18 MET DST