Re: [Corpora-List] Chi-Square

From: ted pedersen (tpederse@d.umn.edu)
Date: Sun Sep 17 2006 - 17:44:36 MET DST

Next message: FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE: "[Corpora-List] Re: Chi-Square"

Previous message: Jin-Dong Kim: "Re: [Corpora-List] Chi-Square"
In reply to: Jin-Dong Kim: "Re: [Corpora-List] Chi-Square"
Next in thread: Adam Kilgarriff: "RE: [Corpora-List] Chi-Square"
Next in thread: FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE: "[Corpora-List] Re: Chi-Square"

On Mon, 18 Sep 2006, Jin-Dong Kim wrote:

> One of the reasons for not using chi-square for text processing would
> be its requirement that each event has to be observed at least five
> times to get reliable statistics, which is not always the case in
> text processing.
> Dunning's log-likelihood is a kind of approximation of chi-square which
> is known to perform reasonably well for infrequently observed events.
> It is also known to approach chi-square when each event is observed
> frequently enough.
>
> Regards,
>
> Jin-Dong
>

Greetings collocationalists,

Just to elaborate a little, log-likelihood also has the "requirement"
that each event be observed at least 5 times, although there are other
requirements that both must adhere to as well (e.g., the distribution of
counts should not be too skewed). Of course we typically violate these
with reckless abandon in NLP. :)
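A minimal Python sketch of what the rule of thumb checks, assuming a 2x2 contingency table of bigram counts. The classic rule is usually stated in terms of expected cell counts under independence, so that is what is tested here; the function names and the counts are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_rule_of_five(table, minimum=5.0):
    """True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Hypothetical 2x2 tables for a word pair (w1, w2):
#   [[n(w1 w2), n(w1 ~w2)], [n(~w1 w2), n(~w1 ~w2)]]
rare_pair = [[3, 12], [20, 99965]]
common_pair = [[200, 800], [800, 8200]]

print(expected_counts(rare_pair)[0][0])  # about 0.0035, far below 5
print(meets_rule_of_five(rare_pair))     # False
print(meets_rule_of_five(common_pair))   # True
```

A rare bigram in a large corpus fails the check almost by construction, which is exactly the situation the thread is about.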

Chi-squared and log-likelihood are quite closely related (members of the
same family of tests), so when one works reasonably well the other probably
does too, and when one is unreliable the other might be too. Some of this
is summarized in an earlier note to this list, and in fact some of the
preceding and following messages are also quite relevant:
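To make the relationship concrete, here is a minimal Python sketch (names and counts are hypothetical) that computes Pearson's X^2 = sum (O - E)^2 / E and the log-likelihood ratio G^2 = 2 * sum O ln(O / E) on the same 2x2 table. With moderate counts and a small departure from independence the two nearly agree; on a sparse table with a tiny expected count they diverge sharply, a sign that the large-sample chi-squared approximation both rely on is strained.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson's X^2 = sum (O - E)^2 / E over all cells."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow))


def log_likelihood_g2(table):
    """G^2 = 2 * sum O * ln(O / E); cells with O == 0 contribute 0."""
    exp = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for orow, erow in zip(table, exp)
                     for o, e in zip(orow, erow) if o > 0)


mild = [[110, 890], [890, 8110]]     # large counts, small deviation
sparse = [[3, 12], [20, 99965]]      # expected count in the first cell ~0.0035

print(pearson_chi2(mild), log_likelihood_g2(mild))      # about 1.235 and 1.204
print(pearson_chi2(sparse), log_likelihood_g2(sparse))  # about 2604 and 36
```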

http://torvald.aksis.uib.no/corpora/1997-1/0160.html

BTW, a URL mentioned in that note no longer exists; it has been replaced
by http://www.d.umn.edu/~tpederse/pubs.html, should that seem relevant.

I strongly encourage anyone interested in these issues to look carefully
at Read and Cressie (1988), which is cited more fully in the note above.
Among other things, this lays out the history of the log-likelihood
ratio and the Chi-squared test, and actually tells a rather dramatic
story of how they have been in competition since the 1920s or so!

I think Read and Cressie are in some ways trying to mend the rift between
the two measures, and show that rather than being enemies, these measures
are in fact members of the same family, and you can tell a lot about
one by looking at the other. Anyway, it's a nice book, highly recommended
both for the technical content and the historical perspective it provides.
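The "same family" is the power-divergence family from Read and Cressie. A minimal Python sketch, assuming a 2x2 table of counts (function names and counts are hypothetical), of the statistic 2 / (lam * (lam + 1)) * sum O * ((O / E)^lam - 1): lam = 1 gives Pearson's X^2, the limit lam -> 0 gives G^2, lam = -1/2 gives the Freeman-Tukey statistic, and lam = 2/3 is the value Read and Cressie recommend. Only lam > -1 is handled, to keep zero cells simple.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def power_divergence(table, lam):
    """Power-divergence statistic 2/(lam*(lam+1)) * sum O*((O/E)**lam - 1).

    lam = 1 is Pearson's X^2, lam -> 0 is G^2, lam = -1/2 is Freeman-Tukey.
    This sketch only handles lam > -1.
    """
    if lam <= -1:
        raise ValueError("this sketch only handles lam > -1")
    exp = expected_counts(table)
    # For lam > -1 a cell with O == 0 contributes 0, so such cells are skipped.
    cells = [(o, e) for orow, erow in zip(table, exp)
             for o, e in zip(orow, erow) if o > 0]
    if lam == 0:
        # The lam -> 0 limit: G^2 = 2 * sum O * ln(O / E).
        return 2.0 * sum(o * math.log(o / e) for o, e in cells)
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in cells)
    return 2.0 / (lam * (lam + 1.0)) * total


table = [[110, 890], [890, 8110]]
for lam in (1.0, 2.0 / 3.0, 0.0, -0.5):
    print(lam, round(power_divergence(table, lam), 4))
```

Sweeping lam on one table is a cheap way to "tell a lot about one by looking at the other": if the members of the family disagree badly, the counts are probably too sparse for any of them.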

Cordially,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
