RE: [Corpora-List] Chi-Square

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Sun Sep 17 2006 - 23:33:44 MET DST

Next message: Martin Volk: "[Corpora-List] Symposium on Parallel Treebanks"

Previous message: FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE: "[Corpora-List] Re: Chi-Square"
In reply to: Marco Baroni: "Re: [Corpora-List] Chi-Square"
Next in thread: FIDELHOLTZ_DOOCHIN_JAMES_LAWRENCE: "[Corpora-List] Re: Chi-Square"

Crayton,

I've had a go at explaining just this to non-mathematicians in a recent
paper called "Language is never ever ever random", see
http://www.kilgarriff.co.uk/publications.htm

Here's the core reason (taken from the abstract):

Language users never choose words randomly, and language is essentially
non-random. Statistical hypothesis testing [e.g. chi-square] uses a null
hypothesis, which posits randomness. Hence, when we look at linguistic
phenomena in corpora, the null hypothesis will never be true. Moreover,
where there is enough data, we shall (almost) always be able to establish
that it is not true. In corpus studies, we frequently do have enough data,
so the fact that a relation between two phenomena is demonstrably
non-random does not support the inference that it is not arbitrary.
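
To make the "enough data" point concrete, here is a minimal sketch in Python. The counts are invented for illustration and are not taken from the paper: a word is assumed to occur 105 times per 100,000 tokens in one corpus and 100 times per 100,000 in another. The rates stay fixed while both corpora grow. The function name chi_square_2x2 and the helper scaled_table are hypothetical, and the statistic is the plain Pearson chi-square for a 2x2 table without continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Critical value of chi-square with 1 degree of freedom at p = 0.05.
CRITICAL_05 = 3.841


def scaled_table(scale):
    """Invented counts: (hits in A, other tokens in A, hits in B, other tokens in B).

    The relative frequencies are identical at every scale; only the corpus size changes.
    """
    return (105 * scale, 99_895 * scale, 100 * scale, 99_900 * scale)


if __name__ == "__main__":
    for scale in (1, 10, 100, 1000):
        stat = chi_square_2x2(*scaled_table(scale))
        verdict = "reject null" if stat > CRITICAL_05 else "cannot reject null"
        tokens = 100_000 * scale
        print(f"{tokens:>11,} tokens per corpus: chi-square = {stat:9.3f} ({verdict})")
```

The statistic grows in proportion to corpus size, so the same tiny difference in rates that is "not significant" at 100,000 tokens per corpus crosses the 0.05 threshold once the corpora are a hundred times larger. Nothing about the language has changed, only the amount of data.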

Adam

Crayton Walker wrote:

> A simple question about statistical measures.
>
> Could someone explain in very simple terms why we don't normally use
> Chi-square as a measure of collocational significance? We tend to use
> t-score and MI and not Chi-square. Why not? I am not a mathematician
> so would appreciate it if you could keep it simple.
>
> Many thanks
>
> Crayton Walker
>
> University of Birmingham
>


This archive was generated by hypermail 2b29 : Sun Sep 17 2006 - 23:31:30 MET DST