Re: [Corpora-List] question as to MI and t score

From: Stefan Evert (stefan.evert@uos.de)
Date: Thu Dec 15 2005 - 17:57:12 MET

    Dear Helene,

    I suppose your rationale is that since MI and t-score measure two
    different aspects of collocations (MI not being sensitive to absolute
    frequency per se, while t-score is very sensitive in this respect), if
    both values are the same for "play - role" and "fight - battle", the
    "collocational strength" should be the same in all respects. Is this
    interpretation correct?

    However, if both scores are the same for the two collocations, this
    means simply that both the observed frequencies and the expected
    frequencies of "play - role" and "fight - battle" are identical (you can
    work this out relatively easily from equations, e.g. those given on
    www.collocations.de/AM). While this doesn't indicate a difference in the
    degree of collocation, of course, it no more "proves" that the
    collocational strength is really identical than observing the same
    frequency for a phenomenon in two different corpora proves anything
    about that phenomenon in general – the observation may just as well be
    due to the vagaries of sampling, especially when the frequencies are
    very low.
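
    To illustrate the "work this out from the equations" step, here is a
    minimal sketch. It assumes the commonly used definitions
    MI = log2(O / E) and t = (O - E) / sqrt(O), where O is the observed
    and E the expected co-occurrence frequency; the function names and the
    example counts are made up for illustration and are not taken from any
    real corpus.

```python
import math


def mi_score(observed, expected):
    """Mutual information, assuming MI = log2(O / E)."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """t-score, assuming t = (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)


def recover_frequencies(mi, t):
    """Invert the two formulas above to get (O, E) back from (MI, t).

    From MI:  O / E = 2 ** MI = ratio
    From t:   t = sqrt(O) * (1 - 1 / ratio)
    so        O = (t / (1 - 1 / ratio)) ** 2  and  E = O / ratio.
    """
    if mi == 0:
        # O == E here, so t == 0 for every O and O cannot be recovered.
        raise ValueError("MI = 0 does not determine the frequencies")
    ratio = 2.0 ** mi
    observed = (t / (1.0 - 1.0 / ratio)) ** 2
    expected = observed / ratio
    return observed, expected


# Hypothetical counts, for illustration only.
obs, exp = 30.0, 2.5
recovered_obs, recovered_exp = recover_frequencies(mi_score(obs, exp),
                                                   t_score(obs, exp))
```

    Because the mapping from (O, E) to (MI, t) can be inverted in this
    way whenever MI is non-zero, two word pairs with the same MI and the
    same t-score must have the same O and the same E.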

    What you can do is to rule out a large difference between the
    collocational strengths of "play a role" and "fight a battle" with a
    certain degree of statistical confidence. Working out exactly what upper
    bounds on this difference one can assume, and with how much confidence, is
    almost as difficult a mathematical problem as interpreting the
    difference is a linguistic one (what does it really mean if the
    difference in collocational strength is at most "1.7"?).
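
    To give a rough idea of what such a bound might look like, here is a
    deliberately crude sketch, not a proper solution to the problem. It
    assumes that MI = log2(O / E), that the two observed frequencies are
    independent Poisson counts, and that the expected frequencies are
    known without error; none of these assumptions holds exactly for real
    corpus data. The function name and the counts are hypothetical.

```python
import math


def mi_difference_interval(o1, e1, o2, e2, z=1.96):
    """Approximate interval for MI1 - MI2 (z = 1.96 for roughly 95%).

    Assumptions (all simplifications): O1 and O2 are independent Poisson
    counts, E1 and E2 are fixed. By the delta method the variance of
    ln(O) is then about 1 / O, and dividing by ln(2) converts to bits.
    """
    difference = math.log2(o1 / e1) - math.log2(o2 / e2)
    std_error = math.sqrt(1.0 / o1 + 1.0 / o2) / math.log(2)
    return difference - z * std_error, difference + z * std_error


# Hypothetical counts: identical MI, once with low and once with higher
# observed frequencies.
low_counts = mi_difference_interval(5, 0.5, 5, 0.5)
high_counts = mi_difference_interval(500, 50, 500, 50)
```

    Under these assumptions the interval is centred on zero when the two
    MI values coincide, but it is much wider for the low counts than for
    the high ones, which is the sense in which low frequencies rule out
    only very large differences.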

    Best regards,
    Stefan

    >
    >
    > Imagine you have called up collocation listings for the node word
    > lemmas "play" and "fight". In both lists, the association with, for
    > example, the collocates "role" and "battle" has exactly the same MI
    > / t score. Can I assume that both collocations, i.e. "play a role" and
    > "fight a battle" have the same "collocational strength", or is that a
    > wrong assumption?
    >
    > Thanks,
    > Helene


