Re: [Corpora-List] question as to MI and t score

From: JFS (jfs@di.fct.unl.pt)
Date: Thu Dec 15 2005 - 15:24:28 MET


    Ramesh Krishnamurthy wrote:

    >
    > Please see http://torvald.aksis.uib.no/corpora/1999-4/0146.html
    >
    > If I have understood correctly, the MI score tells you about the
    > 'strength of association' (but if the corpus frequency figures for
    > either item are very low, then you may not have much confidence in
    > the association; e.g. an extreme case: X and Y occur only once each
    > in the corpus, but in that one occurrence they are adjacent to each
    > other); the t-score takes into account the corpus frequency of the
    > items, so gives you a 'confidence rating' in the association...
    >
    > I suspect that the corpus frequencies for ['play' and 'role'] and
    > ['fight' and 'battle'] would also have to be similar for you to make
    > the claim that they have a similar overall collocational
    > relationship...
    >
    > Hope this helps
    > Ramesh
    >
    >
    > At 16:14 14/12/2005, Helene Stengers wrote:
    >
    >> Dear list,
    >>
    >> Imagine you have called up collocation listings for the node word
    >> lemmas "play" and "fight". In both lists, the association with, for
    >> example, the collocates "role" and "battle" has exactly the same
    >> MI / t score. Can I assume that both collocations, i.e. "play a role"
    >> and "fight a battle" have the same "collocational strength", or is
    >> that a wrong assumption?
    >>
    >> Thanks,
    >> Helene
    >
    > Ramesh Krishnamurthy
    > Lecturer in English Studies
    > School of Languages and Social Sciences
    > Aston University, Birmingham B4 7ET, UK
    > Tel: +44 (0)121-204-3812
    > Fax: +44 (0)121-204-3766
    > http://www.aston.ac.uk/lss/english/
    >
    Dear

    The MI measure is not independent of the bigram frequency. This can be
    seen when X and Y occur in a perfect co-occurrence bigram (X occurs only
    to the left of Y, and Y occurs only to the right of X); in these cases
    MI gives higher scores to bigrams of low frequency.
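
    As a rough illustration of the perfect co-occurrence case above, here is
    a minimal Python sketch. It assumes the usual pointwise definition
    MI = log2(f(X,Y) * N / (f(X) * f(Y))), which is not spelled out in this
    thread, and a made-up corpus size N; the function name is hypothetical.

```python
import math

def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise MI (log base 2) from raw counts.

    Assumed definition: log2(f(X,Y) * N / (f(X) * f(Y))), with n the
    corpus size in tokens. This formula is an assumption, not taken
    from the thread.
    """
    return math.log2((f_xy * n) / (f_x * f_y))

N = 1_000_000  # hypothetical corpus size

# Perfect co-occurrence: f(X) = f(Y) = f(X,Y), so MI reduces to log2(N / f).
mi_rare = mutual_information(1, 1, 1, N)              # bigram seen once
mi_frequent = mutual_information(1000, 1000, 1000, N)  # bigram seen 1000 times

print(mi_rare, mi_frequent)
```

    Under these assumptions the bigram seen once scores higher than the one
    seen 1000 times, although both are perfect co-occurrences.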

    Try scp(X,Y) = f(X,Y)² / (f(X) * f(Y)). It gives the cohesion between X
    and Y and does not depend on the bigram frequency in this way: a
    perfect co-occurrence bigram scores 1 whatever its frequency.

    Or try cosine(X,Y) = f(X,Y) / sqrt(f(X) * f(Y)). It is likewise
    independent of the bigram frequency.

    Both measures give values from 0 to 1.
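
    The two measures above can be sketched in Python as follows. The
    function names and the example counts are hypothetical; only the two
    formulas come from this message.

```python
import math

def scp(f_xy, f_x, f_y):
    """scp(X,Y) = f(X,Y)^2 / (f(X) * f(Y)), from raw frequency counts."""
    return f_xy ** 2 / (f_x * f_y)

def cosine(f_xy, f_x, f_y):
    """cosine(X,Y) = f(X,Y) / sqrt(f(X) * f(Y)), from raw frequency counts."""
    return f_xy / math.sqrt(f_x * f_y)

# Perfect co-occurrence at two very different frequencies (made-up counts).
perfect_rare = (scp(1, 1, 1), cosine(1, 1, 1))
perfect_frequent = (scp(1000, 1000, 1000), cosine(1000, 1000, 1000))

# A partial association: X occurs 100 times, Y 200 times, together 50 times.
partial = (scp(50, 100, 200), cosine(50, 100, 200))

print(perfect_rare, perfect_frequent, partial)
```

    Both perfect bigrams get the same score regardless of frequency, and
    the partial one falls between 0 and 1. Note that scp is simply the
    square of cosine, so the two always rank bigrams in the same order.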

    Joaquim

    -- 
    Joaquim Ferreira da Silva      	| Tel: +351 21 294 8536
    Professor Auxiliar		|      +351 21 291 8330 ext: 10732
    Departamento de Informática	| Fax: +351 21 294 8541
    Fac. de Ciências e Tecnologia	|jfs@di.fct.unl.pt
    Universidade Nova de Lisboa	|http://terra.di.fct.unl.pt/~jfs/
    2829-516 Caparica, PORTUGAL
     
    


