[Corpora-List] Dice coefficient

From: Markus Saers (masaers@gmail.com)
Date: Wed Apr 19 2006 - 09:50:07 MET DST


    Hello,

    My name is Markus Saers, and I am currently implementing an alignment tool
    as part of a course in Java for NLP. When trying to implement the Dice
    coefficient, I ran into some problems that I was hoping someone could help
    me with.

    The only definition of the Dice coefficient that I have seen looks like
    this:

    Dice = 2 * p(ws, wt) / ( p(ws) + p(wt) )

    Where p(ws, wt) is the probability of the source word co-occurring with the
    target word, p(ws) is the probability of the source word and p(wt) is the
    probability of the target word.

    Although it is stated in terms of probabilities, some information that I
    gathered on the net suggests that frequency counts are used instead. That
    is problematic in word alignment, since it would presuppose that Ns=Nt
    (where Ns is the number of source words and Nt is the number of target
    words).

    The second problem arises when probabilities ARE used. p(ws) and p(wt) are
    easy to estimate, but how is p(ws, wt) estimated?
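
    To make the question concrete, here is a minimal sketch of one possible
    reading, not a definitive answer. It assumes a sentence-aligned corpus
    and treats one aligned segment pair as one observation, so that all three
    probabilities are relative frequencies over the same N segment pairs. N
    then cancels, giving Dice = 2 * C(ws, wt) / ( C(ws) + C(wt) ), and token
    totals Ns and Nt never enter. The class name DiceSketch, its methods, and
    the toy corpus are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class DiceSketch {

    /** Dice score from counts of aligned segment pairs (assumed counting unit). */
    public static double dice(int coCount, int srcCount, int tgtCount) {
        int denom = srcCount + tgtCount;
        // Neither word was ever seen: define the score as 0 to avoid 0/0.
        return denom == 0 ? 0.0 : 2.0 * coCount / denom;
    }

    /**
     * Counts the segment pairs containing ws, those containing wt, and those
     * containing both, then applies the count-based formula. src.get(i) and
     * tgt.get(i) are assumed to be translations of each other.
     */
    public static double dice(List<String[]> src, List<String[]> tgt,
                              String ws, String wt) {
        int co = 0, cs = 0, ct = 0;
        for (int i = 0; i < src.size(); i++) {
            boolean hasS = Arrays.asList(src.get(i)).contains(ws);
            boolean hasT = Arrays.asList(tgt.get(i)).contains(wt);
            if (hasS) cs++;
            if (hasT) ct++;
            if (hasS && hasT) co++;
        }
        return dice(co, cs, ct);
    }

    public static void main(String[] args) {
        // Toy aligned corpus, invented for this example.
        List<String[]> src = Arrays.asList(
            new String[] {"the", "house"},
            new String[] {"the", "car"},
            new String[] {"a", "house"});
        List<String[]> tgt = Arrays.asList(
            new String[] {"huset"},
            new String[] {"bilen"},
            new String[] {"ett", "hus"});
        System.out.println(dice(src, tgt, "house", "huset"));
    }
}
```

    Counting each word at most once per segment is a design choice of this
    sketch; it keeps the score between 0 and 1, which counting raw tokens
    would not guarantee.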

    Best regards
    Markus Saers
    PhD student, Uppsala University


