Re: [Corpora-List] Dice coefficient

From: Markus Saers (masaers@gmail.com)
Date: Fri Apr 21 2006 - 16:43:58 MET DST

    Hello Scott,

    OK, I see. So p(w) is read as "the probability of w occurring in a sentence"
    rather than "the probability of w occurring in the corpus". Thank you very
    much!

    Best regards
    Markus Saers

    On 19/04/06, Piao, Songlin <s.piao@lancaster.ac.uk> wrote:
    >
    > Hi Markus,
    >
    > You must be working on word alignment, but I am not sure whether you are
    > using sentence-aligned corpora.
    >
    > >that frequency count is used instead, which is problematic
    > >in word alignment since that would presuppose that Ns=Nt
    >
    > If you are using sentence-aligned corpora, you can get the frequencies for
    > ws and wt by counting the aligned sentence pairs in which each of them
    > occurs. In this case, Ns=Nt=total_number_of_aligned_sentence_pairs. As to
    > the co-occurrence frequency for (ws, wt), you can get it by counting the
    > aligned sentence pairs in which both of them occur.
    >
    > If you are not using aligned corpora, you can replace the aligned
    > sentence pairs with other corresponding text segments, such as paragraphs
    > or sections.
    >
    > Hope this helps.
    >
    > Scott Piao
    > --------------------
    > Computing Department
    > Lancaster University
    > UK
    >
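The counting scheme Scott describes can be sketched in a few lines of Python. This is only an illustration, assuming the usual Dice definition 2*f(ws,wt) / (f(ws) + f(wt)) and a corpus given as a list of (source_tokens, target_tokens) pairs. The function name dice_scores and the toy data are made up for the example and do not come from the thread.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Dice coefficient for every co-occurring (ws, wt) word pair.

    sentence_pairs: iterable of (source_tokens, target_tokens), one entry
    per aligned sentence pair (hypothetical input format).
    All frequencies count aligned sentence pairs, not word tokens, so
    Ns = Nt = number of aligned sentence pairs.
    """
    src_freq = Counter()  # f(ws): pairs whose source side contains ws
    tgt_freq = Counter()  # f(wt): pairs whose target side contains wt
    co_freq = Counter()   # f(ws, wt): pairs containing both

    for src_tokens, tgt_tokens in sentence_pairs:
        # Use sets so a word repeated within one sentence counts once.
        src_types = set(src_tokens)
        tgt_types = set(tgt_tokens)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        co_freq.update(product(src_types, tgt_types))

    return {
        (ws, wt): 2.0 * count / (src_freq[ws] + tgt_freq[wt])
        for (ws, wt), count in co_freq.items()
    }


# Toy aligned corpus, invented for illustration only.
pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "house"], ["ein", "haus"]),
]
scores = dice_scores(pairs)
print(scores[("house", "haus")])  # co-occur in 2 pairs, each occurs in 2
print(scores[("the", "haus")])    # co-occur in only 1 pair
```

For unaligned corpora, the same function should work unchanged if each entry holds a pair of corresponding paragraphs or sections instead of sentences.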


