Re: [Corpora-List] ANC Bigrams and Trigrams

From: Alex Murzaku (lissus@gmail.com)
Date: Mon Feb 14 2005 - 15:07:20 MET

    I am working on pronouns in Albanian and I am using trigram data. I
    would suppose that for all those dealing with anaphora phenomena,
    n-grams beyond sentence/paragraph boundaries would be useful. "This"
    [antecedent="useful"] would be true for any language... but I see its
    usage as limited, so perhaps keeping this data separate would make
    more sense.
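
    To make the two options concrete, here is a minimal sketch of how
    the two data sets could differ. It is only an illustration, not the
    ANC's actual procedure: it assumes the text is already split into
    sentences and tokens, and the function names (ngrams, count_within,
    count_spanning, boundary_only) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sentence = Sequence[str]


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_within(sentences: Sequence[Sentence], n: int) -> Counter:
    """Count n-grams that never cross a sentence boundary."""
    counts: Counter = Counter()
    for sent in sentences:
        counts.update(ngrams(sent, n))
    return counts


def count_spanning(sentences: Sequence[Sentence], n: int) -> Counter:
    """Count n-grams over the whole text, ignoring sentence boundaries."""
    flat = [tok for sent in sentences for tok in sent]
    return Counter(ngrams(flat, n))


def boundary_only(sentences: Sequence[Sentence], n: int) -> Counter:
    """Count only the n-grams that cross a sentence boundary.

    Every within-sentence n-gram is also found in the flattened text,
    so subtracting the two counters leaves the boundary-crossing ones.
    """
    return count_spanning(sentences, n) - count_within(sentences, n)


# Hypothetical two-sentence example (assumed pre-tokenized).
sentences = [
    ["n-grams", "would", "be", "useful"],
    ["this", "would", "be", "true"],
]

for gram, freq in sorted(boundary_only(sentences, 3).items()):
    print(freq, " ".join(gram))
```

    Keeping the boundary-crossing n-grams in their own file, as in
    boundary_only above, would let users combine or ignore them as
    needed.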

    On Fri, 11 Feb 2005 14:42:18 -0500, Nancy Ide <ide@cs.vassar.edu> wrote:
    > We are generating bigram and trigram data from the ANC First Release,
    > which will very soon be available on the (new and improved) ANC
    > website. We have a question for those who might be interested in this
    > kind of data: is it useful to generate the data for word pairs/triples
    > that span sentence (or even paragraph) boundaries? Is there any
    > advantage if we provide two sets of the bigram and trigram data, one
    > that spans such boundaries and one that doesn't?
    >
    > Thanks,
    > Nancy Ide
    >
    > =======================================================
    >
    > Nancy Ide
    >
    > Professor of Computer Science
    > Vassar College
    > Poughkeepsie, NY 12604-0520 USA
    > Tel: +1 845 437-5988 Fax: +1 845 437-7498
    > ide@cs.vassar.edu
    >
    > Chercheur Associe
    > Equipe Langue et Dialogue, LORIA/CNRS
    > Campus Scientifique - BP 239
    > 54506 Vandoeuvre-les-Nancy FRANCE
    > Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
    > ide@loria.fr
    >
    > =======================================================
    >
    >


