Re: [Corpora-List] Co-occurrence stats from BNC

From: Afsaneh Fazly (afsaneh@cs.toronto.edu)
Date: Fri Mar 17 2006 - 15:29:42 MET

    You should be able to do this easily and quickly, using the
    Ngram Statistics Package (by Ted Pedersen), which can be
    found here:

    http://ngram.sourceforge.net/
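
    If you would rather see what the two counts involve before
    running a package over the whole corpus, here is a minimal
    Python sketch. It is not the Ngram Statistics Package and does
    not use its interface. It assumes the BNC text has already been
    split into sentences and tokenized; the function names and the
    toy sentences are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def count_trigrams(sentences):
    """Count every run of three adjacent tokens within each sentence."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts


def count_sentence_pairs(sentences):
    """Count unordered word pairs that occur in the same sentence.

    Each pair is counted at most once per sentence (an assumption;
    counting every token pair is another reasonable choice).
    """
    counts = Counter()
    for tokens in sentences:
        types = sorted(set(tokens))  # sorted so each pair has one canonical order
        for pair in combinations(types, 2):
            counts[pair] += 1
    return counts


# Toy input standing in for tokenized BNC sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "cat", "slept"],
]

trigrams = count_trigrams(sentences)
pairs = count_sentence_pairs(sentences)

# Relative frequency of each trigram, as a simple probability estimate.
total = sum(trigrams.values())
trigram_probs = {gram: n / total for gram, n in trigrams.items()}
```

    Note that the pair counts grow roughly with the square of the
    sentence length, so on a corpus the size of the BNC you will
    probably want a frequency cutoff or on-disk storage.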

    Regards,
    Afsaneh

    On Fri, 17 Mar 2006, MCUSSHS wrote:

    > Sorry if this is a dumb question: for a student project, we would like
    > to get the following stats based on the BNC:
    > (1) frequency (or probability) of all trigrams
    > (2) co-occurrence stats for all word pairs (NOT bigrams, note) based on
    > co-occurrence within the same sentence
    >
    > I assume that this is easy to compute, though time-consuming; and of
    > course I understand that the data will be relatively sparse.
    >
    > So my question is, is this data available somewhere, e.g. someone has
    > already done it; OR: what is the easiest way to do it?
    >
    > Harold Somers
    >
    >
    >


