RE: [Corpora-List] Questions about collocations and collocation extraction tools

From: Mark Davies (Mark_Davies@byu.edu)
Date: Wed Aug 02 2006 - 16:52:46 MET DST

    Hi Nicholas,

    > 3. I need to compile a collocation frequency list as general (not
    > genre- or sublanguage- specific) as possible. Do you consider
    > the BNC Baby to be a corpus general enough for this task or
    > do I need to use another corpus?
    >
    > 4. I need to specify frequency thresholds for the
    > collocations (or the collocation candidates to be more
    > precise). Is f >= 3 considered to be an adequate cut-off? I
    > know that I have to filter out the hapax and dis legomena,
    > but from which frequency onwards does a collocation become
    > statistically significant?

    For BNC-specific information on collocations, you might look at
    http://view.byu.edu.

    This interface to the BNC allows you to look for collocates within a
    20-word window and to sort them by raw frequency or by something akin
    to a z-score. It also allows you to limit the query to specific
    registers/genres in the BNC and to specify minimum frequency
    thresholds. Finally, you can compare the collocates of a given word in
    two (sets of) registers, or compare the collocates of two competing
    words, all with one simple query.
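
    To make the idea concrete, here is a rough sketch in Python of
    window-based collocate counting with a minimum frequency threshold
    and a z-score sort. It is only an illustration and not how the
    interface above is implemented: the function name `collocates`, the
    parameters `span` and `min_freq`, and the particular z-score formula
    (one textbook formulation, comparing observed co-occurrences with
    those expected by chance) are all assumptions of mine.

```python
from collections import Counter
from math import sqrt


def collocates(tokens, node, span=4, min_freq=3):
    """Return (collocate, observed, z) tuples for `node`, highest z first.

    Hypothetical helper: `span` is the number of words counted on each
    side of the node, `min_freq` the minimum co-occurrence frequency.
    """
    n_tokens = len(tokens)
    word_freq = Counter(tokens)
    node_freq = word_freq[node]
    if node_freq == 0:
        return []

    # Count every token within `span` words on either side of each node hit.
    # Repeats of the node word itself are not counted as collocates.
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(n_tokens, i + span + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] != node:
                observed[tokens[j]] += 1

    results = []
    for word, obs in observed.items():
        if obs < min_freq:
            continue  # apply the frequency cut-off
        # Chance of meeting `word` at any position not occupied by the node.
        p = word_freq[word] / (n_tokens - node_freq)
        if p >= 1:
            continue  # degenerate case: variance would be zero
        expected = p * node_freq * 2 * span
        z = (obs - expected) / sqrt(expected * (1 - p))
        results.append((word, obs, z))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    text = ("strong tea and strong tea with strong coffee but weak tea "
            "and a cup of strong tea")
    for word, obs, z in collocates(text.split(), "tea", span=1, min_freq=3):
        print(word, obs, round(z, 2))
```

    Raising `min_freq` drops the rare candidates before any score is
    computed, which is the same effect as setting a minimum frequency
    threshold in a query.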

    Any questions, please feel free to ask.

    Best,

    Mark Davies

    =================================================

    Mark Davies
    Professor of (Corpus) Linguistics
    Brigham Young University
    (phone) 801-422-9168 / (fax) 801-422-0906

    http://davies-linguistics.byu.edu

    ** Corpus design and use // Linguistic databases **
    ** Historical linguistics // Language variation **
    ** English, Spanish, and Portuguese **

    =================================================