RE: [Corpora-List] Lexical bundles - and meaningful items...

From: Ute Römer (ute.roemer@anglistik.uni-hannover.de)
Date: Fri Jul 08 2005 - 07:44:10 MET DST


    Dear John and others,

    That's an interesting issue. In my research on items of evaluative meaning
    in academic discourse I also look at n-grams of different lengths and I get
    the impression that 3 words are just not enough to constitute a meaningful
    item (it is not so much a handling problem, I think; that could be worked
    out). With respect to meaning creation (and that's what I am mainly
    interested in), 4-grams and 5-grams seem to be ideal (and 6+-grams too
    long). They enable you to spot frames/patterns/phrases which express a
    particular meaning. Single words are rather useless, as are most 2-word and
    3-word items I extracted. But I suppose that concordances of frequent
    3-grams may still lead you to some interesting (and meaningful) 4- and
    5-word items.
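
A minimal sketch of how n-gram lists of different lengths might be pulled from a tokenised corpus, and how a frequent 3-gram can be used as a starting point for finding the longer items that contain it. It assumes plain whitespace tokenisation and only Python's standard library; the function names and the toy text are illustrative and not taken from any study mentioned in this thread.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return an iterator over every contiguous n-word sequence (as tuples)."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(tokens, n):
    """Frequency list of all n-grams of length n."""
    return Counter(ngrams(tokens, n))


def extensions(tokens, trigram, n):
    """Count the n-grams (n > 3) that contain the given 3-gram somewhere inside."""
    k = len(trigram)
    hits = Counter()
    for gram in ngrams(tokens, n):
        if any(gram[i:i + k] == trigram for i in range(n - k + 1)):
            hits[gram] += 1
    return hits


# Toy text standing in for a real corpus (hypothetical data).
text = ("it is important to note that the results are clear . "
        "it is important to note that the data are limited . "
        "it is important to say that the data are new .")
tokens = text.lower().split()

trigram_counts = count_ngrams(tokens, 3)
seed = ("it", "is", "important")

# Longer candidate items built around a frequent 3-gram.
four_word = extensions(tokens, seed, 4)
five_word = extensions(tokens, seed, 5)

for gram, freq in five_word.most_common(3):
    print(freq, " ".join(gram))
```

On a real corpus one would probably also want a frequency cut-off and a check on how many different texts each item occurs in; both are left out here to keep the sketch short.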

    Best wishes... Ute

    ********************************************

    Ute Römer
    English Department
    University of Hanover
    Königsworther Platz 1
    30167 Hannover
    Germany
     
    Phone: +49 (0)511 762 2997
    Fax: +49 (0)511 762 2996
    E-mail: ute.roemer@anglistik.uni-hannover.de
    http://www.uteroemer.de
    http://www.fbls.uni-hannover.de/angli/
     

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    > Behalf Of Jenny Eagleton
    > Sent: Monday, July 04, 2005 4:46 AM
    > To: corpora@uib.no
    > Subject: [Corpora-List] Lexical bundles
    >
    > ON BEHALF OF PROF. JOHN FLOWERDEW
    >
    > DEPARTMENT OF ENGLISH AND COMMUNICATION
    >
    > CITY UNIVERSITY OF HONG KONG
    > RE: LEXICAL BUNDLES.
    >
    > I notice that all of the studies I have read on this topic have
    > focussed on 4-word bundles and that they have all used what I
    > would call large corpora, i.e. many millions of words. The rationale
    > seems to be that with 5-word bundles you do not get enough to analyse
    > and that with 3-word bundles there are probably too many to
    > handle.
    >
    > I want to do a study of bundles on a specific corpus I have, but
    > which only has 600,000 words. To be able to work with large numbers
    > of bundles, it would therefore make sense to focus on 3-word bundles.
    > I could do a study on 4-word bundles, but the sample would be smaller.
    >
    >
    > So my question is, do people see any disadvantages in focusing on
    > 3-word bundles and, if so, what might they be?
    >
    > Looking forward to hearing your responses.
    >


