Re: [Corpora-List] Lexical bundles - and meaningful items...

From: Chris Butler (csblists@telefonica.net)
Date: Fri Jul 08 2005 - 09:36:16 MET DST


    Dear John and other list members,

    Ute Römer said:

    "But I suppose that concordances of frequent
    3-grams may still lead you to some interesting (and meaningful) 4- and
    5-word items."

    For lists of 3-word strings as well as longer ones, derived from English
    corpora, you might like to look at the following, if you haven't already
    done so:

    Stubbs, Michael and Isabel Barth (2003) "Using recurrent phrases as text
    type discriminators: a quantitative method and some findings." Functions of
    Language 10(1): 61-104.

    For similar data from Spanish, derived from smaller corpora (some as small
    as 125,000 words, none bigger than 1 million words), see:

    Butler, Christopher S. (1997) "Repeated word combinations in spoken and
    written text: some implications for Functional Grammar." In C. S. Butler, J.
    H. Connolly, R. A. Gatward and R. M. Vismans (eds.) A Fund of Ideas: Recent
    Developments in Functional Grammar. Amsterdam: Institute for Functional
    Research into Language and Language Use (IFOTT).

    [As this is in a rather obscure publication which may be difficult for
    people to get hold of, I could send an electronic version to anyone who is
    interested.]

    Also, Bengt Altenberg says in the following paper that most of the recurrent
    sequences he isolated from the London-Lund Corpus were pretty short, with an
    average of 3.15 words, and he gives a lot of examples of phraseologically
    interesting 3-word sequences:

    Altenberg, Bengt (1998) "On the phraseology of Spoken English: the evidence
    of recurrent word combinations." In A. P. Cowie (ed.) Phraseology: Theory,
    Analysis, and Applications. Oxford: Clarendon Press.

    Chris Butler

    ********************************************

    Ute Römer
    English Department
    University of Hanover
    Königsworther Platz 1
    30167 Hannover
    Germany

    Phone: +49 (0)511 762 2997
    Fax: +49 (0)511 762 2996
    E-mail: ute.roemer@anglistik.uni-hannover.de
    http://www.uteroemer.de
    http://www.fbls.uni-hannover.de/angli/

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    > Behalf Of Jenny Eagleton
    > Sent: Monday, July 04, 2005 4:46 AM
    > To: corpora@uib.no
    > Subject: [Corpora-List] Lexical bundles
    >
    > ON BEHALF OF PROF. JOHN FLOWERDEW
    >
    > DEPARTMENT OF ENGLISH AND COMMUNICATION
    >
    > CITY UNIVERSITY OF HONG KONG
    > RE: LEXICAL BUNDLES.
    >
    > I notice that all of the studies I have read on
    > this topic have
    > focussed on 4 word bundles and that they have
    > all used what I
    > would call large corpora i.e. many millions of
    > words. The rationale
    > seems to be that with 5 word bundles you do not
    > get enough to analyse
    > and that with three word bundles there are
    > probably too many to
    > handle.
    >
    > I want to do a study of bundles on a specific
    > corpus I have, but
    > which only has 600,000 words. To be able to work
    > with large numbers
    > of bundles, it would therefore make sense to focus
    > on 3 word bundles.
    > I could do a study on 4 word bundles, but the
    > sample would be smaller.
    >
    >
    > So my question is, do people see any disadvantages
    > in focusing on
    > 3-word bundles and, if so, what they might be?
    >
    > Looking forward to hearing your responses.
    >
    >
    >


