RE: [Corpora-List] Incidence of MWEs

From: Adam Kilgarriff (adam@lexmasterclass.com)
Date: Tue Mar 14 2006 - 13:28:43 MET

    > I was wondering if anyone has estimated the incidence of multi-word
    > expressions in language.

    Wonderful, enormous, bottomless question!

    I heard an account of the 'phraseology' symposium at Leeds University in
    1994, where the level of interest and enthusiasm in the topic was such that,
    at the beginning of the event, people were arguing heatedly about 30% of
    the language being phraseological... by the end it had risen to 70%!

    The answer must be a function of what you count, what you count as the
    language, and what you count as an MWE. In particular:

    * are you counting types or tokens? (Exercise: what is the proportion
      of multiwords in the mini-corpus comprising the single sentence,
      "Apple pie is apple pie"?)
    * what sublanguages do you include: all, some, none? ("mid off" is an
      MWE for anyone who knows cricket but not for anyone who doesn't)
    * how much variation (morphological, syntactic, lexical, modifiers)
      can there be, with it still being the same MWE (or an MWE at all)?
      (Rosamund Moon's example: are "shake in one's shoes", "quake in one's
      boots" and "quake in one's Doc Martens" all the same MWE?)
    * is non-compositionality a part of the definition?
    * are frequencies or statistics part of the definition? (Theorists
      might not want them to be, but without statistics and thresholds you
      won't be able to compute a useful answer. If you do use them, the
      answer you get will depend critically on which statistics and which
      thresholds you use, so you had better make principled decisions about
      them.)
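
    As a rough illustration of the types-versus-tokens exercise above, here
    is a minimal Python sketch. It is not a standard method: it assumes a
    hand-supplied MWE lexicon, lower-casing of all words, and greedy
    longest-match segmentation. The function name mwe_proportions and the
    three measures it reports are invented for this illustration.

```python
import re


def mwe_proportions(text, mwes):
    """Segment text into units (lexicon MWEs or single words) and
    report three different 'proportions of MWEs'.

    Assumptions (illustrative only): case is ignored, the MWE lexicon
    is given by hand, and matching is greedy, longest first.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")

    lexicon = {tuple(m.lower().split()) for m in mwes}
    max_len = max((len(m) for m in lexicon), default=1)

    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to length 2.
        for n in range(max_len, 1, -1):
            candidate = tuple(words[i:i + n])
            if len(candidate) == n and candidate in lexicon:
                units.append(candidate)
                i += n
                break
        else:
            # No MWE starts here: the single word is its own unit.
            units.append((words[i],))
            i += 1

    mwe_tokens = [u for u in units if len(u) > 1]
    unit_types = set(units)
    mwe_types = {u for u in unit_types if len(u) > 1}

    return {
        # MWE occurrences as a share of all unit occurrences
        "by_unit_tokens": len(mwe_tokens) / len(units),
        # distinct MWEs as a share of all distinct units
        "by_unit_types": len(mwe_types) / len(unit_types),
        # orthographic words inside MWEs as a share of all words
        "by_words_covered": sum(len(u) for u in mwe_tokens) / len(words),
    }


result = mwe_proportions("Apple pie is apple pie.", ["apple pie"])
```

    With "apple pie" as the only lexicon entry, the one sentence gives 2/3
    counting unit tokens, 1/2 counting unit types, and 4/5 counting the
    words covered by MWEs. Three defensible answers from five words, before
    any of the other questions have even been asked.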

    There is one view of language in which the 'standard case' is that the
    meaning of a sentence is built from the meanings of its words, with MWEs
    being an important kind of special case. There is another (especially
    associated with Birmingham) which looks at things the other way round:
    language usually comes in larger chunks, and "free variation" of words is
    the special case. I quite like the latter.

    Adam Kilgarriff
    http://www.kilgarriff.co.uk

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of David Brooks
    Sent: 14 March 2006 11:43
    To: Corpora List
    Subject: [Corpora-List] Incidence of MWEs

    Dear Corpora-folk,

    I was wondering if anyone has estimated the incidence of multi-word
    expressions in language. I know that empirical estimates are tied to
    particular corpora, but does anyone have an account of MWEs for
    particular corpora, so that "ball-park" figures of the proportion of
    MWEs can be estimated?

    Better yet, can anyone give me a good reference for the incidence of MWEs?

    Regards,
    David

    -- 
    David Brooks
    http://www.cs.bham.ac.uk/~djb