Re: [Corpora-List] Incidence of MWEs

From: Afsaneh Fazly (afsaneh@cs.toronto.edu)
Date: Fri Mar 17 2006 - 15:40:36 MET

    This is a very interesting and important question:
    whether multiword units such as "kick the bucket", "make an offer",
    or "light pen" should be considered as single syntactic units
    with no internal structure.

    The introduction to the Oxford Dictionary of Current Idiomatic
    English (Vol.2, A. P. Cowie, R. Mackin, I. R. McCaig, 1983)
    includes a very interesting discussion on the topic. Although
    the issue is discussed there from a lexicographical point of
    view, I find it highly relevant to the original question.

    There is also a very interesting corpus-based study of
    so-called "fixed expressions" in English by Rosamund Moon:

      "Fixed Expressions and Idioms in English, A Corpus-based
      Approach", Oxford Studies in Lexicography and Lexicology,
      1998.

    The above (and other related) studies provide evidence that
    most MWUs undergo lexical and syntactic variation (although
    restricted to some extent), and hence must have internal
    structure.
    This is especially important when working with MWUs that are
    composed of a verb and a noun. Such MWUs vary a lot in
    terms of their degree of compositionality (or better said,
    their degree of semantic analyzability) and hence their
    degree of lexicosyntactic fixedness.
    Many such MWUs (e.g., "kick the bucket", "shoot the breeze")
    are to a large extent idiomatic (unanalyzable). Others
    have meanings with metaphorical relations to the literal
    meanings of the constituents, and hence are considered more
    analyzable, e.g., "pull strings", "push one's luck", etc.

    Another very interesting class of such MWUs (with internal
    structure) are those often categorized as light verb
    constructions (LVCs). Examples are "give a groan",
    "make an offer", "take a walk", etc. These are considered
    semi-compositional, somewhat analyzable, and more lexically
    and syntactically flexible than pure idioms.
    In fact, one motivation behind using such complex predicates
    is argued to be that their internal structure increases their
    expressive power, e.g., one can "give a sad groan",
    "make an appealing offer", or "take a long walk".

    On the other hand, treating such MWUs as units with internal
    structure poses another problem: how they are to be
    distinguished from superficially similar combinations.
    One reason for making such a distinction is of course their
    semantic idiosyncrasy (the interpretation of "shoot the breeze"
    is very much different from that of "shoot the bird").
    Another reason is that, compared to compositional combinations,
    MWUs are overall more constrained in the lexical and syntactic
    variations they undergo, and this information should be
    included in their lexical representation.

    We have done some work on verb--noun MWUs which might be of
    interest to you.
    We develop statistical models that draw on such linguistic
    characteristics to predict whether a given combination is
    idiomatic, or metaphorical in the case of LVCs.
    For this purpose, we use evidence from the lexicogrammatical
    fixedness of these MWUs. (Some related publications can be
    found here: www.cs.toronto.edu/~afsaneh/publications.html).
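
    To make the idea of fixedness as statistical evidence more
    concrete, here is a minimal sketch of one possible measure. It
    is an illustration only, not the models from the publications
    above. It assumes that each verb--noun occurrence has already
    been assigned to one of a few syntactic patterns, and it scores
    a pair by how far its pattern distribution diverges (KL
    divergence) from that of typical verb--noun pairs. The pattern
    names, the counts, and the function names are all invented for
    the example.

```python
import math
from collections import Counter

# Hypothetical inventory of syntactic patterns for verb-noun pairs.
PATTERNS = ["active_det_sg", "active_det_pl", "active_modified", "passive"]


def pattern_distribution(counts, patterns=PATTERNS, alpha=1.0):
    """Add-alpha smoothed probability of each pattern (no zero probabilities)."""
    total = sum(counts.get(p, 0) for p in patterns) + alpha * len(patterns)
    return {p: (counts.get(p, 0) + alpha) / total for p in patterns}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; p and q share the same keys."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p)


def syntactic_fixedness(pair_counts, background_counts):
    """Higher values: the pair's pattern usage departs more from the norm."""
    p = pattern_distribution(pair_counts)
    q = pattern_distribution(background_counts)
    return kl_divergence(p, q)


# Invented toy counts, for illustration only (not corpus data).
typical_pairs = Counter(
    {"active_det_sg": 400, "active_det_pl": 250, "active_modified": 200, "passive": 150}
)
kick_the_bucket = Counter(
    {"active_det_sg": 98, "active_det_pl": 1, "active_modified": 1, "passive": 0}
)
make_an_offer = Counter(
    {"active_det_sg": 45, "active_det_pl": 20, "active_modified": 25, "passive": 10}
)

rigid_score = syntactic_fixedness(kick_the_bucket, typical_pairs)
flexible_score = syntactic_fixedness(make_an_offer, typical_pairs)
```

    With these toy counts, the pair that occurs almost exclusively
    in one pattern receives a higher score than the pair that is
    spread across patterns, which is the kind of contrast a
    classifier could use as a feature.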

    Regards,

    Afsaneh Fazly
    =============================================================
    PhD student, Computational Linguistics Group
    University of Toronto
    www.cs.toronto.edu/~afsaneh
    =============================================================

    On Thu, 16 Mar 2006, David Brooks wrote:

    > Chris Butler wrote:
    > > I notice that recent postings on this topic are concerned largely with the
    > > matter of opacity of meaning in MWEs - Robert Amsler's working principle "if
    > > you can predict its meaning from its constituent parts, it doesn't need a
    > > separate entry" effectively equates MWE with the traditional idiom.
    >
    > Yes, and I fear this is my fault. I realise there is some difference of
    > opinion on many of the matters discussed so far, and I perhaps should
    > have narrowed this topic down to something more manageable.
    >
    > My interest being in syntax, I'm interested in the implications of MWE
    > for evaluating parsers. That is to say, if you get something like "light
    > pen" in a corpus, it may be tagged as an N-bar, with either a compound
    > <N N> or an <Adj N>, but in principle the *syntax* will remain the same
    > (tag differences aside).
    >
    > I would imagine this is not the case for "of course", which doesn't
    > strike me as a natural prepositional-phrase; likewise "kick the bucket"
    > is /syntactically/ a transitive verb-phrase, but, and here is the core
    > of my original (underspecified) question, would it be tagged as a
    > transitive verb-phrase, or would it be tagged as an MWE - perhaps an
    > intransitive verb-like MWE?
    >
    > The reason I ask is that for things like PARSEVAL, this is going to have
    > an impact on constituent bracket scores, and I was wondering to what
    > extent it had been investigated, and how noticeable the effect of MWEs
    > might be.
    >
    > So, I guess I'm principally interested in MWEs that cause a syntactic
    > variation (from the compositional norm), and whether or not they are
    > tagged in treebanks. Still it's been quite an enlightening debate...
    >
    > D
    > --
    > David Brooks
    > http://www.cs.bham.ac.uk/~djb
    >
    >
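
    The PARSEVAL concern in the quoted message can be made concrete
    with a toy calculation. The sketch below assumes a hypothetical
    treebank that annotates "kicked the bucket" as a flat verb
    phrase with no inner NP, while a parser produces the ordinary
    compositional analysis. The sentence, spans, and labels are
    invented for illustration, and brackets are compared as sets of
    (label, start, end) triples, which is a simplification of the
    full PARSEVAL procedure.

```python
def bracket_scores(gold, predicted):
    """Labeled bracket precision, recall, and F1 over (label, start, end) triples."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    precision = matched / len(pred_set)
    recall = matched / len(gold_set)
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Tokens: 0 "He", 1 "kicked", 2 "the", 3 "bucket"; spans are [start, end).
# Hypothetical gold annotation: the idiom is a flat VP without an inner NP.
gold_flat_mwe = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4)]

# Compositional parser output: the VP contains an object NP "the bucket".
parser_output = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]

precision, recall, f1 = bracket_scores(gold_flat_mwe, parser_output)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

    In this toy case the extra NP bracket costs the parser
    precision even though its analysis is syntactically reasonable,
    so the size of the effect in practice would depend on how often
    a treebank annotates MWEs in this way.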


