[Corpora-List] Referring expressions: familiarity/accessibility

From: Klebanov Beata (beata@cs.huji.ac.il)
Date: Mon Sep 02 2002 - 16:23:02 MET DST

    Dear all,

    As far as I know, the classification of referring expressions
    according to the assumed familiarity/accessibility of the entity
    being referred to usually looks something like:
    pronouns > demonstratives (+NP) > partial names > short DEFs > long
    DEFs > full names > short INDEFs > long INDEFs.
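
As a rough illustration only, the scale above can be written down as an ordered list so that two categories can be compared. This is a minimal Python sketch: the category labels are shortened from the scale, and the names SCALE, rank_of and more_accessible are hypothetical, not taken from any existing tool.

```python
from typing import Optional

# Familiarity/accessibility scale as given in the message,
# most accessible category first. Labels are illustrative shorthand.
SCALE = [
    "pronoun",
    "demonstrative",
    "partial name",
    "short DEF",
    "long DEF",
    "full name",
    "short INDEF",
    "long INDEF",
]

# Map each category to its position on the scale (0 = most accessible).
RANK = {category: position for position, category in enumerate(SCALE)}


def rank_of(category: str) -> Optional[int]:
    """Return the position on the scale, or None if the category is not on it."""
    return RANK.get(category)


def more_accessible(a: str, b: str) -> bool:
    """Return True if category a sits higher on the scale than category b."""
    return RANK[a] < RANK[b]
```

Categories that the scale does not cover, such as the comparatives and quantifiers discussed below, simply have no rank here: rank_of("comparative") returns None.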

    However, below are some cases I came across where the expression is a
    referring expression (RE), but it is not quite clear to me where it fits
    on the scale (all examples are from the Wall Street Journal):

    (1) comparatives:
        weaker results (Digital Equipment's profit fell 32% in the latest
                        quarter, prompting forecasts of weaker results ahead.)
        higher commissions and revenue (The company said the improved
                                        performance from a year ago reflects
                                        higher commissions and revenue from
                                        marketing ....)

            => These assume that some benchmark results/revenue
               were mentioned before (the 32% fall; those of one year ago),
               although the entities that the expressions themselves refer
               to are new.
               It seems to me that "weaker results" has a higher degree
               of familiarity than "weak results", but just how much higher?
               The anchoring in a previously mentioned entity reminds me of
               bridging, which is usually associated with short DEFs.

    (2) quantifiers:
        another round of horror
        any other major currency
            => seem to me somewhat similar to (1)

    (3) things that are (possibly) assumed to be singular entities:

        genocide (the reports of genocide taking place...)
        gold (In the Commodity Exchange in New York, gold dropped $1.60
              to...; The dollar finished mixed, while gold declined.)
        literature (The Nobel prize in literature)

            => I think these are all REs, since they can be referred to later:
               the killing ... (genocide); it regained ... (gold); this
               category is considered the most competitive ... (literature).
               One possibility is to treat them as names: genocide
               standing for "the phenomenon of violence on an ethnic basis",
               literature being "a category of competition where writings
               of fiction by contemporary authors are presented", etc.
               Another is to treat them as short DEFs, as if every mention
               were a mention of the one and only entity (akin to "the sun"),
               where possibly not all of its aspects are relevant ("literature"
               in the example does not include "The Iliad", or articles on
               Computational Linguistics).

            

    I would appreciate any pointers to relevant literature (!), and/or
    comments on the examples. Do you know of any attempts at automatic
    classification of REs?

    Thank you,

    Beata Klebanov
    ==============
    PhD student, Computer Science Department
    The Hebrew University of Jerusalem, Israel
    email: beata@cs.huji.ac.il
    www: http://www.cs.huji.ac.il/~beata
    phone (office): 972 - 2 - 6585386


