Re: [Corpora-List] dictionary definitions to glosses

From: Ken Litkowski (ken@clres.com)
Date: Tue Dec 09 2003 - 20:58:55 MET

    Mike Maxwell wrote:

    > Has anyone seen any work on reducing dictionary-style definitions to
    > simple(r) glosses?
    >

    Even with the caveat noted in your followup message about working with
    bilingual lexicons, the very idea raises my hackles. Dictionary
    definitions should be well crafted, so that even we computationalists
    can be quite precise in exploiting their meanings. WordNet has
    frequently been criticized for its glosses (initially developed, justly
    so, only as reminders for its developers), and WN 2.0 has made
    substantial changes. As for bilingual lexicons, it surely has been the
    case that the paper format places severe space constraints on what can
    be included, so that they are frequently just as cryptic and difficult
    to use for "meaning-full" use; the electronic format may help in this
    regard, allowing for a fuller understanding. (Also, there are the
    learners' dictionaries, which show how important it can be to provide
    fuller understanding.)

    The hackles now having been lowered, perhaps the main issue is how you
    expect to use gisted definitions. It would seem that a more descriptive
    context would help define the task better to meet your needs.

    > For example, the definition
    >
    > act or process of shrinking, esp in wood; shrinkage.
    >
    > might reduce to 'shrinkage', and
    >
    > bother; disturbance or interruption.
    >
    > might similarly reduce to any one of the three content words. In some
    > cases, more than one word might be output:
    >
    > to carry a canoe
    >
    > should probably reduce to 'carry canoe', not just 'carry' or 'canoe'.
    >

    What are the words being defined here? Those words themselves are the
    "gist" of the definitions, probably better than any other choices you
    might make. So, again, what are you trying to accomplish?

    > I can think of some heuristics, e.g. choose the least common word (in some
    > sense of 'common'), but if the chosen word is the object of a verb, retain
    > the verb also. (Which requires some parsing--fortunately, verbs in English
    > definitions are usually preceded by the word 'to', I suspect, so
    > distinguishing verbs from nouns should not be all that difficult.)
    >
    > I suppose this may be related to text summarization work.
    >

    You are correct in suggesting the use of heuristics. A Perl script for
    this purpose can be readily developed. Of course, it requires that you
    closely examine what you want to purge. A colleague lexicographer
    developed such a script (only a few hundred lines) stripping down
    full-blown definitions for one of the best dictionaries on the market in
    order to assess "similarity" between definitions for fitting word senses
    underneath the tops of WordNet, distinguishing possible hypernyms and
    the remaining "content-full" words. Just a matter of slogging through it.
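
    To make the heuristic concrete, here is a minimal sketch in Python (not
    the Perl script mentioned above, which I cannot reproduce here). It
    implements only the rule from the quoted message: keep the least common
    content word, and if the definition starts with "to", keep the verb that
    follows as well. The names STOPWORDS, FREQ and gist are hypothetical, the
    frequency counts are invented for illustration, and the "word after 'to'
    is the verb" test is a crude stand-in for real parsing.

```python
import re

# Hypothetical function-word list; a real script would need a much longer one.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "to", "esp"}

# Invented frequency counts standing in for a real corpus word list.
FREQ = {
    "act": 900, "process": 800, "shrinking": 40, "wood": 300, "shrinkage": 5,
    "bother": 60, "disturbance": 30, "interruption": 25,
    "carry": 400, "canoe": 20,
}


def tokenize(text):
    """Lowercase the definition and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def gist(definition):
    """Reduce a definition to its rarest content word, plus a leading verb."""
    tokens = tokenize(definition)
    content = [t for t in tokens if t not in STOPWORDS]
    if not content:
        return ""
    # Words missing from FREQ count as frequency 0, i.e. as the rarest.
    rarest = min(content, key=lambda t: FREQ.get(t, 0))
    # Assumption: in a definition of the form "to VERB ...", the token right
    # after "to" is the verb, and the rarest word is treated as its object.
    if tokens[0] == "to" and len(tokens) > 1 and tokens[1] != rarest:
        return f"{tokens[1]} {rarest}"
    return rarest


for d in ("act or process of shrinking, esp in wood; shrinkage.",
          "bother; disturbance or interruption.",
          "to carry a canoe"):
    print(d, "->", gist(d))
```

    With these made-up counts the three example definitions come out as
    'shrinkage', 'interruption' and 'carry canoe'. Everything hard (the
    stoplist, the frequency source, deciding what is really a verb) is
    where the slogging comes in.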

    As to your summarization analogy, the best team in DUC 2003 for headline
    generation (fewer than 10 words) followed a similar approach to Radev et
    al., via a process of removing "less important fragments" from sentences
    viewed as most expressing the content of newswire texts. This did
    involve working with full parses of the sentences. I would suggest
    looking at the DUC 2003 papers
    (http://www-nlpir.nist.gov/projects/duc/pubs.html).

            Ken

    -- 
    Ken Litkowski                     TEL.: 301-482-0237
    CL Research                       EMAIL: ken@clres.com
    9208 Gue Road
    Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com
    


