RE: [Corpora-List] Author+'s plans for books

From: Copperman, Max (Max.Copperman@knova.com)
Date: Wed Mar 15 2006 - 19:12:36 MET


    Discourse structure theory may be an appropriate tool for this job.
    However, Rhetorical Structure Theory is unlikely to be the discourse
    structure theory that helps. It's rather ad hoc (and I'm being
    charitable here). I'd look at work by Livia Polanyi and work on
    Discourse Representation Theory. Someone actually familiar with the
    field could probably make stronger recommendations.
     
    Max Copperman

    ________________________________

    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Alexander Schutz
    Sent: Wednesday, March 15, 2006 9:30 AM
    To: D.G.Damle
    Cc: CORPORA
    Subject: Re: [Corpora-List] Author+'s plans for books

            I am trying to learn ontologies from text. Evaluation is a
    problem, since if you ask people to read the text and then to evaluate
    the automatically generated ontology, every reader's concept structure
    may be different. The variation amongst readers may be too great!

    In my opinion, it will be extremely helpful to restrict the number of
    concepts (or the choice of concepts in general). It is not so obvious
    what you are trying to achieve:
    Evaluating the learned concepts of a system against a gold standard?
    Then, on which kind of corpus did you conduct your experiments? I assume
    it is a domain specific corpus (of textbooks). In that case it would be
    quite easy to agree on a subset of certain concepts for that domain, and
    restrict the domain experts (readers) to refer only to elements of this
    subset while evaluating your system.

            It is also difficult to have such an ontology marked by domain
    experts. What the domain experts know about the domain may not be
    reflected in the text and so recall is particularly difficult. Also,
    evaluators may not be willing to read large texts.

    Evaluation in ontology learning is a pain in the neck, and your problem
    with precision will by far outweigh your recall problem. Just imagine
    that your goal is to *learn* ontology concepts (or relations). What if
    your system learns something new (i.e. something that is not contained
    in the gold standard, or in the subset of concepts you agreed upon)? It
    will then be counted as a precision error.
    On the other hand, if you decide to compose your gold standard of all
    the possible concepts in the whole world (just to make sure your system
    will not run into the precision problems described above), there will be
    loads of concepts that you miss, because they are not contained in the
    text (which accounts for the recall problem you described). Yes,
    evaluation of ontology learning is a dilemma.
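
    To make the trade-off concrete, here is a minimal sketch of how the two
    scores are computed against a gold standard. The concept names and the
    function name precision_recall are made up for illustration; they do
    not come from any particular system or corpus.

```python
def precision_recall(learned, gold):
    """Score a set of learned concepts against a gold-standard set."""
    learned, gold = set(learned), set(gold)
    matched = learned & gold  # concepts that appear in both sets
    precision = len(matched) / len(learned) if learned else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example data: "ribosome" is a sensible new concept that the
# system learned from the text, but it is missing from the gold standard,
# so it is counted as a precision error. "membrane" and "enzyme" are in the
# gold standard but were not learned, which lowers recall.
gold = {"cell", "nucleus", "membrane", "enzyme"}
learned = {"cell", "nucleus", "ribosome"}

precision, recall = precision_recall(learned, gold)
```

    With this toy data, two of the three learned concepts match the gold
    standard and two of the four gold concepts are found. Enlarging the
    gold standard would rescue "ribosome" but add further concepts that the
    text never mentions, which is exactly the dilemma.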

    The fact that evaluators may not be willing to read large texts is, in
    my opinion, not a problem of ontology learning, and there is a lot you
    can do to assure the loyalty of your evaluators (hint hint).

            Does the ontology defined by the author(s) of a large text
    constitute a more objective yardstick? Do authors have a list of
    concepts and possibly some notion of structure about the text they set
    out to create? (I am thinking particularly of textbooks). Do any
    authors commit something like a concept structure to paper or a computer
    document before they write the text? Alternatively, is it likely that an
    author could retrospectively construct such a plan, notwithstanding the
    issues of memory lapses etc.?

    To be honest, I have not written any textbook, but I would like to think
    that before I write a larger chunk of text (say a paper), I have a
    certain structure (and the concepts it contains, so to speak) in mind.

            Do any authors have such plans and the texts they wrote using
    those plans in an electronic form which they would be happy to make
    available for research? What do list members who write textbooks do?

    If you speak of text planning, then maybe a discourse or text theory is
    the right thing for you, such as Rhetorical Structure Theory:

    @Article{thompson-mann87,
       Author="Thompson, Sandra A. and Mann, William C.",
       Title="Rhetorical Structure Theory: A framework for the analysis of texts",
       Journal="IPrA Papers in Pragmatics",
       Volume=1,
       Number=1,
       Pages="79-105",
       Abstract="One of the foundation papers of RST.",
       Year=1987}

    -- 
    Alexander Schutz
    Student of Computational Linguistics
    University of Saarland, Germany 
    


