Re: [Corpora-List] Author's plans for books

From: Alexander Schutz (goalscoringsuperstarhero@gmail.com)
Date: Wed Mar 15 2006 - 18:29:38 MET

    >
    > I am trying to learn ontologies from text. Evaluation is a problem, since
    > if you ask people to read the text and then to evaluate the automatically
    > generated ontology; every reader's concept structure may be different. The
    > variation amongst readers may be too great!
    >
    In my opinion, it would help a great deal to restrict the number of
    concepts (or the choice of concepts in general). It is not obvious what
    you are trying to achieve.
    Are you evaluating the learned concepts of a system against a gold
    standard? If so, on which kind of corpus did you conduct your experiments?
    I assume it is a domain-specific corpus (of textbooks). In that case it
    would be quite easy to agree on a subset of concepts for that domain, and
    to restrict the domain experts (readers) to referring only to elements of
    this subset while evaluating your system.

    > It is also difficult to have such an ontology marked by domain experts.
    > What the domain experts know about the domain may not be reflected in the
    > text and so recall is particularly difficult. Also, evaluators may not be
    > willing to read large texts.
    >
    Evaluation in ontology learning is a pain in the neck, and your problem with
    precision will by far outweigh your recall problem. Just imagine that your
    goal is to *learn* ontology concepts (or relations). What if your system
    learns something new, i.e. something that is not contained in the gold
    standard or in your agreed subset of concepts? It will then be counted as a
    precision error.
    On the other hand, if you decide to compose your gold standard of all the
    possible concepts in the whole world (just to make sure your system does not
    run into the precision problem described above), there will be loads of
    concepts that you miss because they are not contained in the text (which
    accounts for the recall problem you described). Yes, evaluation of ontology
    learning is a dilemma.
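
    To make the dilemma concrete, here is a minimal sketch of set-based
    precision and recall against a gold standard. The function name and the
    toy concept sets are made up for illustration; they do not come from any
    real system or corpus.

```python
def precision_recall(learned, gold):
    """Set-based precision and recall of learned concepts against a gold standard."""
    learned, gold = set(learned), set(gold)
    true_positives = learned & gold
    precision = len(true_positives) / len(learned) if learned else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: the system "discovers" one concept that the
# agreed gold standard does not contain.
gold = {"corpus", "ontology", "concept", "relation"}
learned = {"corpus", "ontology", "gold standard"}

# The novel concept is counted as a precision error: 2 of 3 learned concepts match.
p, r = precision_recall(learned, gold)

# Padding the gold standard with many more concepts removes the precision
# error, but concepts that never occur in the text now drag recall down.
padded_gold = gold | {"gold standard", "phoneme", "treebank"}
p_padded, r_padded = precision_recall(learned, padded_gold)

print(f"agreed subset: precision={p:.2f} recall={r:.2f}")
print(f"padded gold:   precision={p_padded:.2f} recall={r_padded:.2f}")
```

    With the small agreed subset, the new concept hurts precision; with the
    padded gold standard, precision recovers while recall drops.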

    The fact that evaluators may not be willing to read large texts is, in my
    opinion, not a problem of ontology learning, and there is a lot you can do
    to secure the loyalty of your evaluators (hint hint).

    > Does the ontology defined by the author(s) of a large text constitute a
    > more objective yardstick? Do authors have a list of concepts and possibly
    > some notion of structure about the text they set out to create? (I am
    > thinking particularly of textbooks). Do any authors commit something like a
    > concept structure to paper or a computer document before they write the
    > text? Alternatively, is it likely that an author could retrospectively
    > construct such a plan, notwithstanding the issues of memory lapses etc.
    >
    To be honest, I have not written a textbook, but I would like to think
    that when I write a larger chunk of text (say, a paper), I have a certain
    structure (and the concepts it contains, so to speak) in mind before I
    actually start writing.

    > Do any authors have such plans and the texts they wrote using those plans
    > in an electronic form which they would be happy to make available for
    > research? What do list members who write textbooks do?
    >
    If you are talking about text planning, then discourse and text theory may
    be the right thing for you, for example Rhetorical Structure Theory:

    @Article{thompson-mann87,
       Author="Thompson, Sandra A. and Mann, William C.",
       Title="Rhetorical Structure Theory: A framework for the analysis of texts",
       Journal="IPrA Papers in Pragmatics",
       Volume=1,
       Number=1,
       Pages="79-105",
       Abstract="One of the foundation papers of RST.",
       Year=1987}

    --
    Alexander Schutz
    Student of Computational Linguistics
    University of Saarland, Germany
    


