[Corpora-List] LREC post-conference workshop "XML-based richly annotated corpora"

From: Andreas Witt (andreas.witt@uni-bielefeld.de)
Date: Wed Dec 17 2003 - 21:39:00 MET


    ____________________________________________________________________
                  This message is posted to several lists.
                We apologize if you receive multiple copies.
           Please forward it to everyone who might be interested.
    _____________________________________________________________________

    **********************************************
       FIRST ANNOUNCEMENT AND CALL FOR PAPERS

                   Workshop on

          XML-based richly annotated corpora

       http://coli.lili.uni-bielefeld.de/forschung/xbrac/

       LREC post-conference workshop
       Centro Cultural de Belem, LISBON, Portugal, 29th May 2004

      **********************************************

    Call for Papers

    XML has become a de facto standard for the representation of corpus
    resources. It is being used for representing speech and text
    corpora, multimodal and multimedia corpora, as well as, in
    particular, integrated corpora which combine different
    modalities. XML-based representations make it easier to work with
    richly annotated corpora, which include annotations from different
    levels of linguistic description or from different modalities. A
    number of tools have also become available over the last few years
    for creating, managing, annotating, and querying such corpora and
    for their statistical exploration.

    Although XML is a useful representation language, its use alone does
    not settle all the problems and choices concerning representation
    style (e.g. stand-off annotations vs. embedded annotations); these
    are in turn closely linked with questions of the architecture of
    richly annotated corpora, such as the following:
    should information from different levels of linguistic description
    be represented in separate "layers" of the annotation? Should a
    given information type serve as a grounding for all or some of the
    others? How can interdependencies and interaction between phenomena
    from different levels of description be accounted for? How can
    concurrent annotation (one phenomenon, different analyses or
    theories/approaches) be accounted for?
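
    As a rough illustration of the stand-off vs. embedded distinction
    mentioned above, here is a minimal sketch using Python's standard
    xml.etree.ElementTree module. The element and attribute names
    (s, w, layer, mark, start, end, pos) and the sample sentence are
    assumptions made for this example only; they are not taken from
    any particular annotation scheme.

```python
import xml.etree.ElementTree as ET

# Embedded style: annotation markup wraps the primary text directly.
# (Element and attribute names are hypothetical.)
EMBEDDED = (
    '<s><w pos="DET">The</w> <w pos="NOUN">corpus</w> '
    '<w pos="VERB">grows</w></s>'
)

# Stand-off style: the primary text stays untouched, and a separate
# layer points into it by character offsets (end offset is exclusive).
PRIMARY_TEXT = "The corpus grows"
STANDOFF = """
<layer type="pos">
  <mark start="0" end="3" pos="DET"/>
  <mark start="4" end="10" pos="NOUN"/>
  <mark start="11" end="16" pos="VERB"/>
</layer>
"""


def read_embedded(xml_string):
    """Return (token, pos) pairs from the embedded representation."""
    root = ET.fromstring(xml_string)
    return [(w.text, w.get("pos")) for w in root.iter("w")]


def read_standoff(text, xml_string):
    """Return (token, pos) pairs by resolving offsets against the text."""
    root = ET.fromstring(xml_string.strip())
    pairs = []
    for mark in root.iter("mark"):
        start, end = int(mark.get("start")), int(mark.get("end"))
        pairs.append((text[start:end], mark.get("pos")))
    return pairs


if __name__ == "__main__":
    print(read_embedded(EMBEDDED))
    print(read_standoff(PRIMARY_TEXT, STANDOFF))
```

    In this sketch both readers recover the same token/tag pairs. The
    stand-off variant is grounded in characters rather than in XML
    elements, so further layers could be added as separate documents
    without modifying the primary text.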

    Such questions and the pertaining corpus-architectural
    considerations interact with at least two more problem areas: on the
    one hand with the kinds of research questions and of phenomena to be
    analysed in linguistic and natural interaction research (which may
    call for certain architectural solutions), and on the other hand
    with tools for the creation, annotation, manipulation and
    exploration of XML-based corpora.

    The workshop will attempt to address the interplay between the
    following research areas:

        1. XML techniques for corpus representation, i.e.:

               * Standoff annotation vs. embedded annotation;
               * Use of XML linking standards for language data (XLink,
                 XPointer, XPath); other ways of ensuring relationships
                 between levels, e.g. through naming conventions;
               * Concepts of layering in corpora annotated at several
                 levels of linguistic description; types of information
                 grouped together vs. distributed over different
                 "packages";
               * Hierarchical vs. flat annotation;
               * The grounding of annotations (e.g. in XML elements vs.
                 in characters?) and its implications;
               * Techniques for the manipulation of XML-based
                 representations for massively annotated corpora;
                 usefulness and relevance of XQuery.
        2. Levels of linguistic description and their interaction, i.e.:

               * Examples of richly annotated corpora: reasons for the
                 choice of the annotated levels; linguistic and natural
                 interactivity research questions which can (only) be
                 solved with richly annotated data;
               * Interaction between levels: new research questions in
                 linguistics and natural interactivity research which
                 can only be addressed because of observation across
                 levels, across modalities, etc. An example is the use
                 of clustering techniques across different levels: e.g.
                 relevant cooccurrences of phenomena from different
                 levels identified via clustering;
               * Use and usefulness of concurrent annotations in
                 XML-based corpora; an example is concurrent flat and
                 deep syntactic analysis.

        3. Tools for handling richly annotated corpora: Software
           solutions for, e.g.,

               * corpus creation, transformation, exchange, and
                 validation;
               * interactive annotation;
               * exploration: query and retrieval, statistical analysis;
               * corpus management (e.g. with respect to meta-data).

           Tools presented should be positioned with respect to the
           questions of corpus architecture and with respect to the
           research directions discussed above under (1) and (2).

    The workshop aims at bringing together XML experts, both theorists
    and practitioners, as well as linguists and natural interactivity
    researchers working on the definition of corpus architectures,
    annotation and resource exchange schemes and on tools for the use of
    multilevel and/or multi-layer annotated corpora. It will provide a
    forum for the definition of requirements for corpus representations
    and pertaining tools, discussing at the same time case studies from
    linguistics and natural interactivity research.

    Organisers

         * Andreas Witt, Bielefeld University
         * Ulrich Heid, University of Stuttgart
         * Henry S. Thompson, University of Edinburgh
         * Jean Carletta, University of Edinburgh
         * Peter Wittenburg, MPI for Psycholinguistics Nijmegen

    Program committee

         * Jean Carletta, University of Edinburgh, UK
         * Ulrich Heid, University of Stuttgart, Germany
         * Henning Lobin, Justus-Liebig-Universität Gießen, Germany
         * Dieter Metzing, Bielefeld University, Germany
         * Joakim Nivre, Växjö University, Sweden
         * Vito Pirrelli, Istituto di Linguistica Computazionale
           del CNR, Pisa, Italy
         * Gary Simons, SIL International, Texas, USA
         * Henry S. Thompson, University of Edinburgh, UK
         * Jun'ichi Tsujii, University of Tokyo, Japan
         * Andreas Witt, Bielefeld University, Germany
         * Peter Wittenburg, MPI for Psycholinguistics Nijmegen,
           Netherlands

    Submissions
    Authors are invited to submit papers for oral presentation in any of
    the areas listed above. Only full papers will be accepted, and the
    length of the paper should not exceed 8 pages.

    Requirements for Paper Submission:

         * Submissions must be full papers, not extended abstracts.
         * It is highly recommended that authors submit papers in the
           LREC conference proceedings format (maximum of 8 pages).
         * Submissions in other formats will be accepted (font sizes of 11 or 12
           point); however, they can be no longer than eight (8) pages including
           figures, tables, and references, formatted for A4 paper with reasonable
           margins.
         * Electronic submission of manuscripts (details on the submission site) is
           required (PDF preferred; PostScript and ASCII accepted).
         * An additional title page should include the title, author(s),
           affiliation(s), contact email address, postal address, telephone, fax and
           URL as well as five keywords.

    Submissions should be sent by email to andreas.witt@uni-bielefeld.de before
    15th February 2004.


