[Corpora-List] The Tuebingen Treebank of Written German (TueBa-D/Z) - third release

From: Yannick Versley (versley@sfs.uni-tuebingen.de)
Date: Fri Jul 14 2006 - 14:14:18 MET DST

    The Division of Computational Linguistics at the Seminar fuer
    Sprachwissenschaft of the University of Tuebingen (Germany) is happy to
    announce the release of a referentially and syntactically annotated
    German corpus:

    * The Tuebingen Treebank of Written German (TueBa-D/Z) - third release

    The TueBa-D/Z treebank is a manually annotated German newspaper
    corpus based on data taken from the daily issues of 'die tageszeitung'.
    It currently comprises approximately 27 000 sentences (ca. 470 000 words).

    The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels
    of syntactic constituency: the lexical level, the phrasal level,
    the level of topological fields, and the clausal level.
    In addition to constituent structure, annotated trees contain edge labels
    between nodes which encode grammatical functions.
    Words are annotated with inflectional morphology at the lexical level
    (currently ca. 80% of the sentences are covered).

    The treebank is available in three different formats:
       * NEGRA export format
       * XML format
       * Penn Treebank format
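
    As a rough illustration of how the four annotation levels nest in a
    bracketed (Penn Treebank style) tree, here is a minimal Python sketch.
    The sentence, the node labels and the way edge labels are attached with
    a hyphen are assumptions made up for this example; they are not copied
    from the release, and the actual files may encode things differently.

```python
# Minimal reader for Penn-Treebank-style bracketed trees.
# EXAMPLE is a hypothetical toy tree: its labels and the "-ON"/"-HD"/"-OA"
# edge-label suffixes are illustrative assumptions, not release data.
EXAMPLE = (
    "(SIMPX (VF (NX-ON (PPER Sie))) "
    "(LK (VXFIN-HD (VVFIN liest))) "
    "(MF (NX-OA (ART die) (NN Zeitung))))"
)


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(node):
    """Return the words of a parsed tree in surface order."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


tree = parse_bracketed(EXAMPLE)
words = leaves(tree)
```

    In this toy tree the outermost node stands for the clausal level, its
    daughters for topological fields, the nodes below them for phrases, and
    the preterminals for the lexical level.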

    Currently, about 23 500 sentences of the treebank (about 1 100 articles) have
    been enriched with anaphoric and coreference relations referring to nominal
    and pronominal antecedents.
    Linking relations include: coreferential (two NPs refer to the same
    extralinguistic referent), anaphoric/cataphoric (a definite pronoun refers to
    a contextual antecedent) and other relations (split-antecedent, instance) as
    well as marking of expletive pronouns.

    The referential annotation is available either as a stand-alone version
    in the PALinkA format, or as a unified representation of syntactic and
    referential information in the NEGRA export and XML formats.

    What is new in the third release:

    - about 5 000 additional sentences
    - referential annotation
    - cleaner versions of the trees published in the first/second release

    The license for TueBa-D/Z is granted free of charge for scientific use.
    For more information, please refer to:
    http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

    With best regards,

    Erhard W. Hinrichs
    Sandra Kübler
    Heike Zinsmeister
    Karin Naumann
    Holger Wunsch
    Yannick Versley

    -- 
    Yannick Versley
    Seminar für Sprachwissenschaft, Abt. Computerlinguistik
    Wilhelmstr. 19, 72074 Tübingen (Germany)
    Tel.: +49 - 7071 29 77352
    


