[Corpora-List] Talbanken05 (Swedish Treebank)

From: Joakim Nivre (nivre@msi.vxu.se)
Date: Wed Nov 23 2005 - 08:54:28 MET

    We are happy to announce the release of Talbanken05 (Version 1.0).

    Talbanken05 is a modernized version of Talbanken76, a Swedish treebank of
    roughly 300,000 words, constructed at Lund University in the 1970s.
    The treebank comes with no guarantee but is freely available for research
    and educational purposes as long as proper credit is given for the work
    done to produce the material (both in Lund and in Växjö).

    The treebank consists of a written language part and a spoken language
    part of roughly equal size. The written language part in turn consists of
    two sections, the so-called professional prose section (P), with data
    from textbooks, brochures, newspapers, etc., and a collection of high
    school students' essays (G). The spoken language part also has two
    sections, interviews (IB) and conversations and debates (SD). Altogether,
    the corpus contains close to 300,000 running tokens.

    The distribution contains the entire treebank (divided into sections
    P, G, IB and SD) in four versions:

      MAMBA: Original syntactic and lexical annotation (and encoding)
      FPS: Flat phrase structure annotation (TIGER-XML encoding)
      DPS: Deepened phrase structure annotation (TIGER-XML encoding)
      Dep: Dependency structure annotation (Malt-XML encoding)
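
    For readers who want to inspect the TIGER-XML files, the following is
    a minimal Python sketch, not part of the distribution, that collects
    the tokens of each sentence. It assumes the usual TIGER-XML layout
    (<s> elements containing <t> terminals) and assumes the word form is
    stored in an attribute named "word". The sample document, its tag
    values and its edge labels are invented for illustration and should
    be checked against the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TIGER-XML document. The attribute names ("word",
# "pos") and all label values are assumptions for illustration only; check
# them against the files in the actual distribution.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hon" pos="X1"/>
          <t id="s1_2" word="sover" pos="X2"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="A" idref="s1_1"/>
            <edge label="B" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Map each sentence id to the list of its word forms, in document order."""
    root = ET.fromstring(xml_text)
    sentences = {}
    for s in root.iter("s"):
        # Terminals (<t>) carry the tokens; nonterminals (<nt>) carry phrases.
        sentences[s.get("id")] = [t.get("word") for t in s.iter("t")]
    return sentences


sentences = read_sentences(SAMPLE)
print(sentences)
```

    For a full section file, ET.parse(path).getroot() could replace
    ET.fromstring, with the rest of the function unchanged.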

    The treebank can be downloaded from:
    http://www.msi.vxu.se/users/nivre/research/Talbanken05.html

    Joakim Nivre
    Jens Nilsson
    Johan Hall

    Växjö University
    School of Mathematics and Systems Engineering


