Corpora: Announcement: German Newspaper Corpus

Thorsten Brants (thorsten@CoLi.Uni-SB.DE)
Thu, 3 Dec 1998 00:13:29 +0100 (MET)

---------------------------------------------------------------------------

Saarland University
Department of Computational Linguistics and Phonetics

NEGRA Corpus

A Syntactically Annotated Corpus
of German Newspaper Texts

http://www.coli.uni-sb.de/sfb378/negra-corpus/

The corpus is available free of charge to all universities and other
non-profit research organizations. Others please contact us for conditions.

---------------------------------------------------------------------------

The NEGRA corpus consists of approx. 10,000 sentences (176,000 tokens)
of German newspaper text taken from the Frankfurter Rundschau. It is
annotated for
- part of speech, using the Stuttgart-Tuebingen-Tagset (STTS),
- morphology (first 60,000 tokens), and
- syntactic structure, in the form of context-free trees that additionally
  allow crossing branches in order to mark predicate-argument relations.
Each sentence was annotated semi-automatically by two human annotators,
and the two annotations were subsequently compared.
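
To illustrate what "crossing branches" means for a tree structure, here is a
minimal Python sketch. It is not the corpus format or any tool distributed
with the corpus: the Node class, the helper functions, and the example
sentence with its analysis are all hypothetical and chosen only for
illustration. Only the POS tags follow STTS. A node has crossing branches
when the tokens it dominates do not form a contiguous span.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not the corpus's own data format."""
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    position: Optional[int] = None               # token index, terminals only
    word: Optional[str] = None                   # surface form, terminals only


def leaf(tag: str, word: str, position: int) -> Node:
    """Build a terminal node for one token."""
    return Node(label=tag, word=word, position=position)


def covered_positions(node: Node) -> List[int]:
    """Return the sorted token positions dominated by this node."""
    if node.position is not None:
        return [node.position]
    positions: List[int] = []
    for child in node.children:
        positions.extend(covered_positions(child))
    return sorted(positions)


def is_discontinuous(node: Node) -> bool:
    """True if the dominated tokens leave a gap, i.e. branches cross."""
    positions = covered_positions(node)
    if not positions:
        return False
    return positions[-1] - positions[0] + 1 != len(positions)


# Illustrative sentence (an assumption, not taken from the corpus):
#   Das(0) Buch(1) hat(2) er(3) gelesen(4)
# The object NP and the participle form one VP, although the finite
# auxiliary and the subject stand between them.
np = Node("NP", [leaf("ART", "Das", 0), leaf("NN", "Buch", 1)])
vp = Node("VP", [np, leaf("VVPP", "gelesen", 4)])
s = Node("S", [vp, leaf("VAFIN", "hat", 2), leaf("PPER", "er", 3)])

for node in (np, vp, s):
    print(node.label, covered_positions(node), is_discontinuous(node))
```

In this sketch the VP covers positions 0, 1 and 4, so its branches cross
those of the sentence node, while the NP and the sentence node itself are
contiguous.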

For details, please visit
http://www.coli.uni-sb.de/sfb378/negra-corpus/

The research was funded by:
- Universitaet des Saarlandes, Saarbruecken
- Deutsche Forschungsgemeinschaft