Saarland University
Department of Computational Linguistics and Phonetics
NEGRA Corpus
A Syntactically Annotated Corpus
of German Newspaper Texts
http://www.coli.uni-sb.de/sfb378/negra-corpus/
The corpus is available free of charge to all universities and other
non-profit research organizations. All others, please contact us for
conditions.
---------------------------------------------------------------------------
The NEGRA corpus consists of approx. 10,000 sentences (176,000 tokens)
of German newspaper text taken from the Frankfurter Rundschau. It is
annotated for
- part-of-speech tags, using the Stuttgart-Tuebingen-Tagset (STTS),
- morphology (first 60,000 tokens), and
- syntactic structures, which are context-free trees that additionally
  allow crossing branches in order to mark predicate-argument relations.
Each sentence was annotated semi-automatically by two human annotators,
and the two annotations were subsequently compared.
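
To illustrate what a tree with crossing branches looks like as a data
structure, here is a minimal Python sketch. It is not the corpus's
distribution format and not part of any NEGRA tool: the class Node, its
fields, and the example sentence ("Das Buch hat er gelesen") are
illustrative assumptions and are not taken from the corpus. A node's
branches cross those of another node when the token positions it covers
do not form one contiguous span.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical tree node; not the NEGRA file format."""
    label: str                      # phrase category, or POS tag for a token
    position: Optional[int] = None  # token index, set only for terminals
    word: Optional[str] = None      # surface form, set only for terminals
    children: List["Node"] = field(default_factory=list)

    def yield_positions(self) -> Set[int]:
        """Return the set of token positions covered by this node."""
        if self.position is not None:
            return {self.position}
        positions: Set[int] = set()
        for child in self.children:
            positions |= child.yield_positions()
        return positions

    def is_discontinuous(self) -> bool:
        """True if the covered positions have a gap (crossing branches)."""
        positions = self.yield_positions()
        if not positions:
            return False
        return max(positions) - min(positions) + 1 != len(positions)


# Invented example sentence with STTS part-of-speech tags.
das = Node("ART", 0, "Das")
buch = Node("NN", 1, "Buch")
hat = Node("VAFIN", 2, "hat")
er = Node("PPER", 3, "er")
gelesen = Node("VVPP", 4, "gelesen")

# The VP covers "Das Buch ... gelesen" and skips over "hat er",
# so its branches cross those of the S node.
np_node = Node("NP", children=[das, buch])
vp_node = Node("VP", children=[np_node, gelesen])
s_node = Node("S", children=[vp_node, hat, er])

print(vp_node.is_discontinuous())
print(np_node.is_discontinuous())
```

In this sketch the object (NP) and the participle stay together under
one VP node even though they are separated in the surface string, which
is the kind of predicate-argument grouping that crossing branches are
meant to mark.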
For details, please visit
http://www.coli.uni-sb.de/sfb378/negra-corpus/
The research was funded by:
- Universitaet des Saarlandes, Saarbruecken
- Deutsche Forschungsgemeinschaft