[Corpora-List] Tagset mapping (Negra -> Penn Treebank)

From: Kevin Duh (duh@ee.washington.edu)
Date: Tue Dec 27 2005 - 19:30:40 MET

    Dear Corpora members,

    I am interested in comparing the part-of-speech tag distributions of
    English vs. German. Currently, I'm looking into using the WSJ Penn
    Treebank for English, and the Negra and TIGER corpora for German.
    However, the WSJ and Negra/TIGER tagsets differ. Does anyone have a
    mapping that converts from the Negra/TIGER tagset to the WSJ tagset?

    In other words, is there a document that specifies, on a best-effort
    basis, which tags in the Negra tagset
    (http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/stts.asc)
    correspond to which tags in the WSJ tagset
    (http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html)?
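
    To make the request concrete, here is a minimal sketch of the kind of
    mapping meant and how it might be used to compare distributions. The
    table below is a hypothetical, partial illustration and not an
    established mapping: the pairings are approximate, several are lossy
    (for example, the Negra tagset does not split singular and plural
    nouns the way the WSJ tagset does), and the names STTS_TO_PTB,
    UNMAPPED, map_tags and tag_distribution are invented for this example.

```python
from collections import Counter

# Hypothetical, partial Negra (STTS) -> WSJ (Penn Treebank) mapping.
# Illustration only: pairings are approximate and many tags are missing.
STTS_TO_PTB = {
    "NN": "NN",     # common noun (WSJ would further split NN / NNS)
    "NE": "NNP",    # proper noun (WSJ would further split NNP / NNPS)
    "ADJA": "JJ",   # attributive adjective
    "ADJD": "JJ",   # predicative or adverbial adjective (lossy)
    "ART": "DT",    # article -> determiner
    "APPR": "IN",   # preposition
    "ADV": "RB",    # adverb
    "KON": "CC",    # coordinating conjunction
    "CARD": "CD",   # cardinal number
    "PPER": "PRP",  # personal pronoun
    "VVINF": "VB",  # infinitive of a full verb
    "VVPP": "VBN",  # past participle of a full verb
    "ITJ": "UH",    # interjection
    "FM": "FW",     # foreign-language material
}

# Placeholder for source tags that the table does not cover.
UNMAPPED = "UNMAPPED"


def map_tags(stts_tags, mapping=STTS_TO_PTB):
    """Convert a sequence of source tags, marking uncovered tags explicitly."""
    return [mapping.get(tag, UNMAPPED) for tag in stts_tags]


def tag_distribution(tags):
    """Return the relative frequency of each tag in the sequence."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}
```

    Keeping uncovered tags visible as UNMAPPED, rather than dropping them,
    shows how much of the German data the table fails to cover before the
    two distributions are compared.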

    Thanks in advance!
    Kevin Duh

    -------------------------------------------------
    Kevin Duh
    Dept. of Electrical Engineering
    University of Washington
    http://ssli.ee.washington.edu/people/duh/


