[Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian

From: Hrafn Loftsson (HRAFN@ru.is)
Date: Wed Feb 14 2007 - 17:44:45 MET

    Hi everyone,

    The paper J. Hajic (2000), "Morphological tagging: Data vs.
    Dictionaries", reports the percentage of ambiguous tokens for English
    (38.65%), Czech (45.97%), Estonian (40.24%), Hungarian (21.58%),
    Romanian (40.00%) and Slovene (38.01%), using an annotated version of
    Orwell's novel 1984 for each of these languages.

    I need the corresponding percentages for Swedish, Danish and
    Norwegian, calculated using ANY corpus.
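
    To make the requested figure concrete, here is a minimal sketch in
    Python. It assumes a token counts as ambiguous when its word form
    occurs with more than one tag in the same tagged corpus; the paper may
    instead derive ambiguity from a dictionary or morphological analyzer,
    so the numbers need not be directly comparable. The function name and
    the toy sample are hypothetical.

```python
from collections import defaultdict


def ambiguity_ratio(tagged_tokens):
    """Return the fraction of tokens whose word form carries more than one tag.

    Assumption: the tag lexicon is induced from the corpus itself, not taken
    from an external dictionary or morphological analyzer.
    """
    tokens = list(tagged_tokens)
    if not tokens:
        return 0.0

    # Collect every tag observed for each word form.
    tags_per_form = defaultdict(set)
    for form, tag in tokens:
        tags_per_form[form].add(tag)

    # A token is ambiguous if its form was seen with at least two tags.
    ambiguous = sum(1 for form, _ in tokens if len(tags_per_form[form]) > 1)
    return ambiguous / len(tokens)


# Toy sample (hypothetical): "can" appears as both NOUN and VERB.
sample = [("the", "DET"), ("can", "NOUN"), ("can", "VERB"), ("rust", "VERB")]
ratio = ambiguity_ratio(sample)
```

    Multiplying the result by 100 gives a percentage in the same form as
    the figures quoted above.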

    Does anyone have this info (and preferably a reference to a paper which
    discusses the issue)?

    Regards,
    Hrafn Loftsson
    Assistant professor

    Department of Computer Science
    School of Science and Engineering
    Reykjavik University
    Iceland


