[Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian

From: Hrafn Loftsson (HRAFN@ru.is)
Date: Thu Feb 15 2007 - 14:19:57 MET

    Hi everyone,

    (It has been pointed out to me that, for some reason, my message to the
    list appeared empty in some e-mail systems. Here is a second try:)

    The paper J. Hajic (2000), "Morphological tagging: Data vs.
    Dictionaries", reports percentages of ambiguous tokens for English
    (38.65%), Czech (45.97%), Estonian (40.24%), Hungarian (21.58%),
    Romanian (40.00%) and Slovene (38.01%), using an annotated version of
    Orwell's novel 1984 for each of these languages.

    I need the corresponding percentages for Swedish, Danish and
    Norwegian, calculated using ANY corpus.
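
    To make the quantity concrete, here is a minimal sketch of one way such
    a percentage could be computed from a tagged corpus. It assumes a token
    counts as ambiguous when its word form occurs with more than one tag in
    the corpus itself; a dictionary-based count (using a morphological
    lexicon instead of the corpus) may give different numbers. The function
    name and the toy data are illustrative only.

```python
from collections import defaultdict


def ambiguity_ratio(tagged_tokens):
    """Return the fraction of tokens whose word form has more than one tag.

    Assumption: the tag inventory for each word form is derived from the
    corpus itself, not from an external morphological dictionary.
    Word forms are compared as-is (no lowercasing or normalisation).
    """
    tagged_tokens = list(tagged_tokens)
    if not tagged_tokens:
        return 0.0

    # Collect every tag observed for each word form.
    tags_by_form = defaultdict(set)
    for form, tag in tagged_tokens:
        tags_by_form[form].add(tag)

    # A token is ambiguous if its form was seen with two or more tags.
    ambiguous = sum(1 for form, _ in tagged_tokens if len(tags_by_form[form]) > 1)
    return ambiguous / len(tagged_tokens)


# Toy data (made up): "can" appears with two tags, so both of its
# occurrences count as ambiguous.
toy_corpus = [("the", "DET"), ("can", "AUX"), ("can", "NOUN"), ("rusts", "VERB")]
toy_ratio = ambiguity_ratio(toy_corpus)
```

    Multiplying the result by 100 gives a percentage comparable in form to
    the figures quoted above, though not necessarily in method.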

     

    Does anyone have this information (and preferably a reference to a
    paper that discusses the issue)?

    Regards,

    Hrafn Loftsson
    Assistant professor
    Department of Computer Science
    School of Science and Engineering
    Reykjavik University
    Iceland



    This archive was generated by hypermail 2b29 : Thu Feb 15 2007 - 14:17:49 MET