[Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian

From: Hrafn Loftsson (HRAFN@ru.is)
Date: Thu Feb 15 2007 - 14:19:57 MET

Next message: Joakim Nivre: "Re: [Corpora-List] Ratio of ambiguous tokens in Swedish, Danish and Norwegian"

Previous message: Costa Luis F: "[Corpora-List] Job announcement: Researchers in Portuguese NLP"

Hi everyone,

(It has been pointed out to me that, for some reason, my message to the
list appeared empty in some e-mail systems. Here is a second try:)

The paper J. Hajic (2000), "Morphological tagging: Data vs.
Dictionaries", reports the percentage of ambiguous tokens for English
(38.65%), Czech (45.97%), Estonian (40.24%), Hungarian (21.58%),
Romanian (40.00%) and Slovene (38.01%), using an annotated version of
Orwell's novel 1984 for each of these languages.

I need the corresponding percentages for Swedish, Danish and
Norwegian, calculated using ANY corpus.
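
To make clear what kind of figure I am after, here is a minimal sketch of
one way such a percentage could be computed from a tagged corpus. It
rests on assumptions of mine, not on the procedure in Hajic's paper: a
token counts as ambiguous if its word form occurs with more than one tag
in the corpus itself (a morphological dictionary could give different
numbers), and word forms are lowercased before comparison. The function
name and the toy data are made up for illustration.

```python
from collections import defaultdict


def ambiguous_token_ratio(tagged_tokens):
    """Return the fraction of tokens whose word form has more than one tag.

    Assumption: ambiguity is judged from the tags observed in the corpus
    itself, and word forms are compared case-insensitively.
    """
    tagged_tokens = list(tagged_tokens)  # allow any iterable; we need two passes
    if not tagged_tokens:
        return 0.0

    # First pass: collect the set of tags seen for each word form.
    tags_by_form = defaultdict(set)
    for form, tag in tagged_tokens:
        tags_by_form[form.lower()].add(tag)

    # Second pass: count running tokens whose form has two or more tags.
    ambiguous = sum(
        1 for form, _ in tagged_tokens if len(tags_by_form[form.lower()]) > 1
    )
    return ambiguous / len(tagged_tokens)


# Invented toy data: "can" appears with two tags, the other forms with one.
sample = [("the", "DET"), ("can", "NOUN"), ("can", "VERB"), ("rust", "VERB")]
print(f"{100 * ambiguous_token_ratio(sample):.2f}% ambiguous tokens")
```

Note that the ratio is taken over running tokens, not over distinct word
forms, so frequent ambiguous forms weigh more heavily.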

Does anyone have this info (and preferably a reference to a paper which
discusses the issue)?

Regards,

Hrafn Loftsson
Assistant professor
Department of Computer Science
School of Science and Engineering
Reykjavik University
Iceland

